Posts Tagged ‘Weblogic 10’
java.net.ConnectException: (…) Bootstrap to (…) failed. It is likely that the remote side declared peer gone on this JVM
Case and Topology
RMI services are deployed on UAT and exposed via an F5 load balancer, at the following address: t3://my-f5-frontal.my.domain.extension:7090
The actual servers are my-first-node.my.domain.extension and my-second-node.my.domain.extension.
The client application is deployed in a remote location, on a QA server.
The ports are open between QA and UAT, and from QA we can ping and telnet to the UAT hosts with no issue.
Nevertheless, when I launch the client application from QA, I get the following error:
2011-10-31 06:41:03,277 INFO support.DefaultListableBeanFactory - Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79e304: defining beans [jonathanServiceClient]; root of factory hierarchy Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'smartServiceClient' defined in class path resource [com/lalou/jonathan/rmi-client-spring.xml]: Invocation of init method failed; nested exception is org.springframework.remoting.RemoteLookupFailureException: JNDI lookup for RMI service [rmiServices] failed; nested exception is javax.naming.CommunicationException [Root exception is java.net.ConnectException: t3://my-f5-frontal.my.domain.extension:7090: Bootstrap to my-f5-frontal.my.domain.extension/111.222.012.123:7090 failed. It is likely that the remote side declared peer gone on this JVM]
Explanation and Fix
As far as I understand, here is the point: when the client tries to connect to the F5, it presents itself with its name, and the F5 returns the name of the actual server. If the client and the server are not on the same domain (“domain” in the network sense, nothing to do with a WebLogic domain), then the DNS resolution may fail.
To fix this issue, apply one or both of the following changes, depending on your local topology.
WebLogic: “Listen Address”
Modify the “Listen Address” in the WebLogic administration console, from home: Servers > MyFirstNode/MySecondNode > Configuration > General > Listen Address > update it.
By “updating” the “Listen Address”, I mean providing the complete name of the machines, including the domain extension, e.g. my-first-node.my.domain.extension and my-second-node.my.domain.extension, rather than my-first-node and my-second-node (or, even worse, localhost).
You can also provide an IP, cf. WebLogic documentation on Oracle’s website.
Of course, you can decide to set it directly in WebLogic’s config.xml.
Caution! This option may also be set via the command line running WebLogic, using the flag -Dweblogic.ListenAddress=...
Therefore, take care to be consistent between the content of the console/config.xml and the command-line option.
Hosts
On the client side, check the content of the hosts file. Usually, you can find it at /etc/hosts (or C:\WINDOWS\system32\drivers\etc\hosts on Windows XP).
Assuming your machine is myClientMachine, with an IP 123.123.123.123 and a domain extension remote.domain, then your hosts file should look like:
127.0.0.1       localhost
123.123.123.123 myClientMachine
Update it to:
127.0.0.1       localhost
123.123.123.123 myClientMachine myClientMachine.remote.domain
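To check quickly, from the client side, whether a given fully qualified name resolves at all, a minimal sketch in plain Java can help; the host names used here are only placeholders, to be replaced by those of your own topology:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {

    // Returns the resolved IP for a host name, or null when DNS resolution
    // fails -- the same failure that makes the t3 bootstrap abort.
    static String resolve(final String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Replace with my-first-node.my.domain.extension, etc. in your context
        System.out.println(resolve("localhost"));
        // The ".invalid" TLD is reserved and never resolves
        System.out.println(resolve("no-such-host.invalid"));
    }
}
```

If the fully qualified names returned by the F5 print null here, the hosts file or the DNS configuration of the client machine is the culprit.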
Thread leaks in Mule ESB 2.2.1
Abstract
The application I work on packages Mule ESB 2.2.1 in a WAR and deploys it under a WebLogic 10.3 server. My teammates and I noticed that, over multiple deploy/undeploy cycles, the available PermGen space dramatically decreased. The cause was the number of threads, which hardly decreased during undeployment phases, contrary to the expected behaviour.
Indeed, Mule is seldom deployed as a webapp. Rather, it is designed to run as a standalone application, within a Tanuki wrapper. When the JVM is killed, all the threads are killed too, so no thread survives; hence the memory is freed and there is no reason to fear a thread leak.
Moreover, when the application is redeployed, new threads, with the same names as the “old” threads, are created. The risk is that, for some reason, a thread-name-based communication between threads may fail, because the communication pipe may be read by the wrong thread.
In my case: on WebLogic startup, there are 31 threads; when the application is deployed, there are 150; when the application works (receives and handles messages), the number of threads climbs to 800; when the application is undeployed, only 12 threads are killed, the others remaining alive.
The question is: how to kill the Mule-created threads, in order to avoid a thread leak?
WebLogic Threads
I performed a thread dump at WebLogic startup. Here are WebLogic threads, created before any deployment occurs:
Attach Listener
DoSManager
DynamicListenThread[Default[1]]
DynamicListenThread[Default]
ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'
Finalizer
JMX server connection timeout 42
RMI Scheduler(0)
RMI TCP Accept-0
RMI TCP Connection(1)-127.0.0.1
RMI TCP Connection(2)-127.0.0.1
Reference Handler
Signal Dispatcher
Thread-10
Thread-11
Timer-0
Timer-1
VDE Transaction Processor Thread
[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'
[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'
main
weblogic.GCMonitor
weblogic.cluster.MessageReceiver
weblogic.time.TimeEventGenerator
weblogic.timers.TimerThread
Dispose Disposables, Stop Stoppables…
The application being deployed as a WAR, I created a listener implementing ServletContextListener. In the method contextDestroyed(), I destroy the Mule objects (Disposable, Stoppable, Model, Service, etc.) one by one.
Eg#1:
final Collection<Model> allModels;
try {
    allModels = MuleServer.getMuleContext().getRegistry().lookupObjects(Model.class);
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Disposing models " + allModels.size());
    }
    for (Model model : allModels) {
        model.dispose();
    }
    allModels.clear();
} catch (Exception e) {
    LOGGER.error(e);
}
Eg#2:
private void stopStoppables() {
    final Collection<Stoppable> allStoppables;
    try {
        allStoppables = MuleServer.getMuleContext().getRegistry().lookupObjects(Stoppable.class);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Stopping stoppables " + allStoppables.size());
        }
        for (Stoppable stoppable : allStoppables) {
            stoppable.stop();
        }
        allStoppables.clear();
    } catch (MuleException e) {
        LOGGER.error(e);
    }
}
This first step is needed because the default mechanism is flawed: Mule re-creates objects that were destroyed.
Kill Threads
The general idea to kill the Mule threads is the following: perform a Unix-style “diff” between WebLogic’s native threads and the threads still alive once all the Mule objects have been stopped and disposed.
On Application Startup
In the ServletContextListener, I add a field that is set by a method called from the constructor:
private List<String> threadsAtStartup;

(...)

/**
 * This method retrieves the Threads present at startup: mainly speaking, they are Threads related to WebLogic.
 */
private void retrieveThreadsOnStartup() {
    final Thread[] threads;
    final ThreadGroup threadGroup;
    threadGroup = Thread.currentThread().getThreadGroup();
    try {
        threads = retrieveCurrentActiveThreads(threadGroup);
    } catch (NoSuchFieldException e) {
        LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
        threadsAtStartup = new ArrayList<String>();
        return;
    } catch (IllegalAccessException e) {
        LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
        threadsAtStartup = new ArrayList<String>();
        return;
    }
    threadsAtStartup = new ArrayList<String>(threads.length);
    for (int i = 0; i < threads.length; i++) {
        final Thread thread;
        try {
            thread = threads[i];
            if (null != thread) {
                threadsAtStartup.add(thread.getName());
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("This Thread was available at startup: " + thread.getName());
                }
            }
        } catch (RuntimeException e) {
            LOGGER.error("An error occurred on initial Thread statement: ", e);
        }
    }
}

/**
 * Hack to retrieve the field ThreadGroup.threads, which is package-protected and therefore not accessible
 *
 * @param threadGroup
 * @return
 * @throws NoSuchFieldException
 * @throws IllegalAccessException
 */
private Thread[] retrieveCurrentActiveThreads(ThreadGroup threadGroup) throws NoSuchFieldException, IllegalAccessException {
    final Thread[] threads;
    final Field privateThreadsField;
    privateThreadsField = ThreadGroup.class.getDeclaredField("threads");
    privateThreadsField.setAccessible(true);
    threads = (Thread[]) privateThreadsField.get(threadGroup);
    return threads;
}
On Application Shutdown
In the method ServletContextListener.contextDestroyed(), let’s call this method:
/**
 * Cleanses the Threads on shutdown: theoretically, when the WebApp is undeployed, only the threads
 * that were present before the WAR was deployed should remain. Unfortunately, Mule leaves many threads alive
 * on shutdown, reducing the PermGen size and recreating new threads with the same names as the old ones,
 * inducing a kind of instability.
 */
private void cleanseThreadsOnShutdown() {
    final Thread[] threads;
    final ThreadGroup threadGroup;
    final String currentThreadName;
    currentThreadName = Thread.currentThread().getName();
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("On shutdown, currentThreadName is: " + currentThreadName);
    }
    threadGroup = Thread.currentThread().getThreadGroup();
    try {
        threads = retrieveCurrentActiveThreads(threadGroup);
    } catch (NoSuchFieldException e) {
        LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
        return;
    } catch (IllegalAccessException e) {
        LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
        return;
    }
    for (Thread thread : threads) {
        final String threadName = thread.getName();
        final Boolean shouldThisThreadBeKilled;
        shouldThisThreadBeKilled = isThisThreadToBeKilled(currentThreadName, threadName);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Should the thread named " + threadName + " be killed? " + shouldThisThreadBeKilled);
        }
        if (shouldThisThreadBeKilled) {
            thread.interrupt();
        }
    }
}

/**
 * Says whether a thread is to be killed<br/>
 * Rules:
 * <ul>
 * <li>a Thread must NOT be killed if:</li>
 * <ol>
 * <li>it was among the threads available at startup</li>
 * <li>it is a Thread belonging to WebLogic (normally, WebLogic threads are among the list in the previous case)</li>
 * <li>it is the current Thread (simple protection against an unlikely situation)</li>
 * </ol>
 * <li>a Thread must be killed: in all other cases</li>
 * </ul>
 *
 * @param currentThreadName
 * @param threadName
 * @return
 */
private Boolean isThisThreadToBeKilled(String currentThreadName, String threadName) {
    final Boolean toBeKilled;
    toBeKilled = !threadsAtStartup.contains(threadName)
            && !StringUtils.contains(threadName, "weblogic")
            && !threadName.equalsIgnoreCase(currentThreadName);
    return toBeKilled;
}
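As a side note, the same startup/shutdown “diff” can be sketched without reflection on the package-protected ThreadGroup.threads field, by relying on Thread.getAllStackTraces(). This is only an illustrative alternative under the same idea, not the code actually deployed:

```java
import java.util.HashSet;
import java.util.Set;

public class ThreadDiff {

    // Snapshot of the names of all live threads, without any reflection hack
    static Set<String> liveThreadNames() {
        final Set<String> names = new HashSet<String>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        return names;
    }

    // Unix-style "diff": names alive now, but absent from the startup snapshot
    static Set<String> leakedSince(final Set<String> threadsAtStartup) {
        final Set<String> leaked = liveThreadNames();
        leaked.removeAll(threadsAtStartup);
        return leaked;
    }

    public static void main(String[] args) {
        final Set<String> atStartup = liveThreadNames();
        // A thread created after the snapshot shows up in the diff
        final Thread survivor = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    // interrupted: exit, as expected on "cleansing"
                }
            }
        }, "Leaked-1");
        survivor.start();
        System.out.println(leakedSince(atStartup));
        survivor.interrupt();
    }
}
```

Comparing plain names, as here and in the listener above, is fragile when threads share names across redeployments; it is, however, the only stable key available once the WAR’s classloader is gone.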
EhCache
My application uses EhCache. Its thread names usually end with “.data”. They are not killed by the previous actions. To get rid of them, the most elegant way is to add this block in the web.xml:
<listener>
    <listener-class>net.sf.ehcache.constructs.web.ShutdownListener</listener-class>
</listener>
With all these operations, almost all the threads are killed. But Java VisualVM still displays 34 threads, vs. 31 at startup.
Tough Threads
A thread dump confirms that, at this point, 3 rebellious threads still refuse to be killed:
MuleServer.1
SocketTimeoutMonitor-Monitor.1
SocketTimeoutMonitor-Monitor.1
Let’s examine them:
MuleServer.1: this thread is an instance of the inner class MuleServer.ShutdownThread. Indeed, it is the first thread created by Mule, and therefore appears among the threads available at startup, before the ServletContextListener is called… I did not succeed in killing it, even when trying to kill it by name, which makes sense: killing the parent thread would amount to the ServletContextListener committing suicide.
SocketTimeoutMonitor-Monitor.1: these threads are created by Mule’s TcpConnector and its subclasses: HttpConnector, SslConnector, etc. Again, I could not kill them.
Conclusion
We have seen that Mule suffers from major thread leaks when deployed as a WAR. Nevertheless, most of these leaks can be plugged.
I assume MuleSoft was aware of this issue: in Mule 3, the deployment of webapps was refactored.
Failed to start Service “Cluster” (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
Case
I introduced an Oracle Coherence cache within my application, which is deployed as a WAR within WebLogic Server. At first, I used an instance of Oracle Coherence / Coherence*Web already built into WebLogic. Then, for a couple of reasons, I destroyed the Coherence cluster. On deploying the application, the following error appeared:
java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
Complete Stacktrace
java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
    at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
    at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
    at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
Explanation and Fix
The Coherence cluster view available in WebLogic Server is a view of the Tangosol configuration, available in the tangosol-coherence*.xml files inside Coherence’s JAR. To fix the issue, create a file tangosol-coherence-override.xml in your classpath. Fill the file with a minimal content, such as:
<?xml version='1.0'?>
<!DOCTYPE coherence SYSTEM "coherence.dtd">
<coherence>
    <cluster-config>
        <multicast-listener>
            <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
            <join-timeout-milliseconds>3000</join-timeout-milliseconds>
        </multicast-listener>
        <unicast-listener>
            <address>127.0.0.1</address>
            <port>8088</port>
            <port-auto-adjust>true</port-auto-adjust>
            <priority>8</priority>
        </unicast-listener>
        <packet-publisher>
            <packet-delivery>
                <timeout-milliseconds>30000</timeout-milliseconds>
            </packet-delivery>
        </packet-publisher>
        <service-guardian>
            <timeout-milliseconds system-property="tangosol.coherence.guard.timeout">35000</timeout-milliseconds>
        </service-guardian>
    </cluster-config>
    <logging-config>
        <severity-level system-property="tangosol.coherence.log.level">5</severity-level>
        <character-limit system-property="tangosol.coherence.log.limit">0</character-limit>
    </logging-config>
</coherence>
Of course, you can amend and improve the file to match your requirements.
NB: in case your file is ignored by Coherence, set the system property tangosol.coherence.override to the value tangosol-coherence-override.xml.
WebLogic: set a System property within a WAR
Case
You would like to set a System property within an application packaged as a WAR.
Of course, you may modify the launch scripts of your servers to add an option -DmyPropertyName=myPropertyValue. Sometimes you would like to avoid such a solution, because updating the property would require updating the setEnv.* files, and therefore an action by the operations team.
In my case, I had to set the property tangosol.coherence.cacheconfig, which points at the configuration file used by Oracle Coherence / Coherence*Web.
Fix
The first solution is to set the property in a startup class. For more details, consult this page: WebLogic: use a startup and/or a shutdown class in a WAR.
Another way to handle this issue is to create a servlet, with code similar to:
public class JonathanBootServlet extends HttpServlet {
    private static final Logger LOGGER = Logger.getLogger(JonathanBootServlet.class);
    private static final String SVN_ID = "$Id$";

    public JonathanBootServlet() {
        super();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(SVN_ID);
        }
    }

    public void init(ServletConfig config) throws ServletException {
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("in init()");
        }
        System.setProperty("tangosol.coherence.cacheconfig", "jonathan-tangosol-coherence.xml");
        super.init(config);
    }
}
Then add the following block in your web.xml:
<servlet>
    <servlet-name>JonathanBootServlet</servlet-name>
    <servlet-class>lalou.jonathan.weblogic.technical.JonathanBootServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
Ensure this servlet is run before all the others (hence the load-on-startup value of 1).
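The mechanism itself is plain System property semantics: a value set once is visible to any code running later in the same JVM. A minimal, standalone sketch (the property value is just an example file name):

```java
public class SystemPropertyDemo {
    public static void main(String[] args) {
        // What the boot servlet's init() performs, before any other servlet runs:
        System.setProperty("tangosol.coherence.cacheconfig", "jonathan-tangosol-coherence.xml");
        // Code executed later in the same JVM (e.g. Coherence) reads the value:
        System.out.println(System.getProperty("tangosol.coherence.cacheconfig"));
        // prints: jonathan-tangosol-coherence.xml
    }
}
```

This is also why load-on-startup matters: any servlet initialized before the boot servlet would still see the property unset.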
java.lang.NoClassDefFoundError: com/tangosol/run/component/EventDeathException
Case
You have an application which uses Oracle Coherence / Coherence*Web as its cache manager. On redeploying the application, you get a stacktrace whose header is:
Exception in thread 'DistributedCache|SERVICE_STOPPED' java.lang.NoClassDefFoundError: com/tangosol/run/component/EventDeathException
Explanation
Indeed, an undeploy operation stops the services of your application. But Coherence is not launched as a service among the others: it should be considered as an independent one. Therefore, you have to stop it before undeploying your application; otherwise, your classloader is left in a confused state, and loses references to classes it had loaded before but that should now be unloaded.
Fix
You have to stop Oracle Coherence on your application shutdown.
EAR
If your application is packaged as an EAR, then create a shutdown class with the following main():
public class JonathanLalouEARShutdown extends ApplicationLifecycleListener {
    public static void main(String[] args) {
        CacheFactory.shutdown();
    }
}
Add the following block in your weblogic-application.xml, pointing at the shutdown class:
<shutdown>
    <shutdown-class>lalou.jonathan.weblogic.JonathanLalouEARShutdown</shutdown-class>
</shutdown>
WAR
If your application is packaged as a WAR, then create a class implementing ServletContextListener, and add the following block in your web.xml:
<listener>
    <listener-class>your.class.implementing.ServletContextListener</listener-class>
</listener>
The detailed procedure is described at this link: WebLogic: use a startup and/or a shutdown class in a WAR