Thread leaks in Mule ESB 2.2.1

Abstract

The application I work on packages Mule ESB 2.2.1 in a WAR and deploys it on a WebLogic 10.3 server. My teammates and I noticed that, over multiple deploy/undeploy cycles, the free PermGen space decreased dramatically. The cause was the number of threads, which hardly decreased on undeployment, contrary to the expected behaviour.
Indeed, Mule is seldom deployed as a WebApp. Rather, it is designed to run as a standalone application, within a Tanuki wrapper. When the JVM is killed, all the threads are killed too, so no thread survives; the memory is freed and there is no reason to fear a thread leak.

Moreover, when the application is redeployed, new threads are created with the same names as the “old” threads. The risk is that any thread-name-based communication between threads may fail, because the communication pipe may be read by the wrong thread.

In my case: on WebLogic startup, there are 31 threads; once the application is deployed, there are 150; while the application works (receives and handles messages), the number of threads climbs to 800; when the application is undeployed, only 12 threads are killed, the others remaining alive.

The question is: how can Mule-created threads be killed, in order to avoid a thread leak?

WebLogic Threads

I performed a thread dump at WebLogic startup. Here are the WebLogic threads, created before any deployment occurs:

Attach Listener
DoSManager
DynamicListenThread[Default[1]]
DynamicListenThread[Default]
ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'
Finalizer
JMX server connection timeout 42
RMI Scheduler(0)
RMI TCP Accept-0
RMI TCP Connection(1)-127.0.0.1
RMI TCP Connection(2)-127.0.0.1
Reference Handler
Signal Dispatcher
Thread-10
Thread-11
Timer-0
Timer-1
VDE Transaction Processor Thread
[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'
[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'
main
weblogic.GCMonitor
weblogic.cluster.MessageReceiver
weblogic.time.TimeEventGenerator
weblogic.timers.TimerThread
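
A listing like the one above can also be produced from inside the JVM rather than from an external thread dump. The following is a minimal sketch, assuming only the standard `Thread.getAllStackTraces()` API; the class and method names are illustrative and do not come from the application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ThreadNameDump {

    /** Returns the names of all live threads in the JVM, sorted alphabetically. */
    static List<String> liveThreadNames() {
        final List<String> names = new ArrayList<String>();
        // getAllStackTraces() returns a snapshot keyed by the live Thread objects
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveThreadNames()) {
            System.out.println(name);
        }
    }
}
```

Unlike a lookup restricted to one ThreadGroup, this snapshot covers every live thread of the JVM, which makes it convenient for comparing counts before and after a deployment.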

Dispose Disposables, Stop Stoppables…

Since the application is deployed in a WAR, I created a servlet implementing ServletContextListener. In the method contextDestroyed(), I destroy the Mule objects (Disposable, Stoppable, Model, Service, etc.) one by one.
Eg#1:

        final Collection<Model> allModels;
        try {
            allModels = MuleServer.getMuleContext().getRegistry().lookupObjects(Model.class);
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("Disposing models " + allModels.size());
            }
            for (Model model : allModels) {
                model.dispose();
            }
            allModels.clear();
        } catch (Exception e) {
            LOGGER.error(e);
        }

Eg#2:

    private void stopStoppables() {
        final Collection<Stoppable> allStoppables;
        try {
            allStoppables = MuleServer.getMuleContext().getRegistry().lookupObjects(Stoppable.class);
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("Stopping stoppables " + allStoppables.size());
            }
            for (Stoppable stoppable : allStoppables) {
                stoppable.stop();
            }
            allStoppables.clear();
        } catch (MuleException e) {
            LOGGER.error(e);
        }
    }

This first step is needed because the default mechanism is flawed: Mule re-creates objects that were destroyed.
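
In the two examples above, the try/catch wraps the whole loop, so the first object that throws ends the loop and the remaining ones are never stopped. A possible variant, sketched below, catches failures per item. The `Stoppable` interface here is a hypothetical stand-in declared locally so that the sketch compiles on its own; it is not Mule's real interface, and a real listener would log the exceptions rather than collect them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class SafeStopper {

    /** Hypothetical stand-in for Mule's Stoppable; not the real Mule interface. */
    interface Stoppable {
        void stop() throws Exception;
    }

    /**
     * Stops every item, catching failures per item so that one faulty object
     * does not prevent the following ones from being stopped.
     *
     * @return the exceptions raised along the way, in iteration order
     */
    static List<Exception> stopAll(Collection<? extends Stoppable> stoppables) {
        final List<Exception> failures = new ArrayList<Exception>();
        for (Stoppable stoppable : stoppables) {
            try {
                stoppable.stop();
            } catch (Exception e) {
                failures.add(e); // a real listener would log this instead
            }
        }
        return failures;
    }
}
```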

Kill Threads

The general idea to kill Mule threads is the following: perform a Unix-style “diff” between WebLogic native threads, and the threads still alive once all Mule objects have been stopped and disposed.
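
Reduced to its core, this “diff” is a set difference on thread names. Here is a minimal sketch of that idea, independent of WebLogic and Mule; the class and method names are made up for the illustration, and the sample names are taken from the dumps in this post.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ThreadNameDiff {

    /** Returns the names present at shutdown but absent at startup, in encounter order. */
    static Set<String> leakedNames(Collection<String> atStartup, Collection<String> atShutdown) {
        final Set<String> leaked = new LinkedHashSet<String>(atShutdown);
        leaked.removeAll(atStartup);
        return leaked;
    }

    public static void main(String[] args) {
        final List<String> startup = Arrays.asList("main", "Finalizer", "Timer-0");
        final List<String> shutdown = Arrays.asList(
                "main", "Finalizer", "Timer-0", "MuleServer.1", "SocketTimeoutMonitor-Monitor.1");
        // prints [MuleServer.1, SocketTimeoutMonitor-Monitor.1]
        System.out.println(leakedNames(startup, shutdown));
    }
}
```

Since the comparison is made on names only, a leaked thread that happens to reuse a name already present at startup would go unnoticed.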

On Application Startup

In the ServletContextListener, I add a field that will be set in a method called in the constructor:

    private List<String> threadsAtStartup;
(...)
/**
     * This method retrieves the Threads present at startup: mainly speaking, they are Threads related to WebLogic.
     */
    private void retrieveThreadsOnStartup() {
        final Thread[] threads;
        final ThreadGroup threadGroup;
        threadGroup = Thread.currentThread().getThreadGroup();
        try {
            threads = retrieveCurrentActiveThreads(threadGroup);
        } catch (NoSuchFieldException e) {
            LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
            threadsAtStartup = new ArrayList<String>();
            return;
        } catch (IllegalAccessException e) {
            LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
            threadsAtStartup = new ArrayList<String>();
            return;
        }

        threadsAtStartup = new ArrayList<String>(threads.length);
        for (int i = 0; i < threads.length; i++) {
            final Thread thread;
            try {
                thread = threads[i];
                if (null != thread) {
                    threadsAtStartup.add(thread.getName());
                    if (LOGGER.isDebugEnabled()) {
                        LOGGER.debug("This Thread was available at startup: " + thread.getName());
                    }
                }
            } catch (RuntimeException e) {
                LOGGER.error("An error occurred on initial Thread statement: ", e);
            }
        }
    }
    /**
     * Hack to retrieve the field ThreadGroup.threads, which is package-private and therefore not accessible
     *
     * @param threadGroup
     * @return
     * @throws NoSuchFieldException
     * @throws IllegalAccessException
     */
    private Thread[] retrieveCurrentActiveThreads(ThreadGroup threadGroup) throws NoSuchFieldException, IllegalAccessException {
        final Thread[] threads;
        final Field privateThreadsField;
        privateThreadsField = ThreadGroup.class.getDeclaredField("threads");
        privateThreadsField.setAccessible(true);

        threads = (Thread[]) privateThreadsField.get(threadGroup);
        return threads;
    }
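
The reflection hack depends on a private field, which is an implementation detail of the JDK and may not exist on every JVM. A possible alternative, sketched below, relies on the public `ThreadGroup.enumerate(Thread[])` method. This is not the code the application uses; the class and method names are illustrative. Note that `enumerate` also walks the subgroups, whereas the `threads` field only holds the threads directly attached to the group.

```java
import java.util.Arrays;

final class ThreadGroupSnapshot {

    /**
     * Returns the live threads of the given group and of its subgroups,
     * using the public ThreadGroup.enumerate(Thread[]) instead of reflection.
     */
    static Thread[] activeThreads(ThreadGroup group) {
        // activeCount() is only an estimate: oversize the buffer and retry if it fills up
        int capacity = Math.max(16, group.activeCount() * 2);
        while (true) {
            final Thread[] buffer = new Thread[capacity];
            final int copied = group.enumerate(buffer);
            if (copied < capacity) {
                // trim the unused slots, so the result contains no null entry
                return Arrays.copyOf(buffer, copied);
            }
            capacity *= 2;
        }
    }
}
```

Because the returned array is trimmed, callers do not need the null checks that the raw `threads` array requires.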

On application shutdown

In the method ServletContextListener.contextDestroyed(), let’s call this method:

    /**
     * Cleanses the Threads on shutdown: theoretically, when the WebApp is undeployed, only the threads that were
     * present before the WAR was deployed should remain. Unfortunately, Mule leaves many threads alive on shutdown,
     * reducing free PermGen space and recreating new threads with the same names as the old ones, inducing a kind of instability.
     */
    private void cleanseThreadsOnShutdown() {
        final Thread[] threads;
        final ThreadGroup threadGroup;
        final String currentThreadName;

        currentThreadName = Thread.currentThread().getName();

        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("On shutdown, currentThreadName is: " + currentThreadName);
        }

        threadGroup = Thread.currentThread().getThreadGroup();
        try {
            threads = retrieveCurrentActiveThreads(threadGroup);
        } catch (NoSuchFieldException e) {
            LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
            return;
        } catch (IllegalAccessException e) {
            LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
            return;
        }

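        for (Thread thread : threads) {
            // ThreadGroup.threads contains unused (null) slots: skip them to avoid a NullPointerException
            if (null == thread) {
                continue;
            }
            final String threadName = thread.getName();
            final boolean shouldThisThreadBeKilled;

            shouldThisThreadBeKilled = isThisThreadToBeKilled(currentThreadName, threadName);
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug("should the thread named " + threadName + " be killed? " + shouldThisThreadBeKilled);
            }
            if (shouldThisThreadBeKilled) {
                // interrupt() only requests termination: the thread ends only if its code honours the interruption
                thread.interrupt();
            }
        }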

    }

    /**
     * Says whether a thread is to be killed<br/>
     * Rules:
     * <ul><li>a Thread must NOT be killed if:</li>
     * <ol>
     * <li>it was among the threads available at startup</li>
     * <li>it is a Thread belonging to WebLogic (normally, WebLogic threads are among the list in the previous case)</li>
     * <li>it is the current Thread (simple protection against unlikely situation)</li>
     * </ol>
     * <li>a Thread must be killed: in all other cases</li>
     * </ul>
     *
     * @param currentThreadName
     * @param threadName
     * @return
     */
    private Boolean isThisThreadToBeKilled(String currentThreadName, String threadName) {
        final Boolean toBeKilled;
        toBeKilled = !threadsAtStartup.contains(threadName)
                && !StringUtils.contains(threadName, "weblogic")
                && !threadName.equalsIgnoreCase(currentThreadName);
        return toBeKilled;
    }
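
Keep in mind that `Thread.interrupt()` does not kill anything by itself: it sets the interruption flag and wakes up blocking calls such as `sleep()`, and the thread ends only if its code chooses to return. The sketch below illustrates this with two hypothetical workers, neither of which is related to Mule's actual threads: one honours the interruption, the other swallows it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptDemo {

    /** Interrupts the thread, waits up to timeoutMillis, and reports whether it terminated. */
    static boolean interruptAndWait(Thread thread, long timeoutMillis) throws InterruptedException {
        thread.interrupt();
        thread.join(timeoutMillis);
        return !thread.isAlive();
    }

    /** Worker that honours interruption: sleep() throws and run() returns. */
    static Thread cooperativeWorker() {
        return new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(60000L);
                } catch (InterruptedException e) {
                    // fall through: run() ends, so does the thread
                }
            }
        }, "cooperative");
    }

    /** Worker that swallows interruptions and stops only when the flag is set. */
    static Thread stubbornWorker(final AtomicBoolean stop) {
        final Thread thread = new Thread(new Runnable() {
            public void run() {
                while (!stop.get()) {
                    try {
                        Thread.sleep(50L);
                    } catch (InterruptedException ignored) {
                        // interruption ignored: the loop goes on
                    }
                }
            }
        }, "stubborn");
        thread.setDaemon(true); // do not keep the JVM alive because of the demo
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        final Thread cooperative = cooperativeWorker();
        cooperative.start();
        System.out.println("cooperative terminated: " + interruptAndWait(cooperative, 2000L));

        final AtomicBoolean stop = new AtomicBoolean(false);
        final Thread stubborn = stubbornWorker(stop);
        stubborn.start();
        System.out.println("stubborn terminated: " + interruptAndWait(stubborn, 300L));
        stop.set(true);
    }
}
```

A thread that ignores interruptions, like the second worker, can only be stopped through whatever shutdown hook its owner provides.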

EhCache

My application uses EhCache. Its thread names usually end with “.data”. These threads are not killed by the previous actions. To get rid of them, the most elegant way is to add this block to web.xml:

     <listener>
          <listener-class>net.sf.ehcache.constructs.web.ShutdownListener</listener-class>
     </listener>

cf. the EhCache documentation

With all these operations, almost all threads are killed. But Java VisualVM still displays 34, vs. 31 at startup.

Tough Threads

A thread dump confirms that, at this point, 3 rebellious threads still refuse to be killed:

MuleServer.1
SocketTimeoutMonitor-Monitor.1
SocketTimeoutMonitor-Monitor.1

Let’s examine them:

  • MuleServer.1: This thread is an instance of the inner class MuleServer.ShutdownThread. Indeed, this is the first thread created by Mule, and it therefore appears among the threads available at startup, before the ServletContextListener is called… I did not succeed in killing it, even when targeting it explicitly by name, which makes sense: killing the parent thread would amount to the ServletContextListener killing itself.
  • SocketTimeoutMonitor-Monitor.1: This thread is created by Mule’s TcpConnector and its subclasses: HttpConnector, SslConnector, etc. Again, I could not kill them.

Conclusion

We have seen that Mule suffers from major thread leaks when deployed as a WAR. However, most of these leaks can be plugged.
I assume MuleSoft was aware of this issue: in version 3 of Mule, the deployment of webapps was refactored.