Posts Tagged ‘Weblogic 10’
java.net.ConnectException: (…) Bootstrap to (…) failed. It is likely that the remote side declared peer gone on this JVM
Case and Topology
RMI services are deployed on UAT and exposed via an F5 load balancer, at the following address: t3://my-f5-frontal.my.domain.extension:7090
The actual servers are my-first-node.my.domain.extension and my-second-node.my.domain.extension.
The client application is deployed in a remote location, on a QA server.
The ports are open between QA and UAT, and from QA we can ping and telnet to the UAT hosts with no issue.
Nevertheless, when I launch the client application from QA, I get the following error:
2011-10-31 06:41:03,277 INFO support.DefaultListableBeanFactory - Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79e304: defining beans [jonathanServiceClient]; root of factory hierarchy Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'smartServiceClient' defined in class path resource [com/lalou/jonathan/rmi-client-spring.xml]: Invocation of init method failed; nested exception is org.springframework.remoting.RemoteLookupFailureException: JNDI lookup for RMI service [rmiServices] failed; nested exception is javax.naming.CommunicationException [Root exception is java.net.ConnectException: t3://my-f5-frontal.my.domain.extension:7090: Bootstrap to my-f5-frontal.my.domain.extension/111.222.012.123:7090 failed. It is likely that the remote side declared peer gone on this JVM]
Explanation and Fix
As far as I understand, here is the point: when the client tries to connect to the F5, it presents itself with its name, and the F5 returns the name of the actual server. If the client and the server are not on the same domain (“domain” in the network sense, nothing to do with a WebLogic domain), then the DNS resolution may fail.
To fix this issue, apply one or both of the following changes, depending on your local topology.
WebLogic: “Listen Address”
Modify the “Listen Address” in the WebLogic administration console, from home: Servers > MyFirstNode/MySecondNode > Configuration > General > Listen Address > update it.
By “updating” the “Listen Address”, I mean providing the complete name of the machines, including the domain extension, e.g. my-first-node.my.domain.extension and my-second-node.my.domain.extension, rather than my-first-node and my-second-node (or, even worse, localhost).
You can also provide an IP, cf. WebLogic documentation on Oracle’s website.
Of course, you can decide to set it directly in WebLogic’s config.xml.
Caution! This option may also be set via the command line running WebLogic, using the flag -Dweblogic.ListenAddress=...
Therefore, take care to be consistent between the content of the console/config.xml and the command-line option.
Hosts
On the client side, check the content of the hosts file. Usually, you can find it at /etc/hosts (or C:\WINDOWS\system32\drivers\etc\hosts on Windows XP).
Assuming your machine is myClientMachine, with an IP 123.123.123.123 and a domain extension remote.domain, then your hosts file should look like:
127.0.0.1       localhost
123.123.123.123 myClientMachine
Update it to:
127.0.0.1       localhost
123.123.123.123 myClientMachine myClientMachine.remote.domain
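To check quickly, from the client side, whether a given fully qualified name resolves at all, a minimal sketch in plain Java can help; the host names used here are only placeholders, to be replaced by those of your own topology:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {

    // Returns the resolved IP for a host name, or null when DNS resolution
    // fails -- the same failure that makes the t3 bootstrap abort.
    static String resolve(final String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Replace with my-first-node.my.domain.extension, etc. in your context
        System.out.println(resolve("localhost"));
        // The ".invalid" TLD is reserved and never resolves
        System.out.println(resolve("no-such-host.invalid"));
    }
}
```

If the fully qualified names returned by the F5 print null here, the hosts file or the DNS configuration of the client machine is the culprit.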
Thread leaks in Mule ESB 2.2.1
Abstract
The application I work on packages Mule ESB 2.2.1 in a WAR and deploys it under a WebLogic 10.3 server. My teammates and I noticed that, over multiple deploy/undeploy cycles, the available PermGen space dramatically decreased. The cause was the number of threads, which hardly decreased during undeployment phases, contrary to the expected behaviour.
Indeed, Mule is seldom deployed as a webapp. Rather, it is designed to run as a standalone application, within a Tanuki wrapper. When the JVM is killed, all the threads are killed too, so no thread survives; hence the memory is freed and there is no reason to fear a thread leak.
Moreover, when the application is redeployed, new threads, with the same names as the “old” threads, are created. The risk is that, for some reason, a thread-name-based communication between threads may fail, because the communication pipe may be read by the wrong thread.
In my case: on WebLogic startup, there are 31 threads; when the application is deployed, there are 150; when the application works (receives and handles messages), the number of threads climbs to 800; when the application is undeployed, only 12 threads are killed, the others remaining alive.
The question is: how to kill the Mule-created threads, in order to avoid a thread leak?
WebLogic Threads
I performed a thread dump at WebLogic startup. Here are WebLogic threads, created before any deployment occurs:
Attach Listener
DoSManager
DynamicListenThread[Default[1]]
DynamicListenThread[Default]
ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'
ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'
Finalizer
JMX server connection timeout 42
RMI Scheduler(0)
RMI TCP Accept-0
RMI TCP Connection(1)-127.0.0.1
RMI TCP Connection(2)-127.0.0.1
Reference Handler
Signal Dispatcher
Thread-10
Thread-11
Timer-0
Timer-1
VDE Transaction Processor Thread
[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'
[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'
[STANDBY] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'
main
weblogic.GCMonitor
weblogic.cluster.MessageReceiver
weblogic.time.TimeEventGenerator
weblogic.timers.TimerThread
Dispose Disposables, Stop Stoppables…
The application being deployed as a WAR, I created a listener implementing ServletContextListener. In the method contextDestroyed(), I destroy the Mule objects (Disposable, Stoppable, Model, Service, etc.) one by one.
Eg#1:
final Collection<Model> allModels;
try {
    allModels = MuleServer.getMuleContext().getRegistry().lookupObjects(Model.class);
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Disposing models " + allModels.size());
    }
    for (Model model : allModels) {
        model.dispose();
    }
    allModels.clear();
} catch (Exception e) {
    LOGGER.error(e);
}
Eg#2:
private void stopStoppables() {
    final Collection<Stoppable> allStoppables;
    try {
        allStoppables = MuleServer.getMuleContext().getRegistry().lookupObjects(Stoppable.class);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Stopping stoppables " + allStoppables.size());
        }
        for (Stoppable stoppable : allStoppables) {
            stoppable.stop();
        }
        allStoppables.clear();
    } catch (MuleException e) {
        LOGGER.error(e);
    }
}
This first step is needed because the default mechanism is flawed: Mule re-creates objects that were destroyed.
Kill Threads
The general idea to kill the Mule threads is the following: perform a Unix-style “diff” between WebLogic’s native threads and the threads still alive once all the Mule objects have been stopped and disposed.
On Application Startup
In the ServletContextListener, I add a field that is set by a method called from the constructor:
private List<String> threadsAtStartup;

(...)

/**
 * This method retrieves the Threads present at startup: mainly speaking, they are Threads related to WebLogic.
 */
private void retrieveThreadsOnStartup() {
    final Thread[] threads;
    final ThreadGroup threadGroup;
    threadGroup = Thread.currentThread().getThreadGroup();
    try {
        threads = retrieveCurrentActiveThreads(threadGroup);
    } catch (NoSuchFieldException e) {
        LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
        threadsAtStartup = new ArrayList<String>();
        return;
    } catch (IllegalAccessException e) {
        LOGGER.error("Could not retrieve initial Threads list. The application may be unstable on shutting down ", e);
        threadsAtStartup = new ArrayList<String>();
        return;
    }
    threadsAtStartup = new ArrayList<String>(threads.length);
    for (int i = 0; i < threads.length; i++) {
        final Thread thread;
        try {
            thread = threads[i];
            if (null != thread) {
                threadsAtStartup.add(thread.getName());
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("This Thread was available at startup: " + thread.getName());
                }
            }
        } catch (RuntimeException e) {
            LOGGER.error("An error occurred on initial Thread statement: ", e);
        }
    }
}

/**
 * Hack to retrieve the field ThreadGroup.threads, which is package-protected and therefore not accessible
 *
 * @param threadGroup
 * @return
 * @throws NoSuchFieldException
 * @throws IllegalAccessException
 */
private Thread[] retrieveCurrentActiveThreads(ThreadGroup threadGroup) throws NoSuchFieldException, IllegalAccessException {
    final Thread[] threads;
    final Field privateThreadsField;
    privateThreadsField = ThreadGroup.class.getDeclaredField("threads");
    privateThreadsField.setAccessible(true);
    threads = (Thread[]) privateThreadsField.get(threadGroup);
    return threads;
}
On Application Shutdown
In the method ServletContextListener.contextDestroyed(), let’s call this method:
/**
 * Cleanses the Threads on shutdown: theoretically, when the WebApp is undeployed, only the threads
 * that were present before the WAR was deployed should remain. Unfortunately, Mule leaves many threads alive
 * on shutdown, reducing the PermGen size and recreating new threads with the same names as the old ones,
 * inducing a kind of instability.
 */
private void cleanseThreadsOnShutdown() {
    final Thread[] threads;
    final ThreadGroup threadGroup;
    final String currentThreadName;
    currentThreadName = Thread.currentThread().getName();
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("On shutdown, currentThreadName is: " + currentThreadName);
    }
    threadGroup = Thread.currentThread().getThreadGroup();
    try {
        threads = retrieveCurrentActiveThreads(threadGroup);
    } catch (NoSuchFieldException e) {
        LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
        return;
    } catch (IllegalAccessException e) {
        LOGGER.error("An error occurred on Threads cleaning at shutdown", e);
        return;
    }
    for (Thread thread : threads) {
        final String threadName = thread.getName();
        final Boolean shouldThisThreadBeKilled;
        shouldThisThreadBeKilled = isThisThreadToBeKilled(currentThreadName, threadName);
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Should the thread named " + threadName + " be killed? " + shouldThisThreadBeKilled);
        }
        if (shouldThisThreadBeKilled) {
            thread.interrupt();
        }
    }
}

/**
 * Says whether a thread is to be killed<br/>
 * Rules:
 * <ul>
 * <li>a Thread must NOT be killed if:</li>
 * <ol>
 * <li>it was among the threads available at startup</li>
 * <li>it is a Thread belonging to WebLogic (normally, WebLogic threads are among the list in the previous case)</li>
 * <li>it is the current Thread (simple protection against an unlikely situation)</li>
 * </ol>
 * <li>a Thread must be killed: in all other cases</li>
 * </ul>
 *
 * @param currentThreadName
 * @param threadName
 * @return
 */
private Boolean isThisThreadToBeKilled(String currentThreadName, String threadName) {
    final Boolean toBeKilled;
    toBeKilled = !threadsAtStartup.contains(threadName)
            && !StringUtils.contains(threadName, "weblogic")
            && !threadName.equalsIgnoreCase(currentThreadName);
    return toBeKilled;
}
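As a side note, the same startup/shutdown “diff” can be sketched without reflection on the package-protected ThreadGroup.threads field, by relying on Thread.getAllStackTraces(). This is only an illustrative alternative under the same idea, not the code actually deployed:

```java
import java.util.HashSet;
import java.util.Set;

public class ThreadDiff {

    // Snapshot of the names of all live threads, without any reflection hack
    static Set<String> liveThreadNames() {
        final Set<String> names = new HashSet<String>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        return names;
    }

    // Unix-style "diff": names alive now, but absent from the startup snapshot
    static Set<String> leakedSince(final Set<String> threadsAtStartup) {
        final Set<String> leaked = liveThreadNames();
        leaked.removeAll(threadsAtStartup);
        return leaked;
    }

    public static void main(String[] args) {
        final Set<String> atStartup = liveThreadNames();
        // A thread created after the snapshot shows up in the diff
        final Thread survivor = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    // interrupted: exit, as expected on "cleansing"
                }
            }
        }, "Leaked-1");
        survivor.start();
        System.out.println(leakedSince(atStartup));
        survivor.interrupt();
    }
}
```

Comparing plain names, as here and in the listener above, is fragile when threads share names across redeployments; it is, however, the only stable key available once the WAR’s classloader is gone.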
EhCache
My application uses EhCache. Its thread names usually end with “.data”. They are not killed by the previous actions. To get rid of them, the most elegant way is to add this block in the web.xml:
<listener>
    <listener-class>net.sf.ehcache.constructs.web.ShutdownListener</listener-class>
</listener>
With all these operations, almost all the threads are killed. But Java VisualVM still displays 34 threads, vs. 31 at startup.
Tough Threads
A thread dump confirms that, at this point, 3 rebellious threads still refuse to be killed:
MuleServer.1
SocketTimeoutMonitor-Monitor.1
SocketTimeoutMonitor-Monitor.1
Let’s examine them:
MuleServer.1: this thread is an instance of the inner class MuleServer.ShutdownThread. Indeed, it is the first thread created by Mule, and therefore appears among the threads available at startup, before the ServletContextListener is called… I did not succeed in killing it, even when trying to kill it by name, which makes sense: killing the parent thread would amount to the ServletContextListener committing suicide.
SocketTimeoutMonitor-Monitor.1: these threads are created by Mule’s TcpConnector and its subclasses: HttpConnector, SslConnector, etc. Again, I could not kill them.
Conclusion
We have seen that Mule suffers from major thread leaks when deployed as a WAR. Nevertheless, most of these leaks can be plugged.
I assume MuleSoft was aware of this issue: in Mule 3, the deployment of webapps was refactored.
Failed to start Service “Cluster” (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
Case
I introduced an Oracle Coherence cache within my application, which is deployed as a WAR within WebLogic Server. At first, I used an instance of Oracle Coherence / Coherence*Web already built into WebLogic. Then, for a couple of reasons, I destroyed the Coherence cluster. On deploying the application, the following error appeared:
java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
Complete Stacktrace
java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_ANNOUNCE)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:38)
    at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:366)
    at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
    at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
Explanation and Fix
The Coherence cluster view available in WebLogic Server is a view of the Tangosol configuration, available in the tangosol-coherence*.xml files inside Coherence’s JAR. To fix the issue, create a file tangosol-coherence-override.xml in your classpath. Fill the file with a minimal content, such as:
<?xml version='1.0'?>
<!DOCTYPE coherence SYSTEM "coherence.dtd">
<coherence>
    <cluster-config>
        <multicast-listener>
            <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
            <join-timeout-milliseconds>3000</join-timeout-milliseconds>
        </multicast-listener>
        <unicast-listener>
            <address>127.0.0.1</address>
            <port>8088</port>
            <port-auto-adjust>true</port-auto-adjust>
            <priority>8</priority>
        </unicast-listener>
        <packet-publisher>
            <packet-delivery>
                <timeout-milliseconds>30000</timeout-milliseconds>
            </packet-delivery>
        </packet-publisher>
        <service-guardian>
            <timeout-milliseconds system-property="tangosol.coherence.guard.timeout">35000</timeout-milliseconds>
        </service-guardian>
    </cluster-config>
    <logging-config>
        <severity-level system-property="tangosol.coherence.log.level">5</severity-level>
        <character-limit system-property="tangosol.coherence.log.limit">0</character-limit>
    </logging-config>
</coherence>
Of course, you can amend and improve the file to match your requirements.
NB: in case your file is ignored by Coherence, set the system property tangosol.coherence.override to the value tangosol-coherence-override.xml.
WebLogic: set a System property within a WAR
Case
You would like to set a System property within an application packaged as a WAR.
Of course, you may modify the launch scripts of your servers to add an option -DmyPropertyName=myPropertyValue. Sometimes you would like to avoid such a solution, because updating the property would require updating the setEnv.* files, and therefore an action by the operations team.
In my case, I had to set the property tangosol.coherence.cacheconfig, which points at the configuration file used by Oracle Coherence / Coherence*Web.
Fix
The first solution is to set the property in a startup class. For more details, consult this page: WebLogic: use a startup and/or a shutdown class in a WAR.
Another way to handle this issue is to create a servlet, with code similar to:
public class JonathanBootServlet extends HttpServlet {
    private static final Logger LOGGER = Logger.getLogger(JonathanBootServlet.class);
    private static final String SVN_ID = "$Id$";

    public JonathanBootServlet() {
        super();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(SVN_ID);
        }
    }

    public void init(ServletConfig config) throws ServletException {
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("in init()");
        }
        System.setProperty("tangosol.coherence.cacheconfig", "jonathan-tangosol-coherence.xml");
        super.init(config);
    }
}
Then add the following block in your web.xml:
<servlet>
    <servlet-name>JonathanBootServlet</servlet-name>
    <servlet-class>lalou.jonathan.weblogic.technical.JonathanBootServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
Ensure this servlet is run before all the others (hence the load-on-startup value of 1).
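The mechanism itself is plain System property semantics: a value set once is visible to any code running later in the same JVM. A minimal, standalone sketch (the property value is just an example file name):

```java
public class SystemPropertyDemo {
    public static void main(String[] args) {
        // What the boot servlet's init() performs, before any other servlet runs:
        System.setProperty("tangosol.coherence.cacheconfig", "jonathan-tangosol-coherence.xml");
        // Code executed later in the same JVM (e.g. Coherence) reads the value:
        System.out.println(System.getProperty("tangosol.coherence.cacheconfig"));
        // prints: jonathan-tangosol-coherence.xml
    }
}
```

This is also why load-on-startup matters: any servlet initialized before the boot servlet would still see the property unset.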
java.lang.NoClassDefFoundError: com/tangosol/run/component/EventDeathException
Case
You have an application which uses Oracle Coherence / Coherence*Web as its cache manager. On redeploying the application, you get a stacktrace whose header is:
Exception in thread 'DistributedCache|SERVICE_STOPPED' java.lang.NoClassDefFoundError: com/tangosol/run/component/EventDeathException
Explanation
Indeed, an undeploy operation stops the services of your application. But Coherence is not launched as a service among the others: it should be considered as an independent one. Therefore, you have to stop it before undeploying your application; otherwise, your classloader is left in a confused state, and loses references to classes it had loaded before but that should now be unloaded.
Fix
You have to stop Oracle Coherence on your application shutdown.
EAR
If your application is packaged as an EAR, then create a shutdown class with the following main():
public class JonathanLalouEARShutdown extends ApplicationLifecycleListener {
    public static void main(String[] args) {
        CacheFactory.shutdown();
    }
}
Add the following block in your weblogic-application.xml, pointing at the shutdown class:
<shutdown>
    <shutdown-class>lalou.jonathan.weblogic.JonathanLalouEARShutdown</shutdown-class>
</shutdown>
WAR
If your application is packaged as a WAR, then create a class implementing ServletContextListener, and add the following block in your web.xml:
<listener>
    <listener-class>your.class.implementing.ServletContextListener</listener-class>
</listener>
The detailed procedure is described at this link: WebLogic: use a startup and/or a shutdown class in a WAR