  1. PUBLIC - Liferay Portal Community Edition
  2. LPS-97116

Liferay clustering with JGroups is bound to the HTTP thread and can cause severe cluster instability

    Details

      Description

      Overview

      JGroups communication problems in a Liferay cluster can significantly slow down the application server's HTTP threads, which can lead to a situation where the whole cluster stops responding.

      My customer's large Liferay cluster slowed down so much that none of the nodes were responding to traffic. In this case one of the nodes was the root cause, and once that node was killed the cluster healed itself. In the meantime, however, the site was unable to serve any traffic for a significant period of time.

      Thread dumps from that period traced the situation to the JGroups send(..) call, which was invoked from the thread that was serving the user's HTTP request.
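
      The thread dump check described above can be approximated in code. The following is a minimal sketch, not the procedure actually used in the investigation: it assumes the frame names that appear in the Byteman rules in this ticket (org.jgroups.JChannel.send and com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain), and the class name HttpThreadSendScanner is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HttpThreadSendScanner {

    // Frame names taken from the Byteman rules in this ticket (assumed to
    // match the portal version under test).
    static final String REQUEST_MARKER =
        "com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain";
    static final String SEND_CLASS = "org.jgroups.JChannel";

    /** True if the stack holds both a JChannel.send frame and a request filter frame. */
    static boolean isSendOnRequestThread(StackTraceElement[] stack) {
        boolean inSend = false;
        boolean inRequest = false;
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(SEND_CLASS)
                    && frame.getMethodName().equals("send")) {
                inSend = true;
            }
            if (frame.getClassName().startsWith(REQUEST_MARKER)) {
                inRequest = true;
            }
        }
        return inSend && inRequest;
    }

    /** Names of all threads in the dump that send cluster messages while serving a request. */
    static List<String> findAffectedThreads(Map<Thread, StackTraceElement[]> dump) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            if (isSendOnRequestThread(entry.getValue())) {
                names.add(entry.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Scans the current JVM; to be useful this has to run inside the portal JVM.
        System.out.println(findAffectedThreads(Thread.getAllStackTraces()));
    }
}
```

      Run inside the portal JVM (for example from a script console), a non-empty result would suggest that HTTP threads are currently inside a cluster send.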

      This ticket describes how to simulate this kind of situation:

      Prepare for the simulated test

      Create a Liferay DXP 7.2 (or EE 6.2 clustered, fixpack -69) environment with Tomcat. (EE 6.2 might require small changes to the Byteman script.)

      Install Byteman (https://byteman.jboss.org/downloads.html) into the `<tomcat>/byteman` directory.

      Modify the <tomcat>/bin/setenv.sh script and add the following at the end:

      if [ -f "${CATALINA_HOME}/byteman/lib/byteman.jar" ]; then
      	BYTEMAN_HOME="${CATALINA_HOME}/byteman"
      
      	CATALINA_OPTS="${CATALINA_OPTS} -Dorg.jboss.byteman.transform.all"
      	CATALINA_OPTS="${CATALINA_OPTS} -Dorg.jboss.byteman.allow.config.updates"
      
      	if [ -f "${CATALINA_HOME}/default.btm" ]; then
      		echo "Byteman Found with script ${CATALINA_HOME}/default.btm"
      		CATALINA_OPTS="${CATALINA_OPTS} -javaagent:${BYTEMAN_HOME}/lib/byteman.jar=script:${CATALINA_HOME}/default.btm,boot:${BYTEMAN_HOME}/lib/byteman.jar,listener:true"
      	else
      		echo "Byteman Found"
      		CATALINA_OPTS="${CATALINA_OPTS} -javaagent:${BYTEMAN_HOME}/lib/byteman.jar=sys:${BYTEMAN_HOME}/lib/byteman.jar,listener:true"
      	fi
      else
      	echo "Booting without ${CATALINA_HOME}/byteman"
      fi
      

      With DXP 7.2 / CE 7.2, add the following to portal-ext.properties (if the property already exists there, make sure to append org.jboss.byteman.* as the last entry):

      module.framework.properties.org.osgi.framework.bootdelegation=\
          __redirected,\
          com.liferay.aspectj,\
          com.liferay.aspectj.*,\
          com.liferay.expando.kernel.model,\
          com.liferay.portal.servlet.delegate,\
          com.liferay.portal.servlet.delegate*,\
          com.sun.ccpp,\
          com.sun.ccpp.*,\
          com.sun.crypto.*,\
          com.sun.image.*,\
          com.sun.jmx.*,\
          com.sun.jna,\
          com.sun.jndi.*,\
          com.sun.mail.*,\
          com.sun.management.*,\
          com.sun.media.*,\
          com.sun.msv.*,\
          com.sun.org.*,\
          com.sun.syndication,\
          com.sun.tools.*,\
          com.sun.xml.*,\
          com.yourkit.*,\
          javax.validation,\
          javax.validation.*,\
          jdk.*,\
          sun.*,\
          weblogic.jndi,\
          weblogic.jndi.*,\
          org.jboss.byteman.*
      

      Create the Byteman rule file <tomcat>/default.btm with the following content (the file was written for 7.2, so it might need changes for other versions).

      # Prints log entry when JChannel.close is called
      RULE org.jgroups.JChannel.close
      CLASS org.jgroups.JChannel
      METHOD close
      AT ENTRY
      BIND
      channel:org.jgroups.JChannel = $0;
      IF true
      DO
      #We open trace every time just in case. It won't re-open it.
      traceOpen("jgroups_log","jgroups.log");
      traceln("jgroups_log","Closing JChannel: " + channel.cluster_name);
      ENDRULE
      
      # Prints log entry when JChannel.connect(String) is called
      RULE org.jgroups.JChannel.connect
      CLASS org.jgroups.JChannel
      METHOD connect(String)
      AT ENTRY
      BIND
      threadId = ""+Thread.currentThread().getId() + "/" + Thread.currentThread().getName();
      IF true
      DO
      # We open trace every time just in case. It won't re-open it.
      traceOpen("jgroups_log","jgroups.log");
      traceln("jgroups_log", threadId + "\t" + new java.util.Date() + "\tConnect JChannel: " + $1);
      ENDRULE
      
      # Prints log entry when JChannel.send(org.jgroups.Message) is called
      RULE org.jgroups.JChannel.send
      CLASS org.jgroups.JChannel
      METHOD send(org.jgroups.Message)
      AT ENTRY
      BIND
      threadId = ""+Thread.currentThread().getId() + "/" + Thread.currentThread().getName();
      channelName = $0.cluster_name + "";
      # Fire the rule only on HTTP request threads and sleep 5000 ms to simulate a slow send
      IF formatStack().indexOf("com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain") > 0 
      DO Thread.sleep(5000);
      traceln("jgroups_log", threadId + "\t" + new java.util.Date() + "\t" + channelName + "\nSend:\nSTACKTRACE\n" + formatStack());
      ENDRULE
      

      Test

      1. Start the portal

      2. Add a user. When you save the user information, you will notice that it is slow and takes around 15 seconds (due to the Thread.sleep in the Byteman script).

      3. Adding a role to a user takes 5+ seconds for the same reason.

      4. <liferay-home>/jgroups.log shows the places in the code from which send was called. Adding a user blocks three times and updating a role once.
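
      To check the counts from step 4 without reading the log by hand, the entries can be tallied per thread. This is a rough sketch that assumes the log layout produced by the send rule above (a tab-separated header line starting with the thread id/name, followed by a line containing only "Send:"). The class name JGroupsLogSummary is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JGroupsLogSummary {

    /**
     * Counts send entries per thread. Each entry written by the send rule is
     * a header line "<threadId>/<threadName>\t<date>\t<channel>" followed by
     * a "Send:" line, so the header is the line just before "Send:".
     */
    static Map<String, Integer> countSendsPerThread(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String previous = null;
        for (String line : lines) {
            if (line.trim().equals("Send:") && previous != null) {
                String thread = previous.split("\t", 2)[0];
                counts.merge(thread, 1, Integer::sum);
            }
            previous = line;
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java JGroupsLogSummary <path-to-jgroups.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        // Each counted send corresponds to one 5000 ms sleep injected by the rule.
        countSendsPerThread(lines).forEach((thread, n) ->
            System.out.println(thread + "\t" + n + " send(s), about " + (n * 5) + " s injected delay"));
    }
}
```

      If the description above holds, the HTTP thread that saved the user should show three sends and the one that updated the role should show one.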

      Summary

      Cluster communication should be isolated from the HTTP thread to improve Liferay's stability in exceptional cases where JGroups slows down.
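
      One possible shape of such isolation is sketched below. It is an illustration under stated assumptions, not Liferay's actual design or a proposed patch: AsyncClusterSender is a hypothetical name, and the blocking send is passed in as a plain Consumer so the example does not depend on JGroups. The request thread only enqueues the message; a single dedicated thread performs the potentially slow send, and a bounded queue makes the caller fail fast instead of blocking when the cluster falls behind.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class AsyncClusterSender<M> implements AutoCloseable {

    private final Consumer<M> blockingSend;
    private final ThreadPoolExecutor executor;

    AsyncClusterSender(Consumer<M> blockingSend, int queueCapacity) {
        this.blockingSend = blockingSend;
        // One worker thread keeps messages in submission order; the bounded
        // queue limits how much work can pile up behind a slow send.
        this.executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            r -> new Thread(r, "cluster-sender"),
            new ThreadPoolExecutor.AbortPolicy());
    }

    /** Queues the message and returns at once; false means the queue is full. */
    boolean trySend(M message) {
        try {
            executor.execute(() -> blockingSend.accept(message));
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Stops accepting new messages; already queued messages are still sent. */
    @Override
    public void close() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        // Stand-in for a slow cluster send, sleeping like the Byteman rule does.
        AsyncClusterSender<String> sender = new AsyncClusterSender<>(message -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 100);
        long start = System.nanoTime();
        boolean accepted = sender.trySend("user-updated");
        long blockedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("accepted=" + accepted + ", caller blocked for " + blockedMs + " ms");
        sender.close();
    }
}
```

      The trade-off is that the HTTP request completes before other nodes have received the message, so any code path that relies on synchronous delivery would need separate handling.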


                Packages

                Version Package
                7.0.0 DXP FP86
                7.0.X
                7.1.10 DXP FP14
                7.1.X
                7.2.10 DXP FP2
                7.2.10.1 DXP SP1
                7.2.X
                Master