- Type: Bug
- Status: Closed
- Resolution: Fixed
- Affects Version/s: 6.0.12 EE, 6.1.30 EE GA3, 6.2.10 EE GA1, Master
- Fix Version/s: 6.2.X EE, 7.0.0 DXP FP11, 7.0.0 DXP SP2, 7.0.3 CE GA4, 7.1.X, Master
- Component/s: Fault Tolerance, Fault Tolerance > Clustering Framework
- Branch Version/s: 7.0.x, 6.2.x
- Backported to Branch: Committed
- Git Pull Request:
We encountered a situation in which the server halted.
In several thread dumps, most of the Liferay threads had this stack trace:
"liferay-116" prio=10 tid=0x00007f1c80379800 nid=0x1341 waiting on condition [0x00007f1d24cd7000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000695ab7ad8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:894)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1221)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at com.liferay.portal.kernel.concurrent.CoalescedPipe.put(CoalescedPipe.java:55)
    at com.liferay.portal.kernel.cache.cluster.BasePortalCacheClusterChannel.sendEvent(BasePortalCacheClusterChannel.java:116)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterLink.sendEvent(PortalCacheClusterLink.java:58)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterLinkUtil.sendEvent(PortalCacheClusterLinkUtil.java:63)
    at com.liferay.portal.cache.cluster.EhcachePortalCacheClusterReplicator.notifyElementRemoved(EhcachePortalCacheClusterReplicator.java:120)
    at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:157)
    at net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:137)
    at net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2410)
    at net.sf.ehcache.Cache.removeInternal(Cache.java:2393)
    at net.sf.ehcache.Cache.remove(Cache.java:2295)
    at net.sf.ehcache.Cache.remove(Cache.java:2213)
    at net.sf.ehcache.Cache.remove(Cache.java:2191)
    at com.liferay.portal.cache.ehcache.EhcachePortalCache.remove(EhcachePortalCache.java:131)
One thread had the following stack trace:
"liferay-117" prio=10 tid=0x00007f1c8036c800 nid=0x1342 runnable [0x00007f1d24ba1000]
   java.lang.Thread.State: RUNNABLE
    at com.liferay.portal.kernel.util.Validator.equals(Validator.java:164)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterEventCoalesceComparator.compare(PortalCacheClusterEventCoalesceComparator.java:38)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterEventCoalesceComparator.compare(PortalCacheClusterEventCoalesceComparator.java:1)
    at com.liferay.portal.kernel.concurrent.CoalescedPipe._coalesceElement(CoalescedPipe.java:151)
    at com.liferay.portal.kernel.concurrent.CoalescedPipe.put(CoalescedPipe.java:58)
    at com.liferay.portal.kernel.cache.cluster.BasePortalCacheClusterChannel.sendEvent(BasePortalCacheClusterChannel.java:116)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterLink.sendEvent(PortalCacheClusterLink.java:58)
    at com.liferay.portal.kernel.cache.cluster.PortalCacheClusterLinkUtil.sendEvent(PortalCacheClusterLinkUtil.java:63)
    at com.liferay.portal.cache.cluster.EhcachePortalCacheClusterReplicator.notifyElementRemoved(EhcachePortalCacheClusterReplicator.java:120)
    at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:157)
    at net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:137)
    at net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2410)
    at net.sf.ehcache.Cache.removeInternal(Cache.java:2393)
    at net.sf.ehcache.Cache.remove(Cache.java:2295)
    at net.sf.ehcache.Cache.remove(Cache.java:2213)
    at net.sf.ehcache.Cache.remove(Cache.java:2191)
    at com.liferay.portal.cache.ehcache.EhcachePortalCache.remove(EhcachePortalCache.java:131)
One dispatch thread had the following stack trace:
"PortalCacheClusterChannel dispatch thread-0" prio=10 tid=0x00007f1e3a565800 nid=0x32d6 waiting on condition [0x00007f1c4f8f7000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000695ab7738> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:894)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1221)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at com.liferay.portal.kernel.concurrent.CoalescedPipe.take(CoalescedPipe.java:87)
    at com.liferay.portal.kernel.cache.cluster.BasePortalCacheClusterChannel.run(BasePortalCacheClusterChannel.java:81)
    at java.lang.Thread.run(Thread.java:745)
The other dispatch thread had the following stack trace:
"PortalCacheClusterChannel dispatch thread-1" prio=10 tid=0x00007f1e0006e000 nid=0x32f3 waiting on condition [0x00007f1c4fffe000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006c0d29030> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at com.liferay.portal.kernel.concurrent.CoalescedPipe.take(CoalescedPipe.java:91)
    at com.liferay.portal.kernel.cache.cluster.BasePortalCacheClusterChannel.run(BasePortalCacheClusterChannel.java:81)
    at java.lang.Thread.run(Thread.java:745)
The cause seems to be that the CoalescedPipe queue was filling up quickly: the dump shows 16542 objects in it (see the image). The reason seems to be that, as the queue grows, the following call in CoalescedPipe.put becomes very slow:
if (_coalesceElement(e)) { return; }
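
To illustrate why this call can get slow, here is a minimal sketch. It is not the actual CoalescedPipe code and not the fix that was committed. It assumes that coalescing walks the queued elements one by one while the put lock is held, which is what the liferay-117 stack trace suggests. The class names LinearScanPipe and HashedPipe are hypothetical, the sketch compares with equals() where the real code uses a comparator, and it uses lock() where the real code calls lockInterruptibly().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

final class CoalescingSketch {

    /** Hypothetical pipe: every put() scans the whole queue under the lock. */
    static final class LinearScanPipe<E> {
        private final Deque<E> queue = new ArrayDeque<>();
        private final ReentrantLock putLock = new ReentrantLock();
        long comparisons; // number of equals() calls, for illustration only

        void put(E e) {
            putLock.lock(); // other producers park here while the scan runs
            try {
                for (E queued : queue) {
                    comparisons++;
                    if (queued.equals(e)) {
                        return; // coalesced with an element already queued
                    }
                }
                queue.addLast(e); // no match found after a full scan
            } finally {
                putLock.unlock();
            }
        }

        int size() {
            putLock.lock();
            try {
                return queue.size();
            } finally {
                putLock.unlock();
            }
        }
    }

    /** Hypothetical alternative: hash lookup keeps put() O(1) on average. */
    static final class HashedPipe<E> {
        // LinkedHashSet keeps insertion order and rejects duplicates.
        private final Set<E> queue = new LinkedHashSet<>();
        private final ReentrantLock putLock = new ReentrantLock();

        void put(E e) {
            putLock.lock();
            try {
                queue.add(e); // a duplicate is dropped, i.e. coalesced
            } finally {
                putLock.unlock();
            }
        }

        int size() {
            putLock.lock();
            try {
                return queue.size();
            } finally {
                putLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        int n = 16542; // queue size reported in the dump
        LinearScanPipe<Integer> linear = new LinearScanPipe<>();
        HashedPipe<Integer> hashed = new HashedPipe<>();
        for (int i = 0; i < n; i++) {
            linear.put(i); // distinct elements, so nothing coalesces
            hashed.put(i);
        }
        System.out.println("linear scan comparisons: " + linear.comparisons);
        System.out.println("queue sizes: " + linear.size() + ", " + hashed.size());
    }
}
```

With n distinct elements the linear scan performs n(n-1)/2 comparisons in total, and every put that finds no match on a queue of 16542 elements compares against all 16542 of them while the other producers wait for the lock. Whether a hash-based lookup fits the real pipe depends on whether the coalescing comparator is consistent with equals() and hashCode(), which this ticket does not state.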