PUBLIC - Liferay Portal Community Edition

LPS-70512: Cache stops working properly when certain nodes in a cluster become idle

    Details

    • Story Points:
      8
    • Fix Priority:
      3

      Description
      Cache stops working properly, with continuous misses, which causes a high number of queries to the database.

      For our reproduced test case, this occurred for the cache:
      com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portal.model.impl.GroupImpl

      Steps to Reproduce

      1. Set up a two-node cluster with Liferay SP15
      2. Set up JMX to monitor both nodes using VisualVM or JConsole
      3. Start both nodes
      4. Begin monitoring cache statistics on JMX
        1. With VisualVM or JConsole, navigate to the MBeans tab (VisualVM requires installing the VisualVM-MBeans plugin)
        2. Navigate to net.sf.ehcache/CacheStatistics/liferay-multi-vm
        3. Navigate to com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portal.model.impl.GroupImpl
        4. Monitor the attributes:
          • InMemoryHits
          • InMemoryMisses
      5. Make sure there are no open browsers viewing either node; otherwise, close them to be safe.
      6. Open a browser to node 1 (e.g. http://localhost:8080)
        1. Refresh the browser at least once before the 10 minutes are up (the GroupImpl cache expires after 600 seconds by default).
          • Note: Multiple refreshes throughout the 10 minutes are suggested, to ensure the cache entry does not expire.
      7. Wait at least 10 minutes (GroupImpl cache expires after 600 seconds by default)
      8. Open another browser to node 2 (the GroupImpl cache should have expired for node 2, but not for node 1)
      9. Make sure the page finishes loading for node 2
      10. Switch between both browsers and refresh the page
        1. For example:
          1. Switch to the browser viewing node 1 and refresh
          2. Switch to the browser viewing node 2 and refresh
          3. Repeat many times
      11. Observe the JMX monitoring (a programmatic alternative is sketched after these steps)
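
      For longer runs, the two counters from step 4 can also be polled programmatically instead of being watched in a GUI. The following is a minimal standalone JMX client sketch, assuming remote JMX is enabled on port 9999 of the node being monitored (the port, polling interval, and class name are illustrative assumptions); the ObjectName follows Ehcache's standard CacheStatistics registration pattern.

      import javax.management.MBeanServerConnection;
      import javax.management.ObjectName;
      import javax.management.remote.JMXConnector;
      import javax.management.remote.JMXConnectorFactory;
      import javax.management.remote.JMXServiceURL;

      public class FinderCacheMonitor {

          public static void main(String[] args) throws Exception {
              // Adjust host/port to the JMX endpoint configured for each node
              JMXServiceURL url = new JMXServiceURL(
                  "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");

              try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                  MBeanServerConnection connection =
                      connector.getMBeanServerConnection();

                  // Ehcache registers per-cache statistics under
                  // net.sf.ehcache:type=CacheStatistics,CacheManager=...,name=...
                  ObjectName cacheStatistics = new ObjectName(
                      "net.sf.ehcache:type=CacheStatistics," +
                          "CacheManager=liferay-multi-vm," +
                          "name=com.liferay.portal.kernel.dao.orm.FinderCache." +
                          "com.liferay.portal.model.impl.GroupImpl");

                  // Poll the same two attributes monitored in step 4
                  while (true) {
                      long hits = (Long)connection.getAttribute(
                          cacheStatistics, "InMemoryHits");
                      long misses = (Long)connection.getAttribute(
                          cacheStatistics, "InMemoryMisses");

                      System.out.printf(
                          "InMemoryHits=%d InMemoryMisses=%d%n", hits, misses);

                      Thread.sleep(5000);
                  }
              }
          }
      }

      Running one copy of this against each node makes the miss counters easy to log side by side while switching between the two browsers.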

      Expected Results:
      The cache for com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portal.model.impl.GroupImpl should have a small number of misses.

      Actual Results:
      The number of misses for the cache com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portal.model.impl.GroupImpl continues to increase as we switch between both browsers and refresh the page.

      Technical Explanation:

      When Liferay fetches from the finder cache using a key and misses due to an expired entry (we'll call this entry A), it re-populates both the entity cache and all unique result finder caches, rather than just the result that missed (for simplicity, we'll assume there is just one and call it entry B). Because entry B is likely still in the cache (but marked as expired), the cache ends up with a PUT for entry A and an UPDATE for entry B: when determining whether a write is a PUT or an UPDATE, the cache does not check whether the entry it is replacing has expired, only that it is present.
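
      A self-contained toy model makes the flaw concrete. Everything below is illustrative (the Element class and method names are assumptions, not the actual Ehcache or Liferay internals): replacing an expired-but-still-present entry is reported as an UPDATE simply because the old entry was present.

      import java.util.HashMap;
      import java.util.Map;

      public class PutVersusUpdate {

          // Minimal stand-in for a cache element with a time-to-live
          static class Element {
              final Object value;
              final long expirationTime;

              Element(Object value, long ttlMillis) {
                  this.value = value;
                  this.expirationTime = System.currentTimeMillis() + ttlMillis;
              }

              boolean isExpired() {
                  return System.currentTimeMillis() > expirationTime;
              }
          }

          static final Map<String, Element> store = new HashMap<>();

          static void put(String key, Element element) {
              Element previous = store.put(key, element);

              if (previous == null) {
                  System.out.println(key + ": PUT (not replicated)");
              }
              else {
                  // previous.isExpired() is never consulted, so an expired
                  // leftover still turns this write into an UPDATE, which
                  // the replicator broadcasts as a removal
                  System.out.println(key + ": UPDATE (broadcast as removal)");
              }
          }

          public static void main(String[] args) throws Exception {
              put("entryB", new Element("b", 50));  // fresh write: PUT

              Thread.sleep(100);                    // entryB expires in place

              put("entryB", new Element("b", 50));  // refill: reported as UPDATE
          }
      }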

      Liferay's default replication mechanism does not replicate puts, does replicate updates, and replicates updates without copy (so all cluster members remove their entries from cache). As a result, entry B is removed on the other nodes by the broadcast, while those nodes still hold entry A, which has not expired for them yet. So when another node misses on entry B and refills the unique result caches, entry A becomes an UPDATE and entry B a PUT, causing all nodes to lose entry A while keeping entry B.

      This continues until some node broadcasts an UPDATE for all entries (most likely because the transactional cache flushes multiple entries at once). Until that time, the constant cache misses cause the database to be hammered retrieving these entries.
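
      The resulting ping-pong can be reproduced with a small self-contained simulation. The sketch below is purely illustrative (no real Ehcache or Liferay classes) and encodes only the three behaviors described above: puts are not replicated, updates are broadcast as removals, and the PUT-vs-UPDATE decision checks presence only. Starting from the state reached in the reproduction steps, the two nodes never converge.

      import java.util.HashSet;
      import java.util.Set;

      public class CachePingPongSimulation {

          static class Node {
              final String name;
              final Set<String> cache = new HashSet<>();

              Node(String name) {
                  this.name = name;
              }

              void refresh(Node other) {
                  if (cache.contains("entryA") && cache.contains("entryB")) {
                      System.out.println(name + ": all hits");

                      return;
                  }

                  // A miss on either entry refills both unique result finder
                  // entries, not just the one that missed
                  for (String key : new String[] {"entryA", "entryB"}) {
                      if (cache.contains(key)) {
                          // Present (expiry never checked): counts as an
                          // UPDATE, broadcast as a removal to other nodes
                          other.cache.remove(key);

                          System.out.println(
                              name + ": miss, UPDATE " + key + " (removed on " +
                                  other.name + ")");
                      }
                      else {
                          // Absent: counts as a PUT, which is not replicated
                          cache.add(key);

                          System.out.println(name + ": miss, PUT " + key);
                      }
                  }
              }
          }

          public static void main(String[] args) {
              Node node1 = new Node("node1");
              Node node2 = new Node("node2");

              // Starting state: node1 just lost entry A to expiration but
              // still holds an expired entry B; node2 still holds both
              node1.cache.add("entryB");
              node2.cache.add("entryA");
              node2.cache.add("entryB");

              // Alternating refreshes never reach a stable "all hits" state:
              // each pass evicts one entry cluster-wide and re-puts the other
              for (int i = 0; i < 3; i++) {
                  node1.refresh(node2);
                  node2.refresh(node1);
              }
          }
      }

      Each loop iteration prints the same PUT/UPDATE pair for both nodes, mirroring the ever-growing InMemoryMisses counter observed over JMX in the Actual Results.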
