  1. PUBLIC - Liferay Portal Community Edition
  2. LPS-67329

Identify performance bottlenecks during an export and import process



      The purpose of the story is to measure export and import executions and identify areas that can be improved.

      The output of the story is a spreadsheet with the measurements and conclusions, as well as the areas that need to be improved. The story also needs to validate whether the topological ordering algorithm is necessary. In theory the algorithm could speed up the staging process, but this needs to be validated. This story can validate the theory, and we can implement the changes in subsequent stories.


      The numeric results of the performance tests can be found here: https://docs.google.com/spreadsheets/d/1XWCno85XQSLz-h4LzY2h5csxTSCsdY7Clwkk-pexlLQ/edit?ts=57989f7b

      The key findings are:

      1. The controlling overhead is generally small and constant, so we can't make any significant performance gain here
      2. The export process performance is acceptable, and when it is slower it is because of entity-specific logic that we can't change (the DL API is generally slower than Journal, etc.). However, further tests could find possible improvements here, because it seems that complex scenarios can make the export take longer than the import
      3. The import process is generally slower than the export (1-10 times slower compared to the export time of the same elements)
      4. The import process does not slow down linearly with the file size (with small files the difference is 10-20%, but with large LAR files it is 400-500%)
      5. The import process requires a lot of direct access (currently with XPath) to the elements of the many XML files we have in the LAR


      1. We need to improve the import performance
        • To achieve that we need to either make direct access to the XML faster, or remove the need for direct access
        • We decided to remove the need for direct access
      2. We need to develop the so-called Topological Ordering
        • To do that we first need to change the following things
          • We need to homogenize the entity graph staging processes
          • We need to develop Staged Model Repositories for all the Staged Models
          • We need to refactor the controlling logic (the current one is polluted with entity-specific logic)
          • We need to refactor the serialization and split it from the other logic
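
The two options named under the first point can be illustrated with a minimal sketch. It does not reflect the actual LAR format or the Liferay import code: the manifest, the `entity` element and the `id`, `type` and `ref` attributes are all invented for illustration. The first function performs one document search per reference, which is the direct-access pattern described in the findings; the second reads the document once and keeps the elements in a dictionary, which is one way direct access could be made faster (the option that was not chosen).

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest; element and attribute names are assumptions.
MANIFEST = """
<manifest>
  <entity id="folder-1" type="folder"/>
  <entity id="file-1" type="file" ref="folder-1"/>
  <entity id="article-1" type="article" ref="file-1"/>
</manifest>
"""

root = ET.fromstring(MANIFEST)


def resolve_by_direct_access(root, entity_id):
    """Search the document for one entity (one search per reference)."""
    return root.find(f".//entity[@id='{entity_id}']")


def build_index(root):
    """Read the document once and index every entity by its id."""
    return {element.get("id"): element for element in root.iter("entity")}


index = build_index(root)
for entity in root.iter("entity"):
    ref = entity.get("ref")
    if ref is not None:
        # Both lookups return the same element; only the cost differs.
        assert resolve_by_direct_access(root, ref) is index[ref]
```

With per-reference searches the amount of work grows with both the number of references and the size of the document, which is consistent with, though not proof of, the non-linear slowdown reported in the findings.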

      What is topological sorting and what does it mean for us?

      For a scientific explanation, please see Wikipedia.

      For staging it means that if we were able to generate one possible topological order of the entities of a publication, we could avoid using the complex and slow reference mechanism that we have in place right now. Based on the performance tests we can clearly see that every import process is slow, and that the slowdown is not even linear, because of the expensive DOM processing required by the reference handling. With a topological order, on the import side we could eliminate ALL the XML processing time added by the reference processing, which, depending on the nature of the exported content, can be 50-80% of the total processing time. It could be even faster in the future, since certain parts of a topological order of elements can be processed in parallel, but that has several requirements, including some on the Platform side.
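
As an illustration of the idea, here is a minimal sketch of one standard way to produce a topological order (Kahn's algorithm). It is not the planned Liferay implementation; the function name, the entity names and the shape of the `dependencies` mapping are assumptions made for the example. Each entity maps to the set of entities it references, and the result lists every entity after all of the entities it references.

```python
from collections import deque


def topological_order(dependencies):
    """Return the entities so that each one comes after everything it references.

    `dependencies` maps an entity to the set of entities it references.
    Raises ValueError if the references contain a cycle.
    """
    nodes = set(dependencies)
    for refs in dependencies.values():
        nodes.update(refs)

    # Number of not-yet-ordered references per entity, and the reverse edges.
    remaining = {node: len(set(dependencies.get(node, ()))) for node in nodes}
    dependents = {node: [] for node in nodes}
    for node, refs in dependencies.items():
        for ref in set(refs):
            dependents[ref].append(node)

    # Entities without references can be imported first (sorted for determinism).
    ready = deque(sorted(node for node in nodes if remaining[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("references contain a cycle; no topological order exists")
    return order


# Hypothetical publication: an article references a structure and a file,
# and the file references its folder.
EXAMPLE = {
    "article": {"structure", "file"},
    "file": {"folder"},
    "structure": set(),
    "folder": set(),
}

order = topological_order(EXAMPLE)
print(order)  # ['folder', 'structure', 'file', 'article']
```

If entities are imported in this order, everything an entity references has already been imported by the time the entity itself is processed. Entities that are in the `ready` queue at the same time do not reference each other, which is where the possible parallel processing mentioned above would come from.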

      Next Steps

      • Mate is going to create Epics/User Stories and attach to this one to track the required changes
      • We should do further tests on the export time in complex scenarios




            maria.kispal Mária Kispál (Inactive)
            mate.thurzo Mate Thurzo (Inactive)




                Version Package
                7.0.0 M6