  1. PUBLIC - Liferay Portal Community Edition
  2. LPS-97074

Improve PerFieldAnalyzer performance with caching

    Details

      Description

      7.x versions are not affected because this code is no longer present there.


      Thread dumps revealed many CPU-heavy threads. The cause is that PerFieldAnalyzer does not cache the analyzers it has already resolved, so the pattern matching for DDM and localized fields is repeated on every lookup.

      An example of a CPU-heavy thread:

         java.lang.Thread.State: RUNNABLE
      	at java.lang.Character.codePointAt(Character.java:4668)
      	at java.util.regex.Pattern$CharProperty.match(Pattern.java:3693)
      	at java.util.regex.Pattern$Curly.match0(Pattern.java:4158)
      	at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
      	at java.util.regex.Matcher.match(Matcher.java:1221)
      	at java.util.regex.Matcher.matches(Matcher.java:559)
      	at java.util.regex.Pattern.matches(Pattern.java:1130)
      	at com.liferay.portal.search.lucene.PerFieldAnalyzer.getAnalyzer(PerFieldAnalyzer.java:63)
      	at com.liferay.portal.search.lucene.LuceneHelperImpl.isLikeField(LuceneHelperImpl.java:620)
      	at com.liferay.portal.search.lucene.LuceneHelperImpl.addTerm(LuceneHelperImpl.java:235)
      	at com.liferay.portal.search.lucene.LuceneHelperImpl.addTerm(LuceneHelperImpl.java:220)
      	at com.liferay.portal.search.lucene.LuceneHelperUtil.addTerm(LuceneHelperUtil.java:268)
      	at com.liferay.portal.search.lucene.LuceneHelperUtil.addTerm(LuceneHelperUtil.java:262)
      	at com.liferay.portal.search.lucene.LuceneHelperUtil.addTerm(LuceneHelperUtil.java:256)
      	at com.liferay.portal.search.lucene.BooleanQueryImpl.addTerm(BooleanQueryImpl.java:265)
      	at com.liferay.portal.kernel.search.BaseIndexer.addSearchClassTypeIds(BaseIndexer.java:990)
      	at com.liferay.portlet.journal.util.JournalArticleIndexer.postProcessContextQuery(JournalArticleIndexer.java:163)
      

      The problem is that getAnalyzer returns the analyzer after a successful match (Pattern.matches(key, fieldName)) but does not cache the result. Every lookup for a field name that is not an exact key therefore repeats the whole pattern matching, including the costly regexp compilation that Pattern.matches performs on each call.

      PerFieldAnalyzer.java
      	public Analyzer getAnalyzer(String fieldName) {
      		Analyzer analyzer = _analyzers.get(fieldName);
      
      		if (analyzer != null) {
      			return analyzer;
      		}
      
      		for (String key : _analyzers.keySet()) {
      			if (Pattern.matches(key, fieldName)) {
      				return _analyzers.get(key);
      			}
      		}
      
      		return _analyzer;
      	}
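
      One possible shape for the caching is sketched below. This is not the committed fix, only an illustration under stated assumptions: the class name CachingPerFieldLookup and the nested PatternEntry are hypothetical, the type parameter A stands in for Lucene's Analyzer so that the snippet compiles on its own, the analyzer map is assumed to hold no null values, and the map is assumed not to change after construction. Each key is compiled once up front, and every resolved field name is remembered so that the pattern loop runs at most once per distinct field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical stand-in for PerFieldAnalyzer; A plays the role of Lucene's Analyzer.
final class CachingPerFieldLookup<A> {

	// One regex key, compiled once, paired with its analyzer.
	private static final class PatternEntry<T> {
		final Pattern pattern;
		final T analyzer;

		PatternEntry(Pattern pattern, T analyzer) {
			this.pattern = pattern;
			this.analyzer = analyzer;
		}
	}

	private final A defaultAnalyzer;
	private final List<PatternEntry<A>> patternEntries = new ArrayList<>();

	// Field name -> resolved analyzer. Assumption: values are never null.
	private final Map<String, A> cache = new ConcurrentHashMap<>();

	CachingPerFieldLookup(A defaultAnalyzer, Map<String, A> analyzers) {
		this.defaultAnalyzer = defaultAnalyzer;

		for (Map.Entry<String, A> entry : analyzers.entrySet()) {
			// Seeding the cache keeps the "exact key wins" behavior.
			cache.put(entry.getKey(), entry.getValue());

			// Compile each key once instead of on every Pattern.matches call.
			patternEntries.add(
				new PatternEntry<>(
					Pattern.compile(entry.getKey()), entry.getValue()));
		}
	}

	A getAnalyzer(String fieldName) {
		A analyzer = cache.get(fieldName);

		if (analyzer != null) {
			return analyzer;
		}

		// Fall back to the default when no pattern matches.
		analyzer = defaultAnalyzer;

		for (PatternEntry<A> entry : patternEntries) {
			if (entry.pattern.matcher(fieldName).matches()) {
				analyzer = entry.analyzer;

				break;
			}
		}

		// Remember the result, including a fallback to the default.
		if (analyzer != null) {
			cache.put(fieldName, analyzer);
		}

		return analyzer;
	}
}
```

      Two threads that miss the cache for the same field at the same time would both run the loop and store the same value, which is harmless. The cache is unbounded, so this approach only suits a finite set of field names.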
      

              People

              • Assignee: Brian Lee
              • Reporter: Istvan Sajtos

                  Packages

                  Version Package
                  6.2.X EE