  1. PUBLIC - Liferay Portal Community Edition
  2. LPS-67124

Search terms are analyzed twice when using the Solr engine

    Details

      Description

      Solution & QA Notes
      In addition to the code changes, the following portal property has been added to 6.2:

          #
          # Set this to true to enable
          # com.liferay.portal.search.lucene.PerFieldAnalyzer to analyze index fields
          # both at index and query time using the mappings defined in
          # META-INF/search-spring.xml.
          #
          # Setting this to false is recommended when using a pluggable enterprise
          # search engine with its own mappings (e.g. as Solr) to avoid indexing
          # terms twice which may result in incorrect tokenization.
          #
          # Deprecated as of 7.0.0.
          #
          index.portal.field.analyzer.enabled=true
      

      Liferay Solr 4 Search Engine 2.1.1 (https://web.liferay.com/marketplace/-/mp/application/30365680) sets this property to "false" by default.
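
As an illustration of what the flag controls, here is a minimal sketch of a check that a query path could make before running portal-side analysis. Only the property name and its default come from the notes above; the class and method names are hypothetical, and this is not Liferay's actual implementation.

```java
import java.util.Properties;

class AnalyzerFlagSketch {

    // Property name taken from the portal property quoted above.
    static final String KEY = "index.portal.field.analyzer.enabled";

    // Hypothetical helper: returns true when portal-side analysis should run.
    // Falls back to "true", the default shown in the property listing.
    static boolean portalAnalysisEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();

        // Mirrors the override that the Solr 4 plugin 2.1.1 is said to apply.
        Properties solrSetup = new Properties();
        solrSetup.setProperty(KEY, "false");

        System.out.println(portalAnalysisEnabled(defaults));  // prints true
        System.out.println(portalAnalysisEnabled(solrSetup)); // prints false
    }
}
```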


      Steps to reproduce (with groovy script)

      1. Configure Liferay with Solr 4
        1. Reproduced with Liferay Solr 4 Search Engine 2.1.0
      2. Create a new site
      3. Configure the site with only Spanish language (es_ES)
      4. Go to Web Content and create two basic web content articles:
        • First WC: Title=>"plantea" Content=> empty
        • Second WC: Title=>"planteo" Content=> empty
      5. Execute attached groovy script from control panel: queryIndexLPS-67124.groovy (the script executes search title_es_ES:planteo)
        • Expected behavior: Both webcontents with "plantea" and "planteo" titles are found
        • Wrong behavior: No webcontent is found
      6. Configure Liferay with Lucene and repeat steps 2 to 5: the behavior with Lucene should be correct

      Steps to reproduce (with search portlet)

      1. Configure Liferay with Solr 4
        1. Reproduced with Liferay Solr 4 Search Engine 2.1.0
      2. Create a new site
      3. Configure the site with only Spanish language (es_ES)
      4. Go to Web Content and create two basic web content articles:
        • First WC: Title=>"plantea" Content=> empty
        • Second WC: Title=>"planteo" Content=> empty
      5. Add a page with search portlet
      6. Execute a search with keyword: "plantea"
        • Expected behavior: Both webcontents with "plantea" and "planteo" titles are found (the es_ES tokenizer removes the final 'a' / 'o' because it is the gender ending of the word)
        • Wrong behavior: Only the webcontent with "plantea" title is found
      7. Check query at Solr server:
        • Expected behavior: Query is executed with the word 'plante' (the es_ES tokenizer removes the final 'a' / 'o' because it is the gender ending of the word)
        • Wrong behavior: Query is executed with the word 'plant'
      8. Configure Liferay with Lucene and repeat steps 2 to 6: the behavior with Lucene should be correct
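
For step 7, one way to see how Solr parses a term is to send it to the select handler with `debugQuery=true`, which adds the parsed query to the response. The sketch below only builds such a URL. The host, port and core name are assumptions, and the field name is the one used by the groovy script in the first scenario.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrDebugQuerySketch {

    // Builds a select URL that asks Solr to report the parsed query.
    // baseUrl (host, port, core) is an assumption; adjust to the local setup.
    static String debugUrl(String baseUrl, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&debugQuery=true&wt=json";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1"; // assumed core

        // The raw keyword, as typed into the search portlet.
        System.out.println(debugUrl(base, "title_es_ES:plantea"));

        // The already-analyzed term that Liferay is reported to send.
        System.out.println(debugUrl(base, "title_es_ES:plante"));
    }
}
```

Comparing the parsed query in the two responses should show whether Solr reduces the already-analyzed term a second time.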

      Technical Background

      The result of search term analysis is not consistent:

      • During indexing: content is analyzed by the search engine (Lucene or Solr)
      • During search: the query is analyzed in Liferay code before it is sent to the search engine, and Solr then analyzes the query again

      Most of the time, analyzing a term more than once is harmless. In Spanish, however, it fails for words that end in more than one vowel, such as plantea or planteo, because the second pass strips another vowel:

      • Lucene:
        • Indexing with _es_ES tokenizer: plantea ==> (lucene) plante
        • Search query with _es_ES tokenizer: plantea ==> (liferay) plante
        • plante == plante
      • Solr:
        • Indexing with _es_ES tokenizer: plantea ==> (solr) plante
        • Search query with _es_ES tokenizer: plantea ==> (liferay) plante ==> (solr) plant
        • plante != plant
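
The mismatch above can be reproduced with a toy model. The sketch below assumes a stand-in "analyzer" that strips one trailing vowel. It is not the real Spanish analysis chain and only mimics the plantea ==> plante ==> plant behavior reported here. All names are hypothetical.

```java
class DoubleAnalysisSketch {

    // Toy stand-in for a stemming analyzer: strips a single trailing vowel.
    // It is not idempotent, which is the property that causes the bug.
    static String analyze(String term) {
        int last = term.length() - 1;
        if (last > 0 && "aeiou".indexOf(term.charAt(last)) >= 0) {
            return term.substring(0, last);
        }
        return term;
    }

    // Models the Solr case: the engine analyzes at index time and at query
    // time; the portal optionally analyzes the query before sending it.
    static boolean matches(String indexedTerm, String queryTerm, boolean portalAnalyzesQuery) {
        String indexed = analyze(indexedTerm);
        String query = portalAnalyzesQuery ? analyze(queryTerm) : queryTerm;
        query = analyze(query);
        return indexed.equals(query);
    }

    public static void main(String[] args) {
        // Portal and engine both analyze: plante (index) vs plant (query).
        System.out.println(matches("planteo", "plantea", true));  // prints false

        // Only the engine analyzes: plante (index) vs plante (query).
        System.out.println(matches("planteo", "plantea", false)); // prints true
    }
}
```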

      As a solution, the query should not be analyzed in Liferay when using Solr.


      master and 7.0.x are already fixed, because Lucene was removed from the core and the search functionality was rewritten in modules that take advantage of the query dialect of the underlying search engines (Elasticsearch, Solr).


                  Packages

                  Version Package
                  6.2.X EE
                  7.0.0 DXP FP8
                  7.0.3 CE GA4