LPS-27724 (PUBLIC - Liferay Portal Community Edition)

Solr-web-plugin doesn't get snippets from the localized fields and provides inconsistent results for searches from other locales for terms that occur only in the default-locale version of a Web Content

    Details

      Description

      Solr-web-plugin is currently not designed to read snippets for the localised content fields from the result SolrDocument (field "highlights").

      By default, only the field "content" is declared as a stored and indexed field in the schema.xml of solr-web-plugin. However, if a customer creates a translation for a given web content, the localised versions of the fields (such as "content" --> "content_xx_XX") are saved in the Document stored by Solr. Therefore, only searches from the default locale will succeed.

      Adding the new field(s) (e.g. "content_en_US" and "content_el_GR") to the "schema.xml" solves the problem above, but the results are never highlighted. This is because the method "SolrIndexSearcherImpl.getSnippet()" reads the snippets only from the field "content" (even if the localised versions also exist). Because the field "content" always stores the content corresponding to the current default locale, results of searches from the default locale are always highlighted.
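To illustrate the lookup described above, here is a minimal sketch of a locale-aware snippet selection. It is not the plugin's code: the class `LocalizedSnippets`, its methods, and the "..." separator between fragments are hypothetical, and the input map is assumed to hold the highlighted fragments of one result document keyed by field name.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of solr-web-plugin.
final class LocalizedSnippets {

    private LocalizedSnippets() {
    }

    // Builds the localized field name, e.g. ("content", el_GR) -> "content_el_GR".
    static String localizedName(String field, Locale locale) {
        return field + "_" + locale.getLanguage() + "_" + locale.getCountry();
    }

    // Prefers the fragments of the localized field and falls back to the
    // plain field (the default-locale content) when none are present.
    // Returns an empty string when neither field was highlighted.
    static String pickSnippet(
            Map<String, List<String>> highlights, String field, Locale locale) {

        List<String> fragments = highlights.get(localizedName(field, locale));

        if (fragments == null || fragments.isEmpty()) {
            fragments = highlights.get(field);
        }

        if (fragments == null || fragments.isEmpty()) {
            return "";
        }

        // Assumption: fragments are joined with "..." for display.
        return String.join("...", fragments);
    }
}
```

Falling back to "content" is a design choice in this sketch. A caller that displays the localized version may prefer to skip the fallback, because the default-locale snippet would not match the displayed text.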

      By default, Solr generates snippets for the field declared in the element "<defaultSearchField>" in the "schema.xml". It is a fallback field and has been deprecated since solr-3.6.0. Solr uses it when no fields are passed in the "hl.fl" param of the given solrQuery.

      There are two ways to tell Solr explicitly for which fields to generate highlighted snippets:
      1. Adding a param to the "SolrQuery" when executing the method "SolrIndexSearcherImpl.translateQuery()":
      e.g. solrQuery.setParam("hl.fl", "content_el_GR");

      2. Adding the following line to the "requestHandler" called "standard" (under solr-1.4.1, or "search" under solr-3.5.0) in the "solrconfig.xml":
      e.g. <str name="hl.fl">content,content_en_US,content_el_GR</str>
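Either option needs the list of field names for "hl.fl". As a rough sketch, that value could be derived from the base field and the supported locales. The class `HighlightFields` and the method `hlFl` are hypothetical names, and the locale list is an assumption standing in for however the portal exposes its available locales.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of solr-web-plugin.
final class HighlightFields {

    private HighlightFields() {
    }

    // Returns a comma-separated "hl.fl" value: the base field followed by
    // one localized field per locale, e.g. "content,content_en_US".
    static String hlFl(String field, List<Locale> locales) {
        List<String> names = new ArrayList<>();

        names.add(field);

        for (Locale locale : locales) {
            names.add(
                field + "_" + locale.getLanguage() + "_" + locale.getCountry());
        }

        return String.join(",", names);
    }
}
```

For the locales en_US and el_GR this produces the same value as the "solrconfig.xml" example in option 2, and the string could equally be passed as in option 1 via solrQuery.setParam("hl.fl", value).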

      --> Thus, if a customer is planning to support translations, they have to modify the "schema.xml" and take into account the special characteristics of the given language. Different languages may need a different "tokenizer" and/or "filter(s)", hence a new "fieldType" declaration in the "schema.xml".

      The other problem is that a search from another locale for a term that occurs only in the default-locale version produces a confusing result for Web Contents: there will be a result, but the portal will display the localized version of the content, which obviously does not contain the term. No highlight, no term in the displayed content. This is because the field "content" is always part of the query; thus Solr finds matches, and Solr-web-plugin passes the Document (built from the result SolrDocument) to the portal, where the display logic tries to highlight the term in the localized field (without success, of course). Unfortunately, Lucene (LuceneIndexSearcherImpl) behaves the same way. (Tested on trunk)

        Attachments

        1. lang.zip
          40 kB
        2. schema-1.4.1.xml
          6 kB
        3. schema-3.5.0.xml
          6 kB
        4. solrconfig-1.4.1.xml
          45 kB
        5. solrconfig-3.5.0.xml
          60 kB
        6. test.content_el_GR.txt
          13 kB
        7. test.content_en_US.txt
          10 kB

                  Packages

                  Version Package
                  6.1.1 CE GA2
                  6.1.20 EE GA2
                  6.2.0 CE M2