LPS-74642 (project: PUBLIC - Liferay Portal Community Edition)

Search does not return accurate results or highlights for Japanese content on either Elasticsearch or Solr - DM (Documents and Media)

    Details

    • Fix Priority: 4

      Description

      A similar issue affects WCM: see LPS-78119.

      Steps to Reproduce - master/7.0.x

      1. Start Liferay
      2. Set Japanese as the portal's default language: Control Panel - Configuration - Instance Settings - Misc - Default Language
      3. Upload Document1.txt, Document2.txt, and Document3.txt to Documents and Media
      4. Upload 組織情報B.pdf ("Organization Information B") to Documents and Media with the following metadata:
        1. Title: サンプルB ("Sample B")
        2. Description: これは東京都品川区で登録したファイルです ("This is a file registered in Shinagawa Ward, Tokyo")
      5. Add Language Selector portlet to the default page
      6. Switch to Japanese locale

      Searching & Results

      a. Search for the string English Japanese

      • SEARCH RESULT: PASS - Document3.txt is displayed
      • HIGHLIGHT: PASS - Both English and Japanese are highlighted as expected in Document3.txt.

      b. Search for the string あいうえお 日本語 (aiueo nihongo)

      • SEARCH RESULT: PASS - Both Document1.txt and Document2.txt are available in the search results.
      • HIGHLIGHT: FAIL - Partially working: 日本語 is highlighted as expected, but あいうえお is not. Strangely, only its first two characters, あい, are highlighted.

      c. Search for a partial string such as あいう (aiu)

      • SEARCH RESULT: FAIL - No search results are returned, even though あいう is expected to match Document1.txt.
      • HIGHLIGHT: FAIL - Since there are no search results, nothing can be highlighted.

      d. Search for サンプル (sampuru)

      • SEARCH RESULT: PASS - Document サンプルB is visible.
      • HIGHLIGHT: PASS - Only the text サンプル is highlighted.

      e. Search for 推進 (suishin)

      • SEARCH RESULT: PASS - The document is visible.
      • HIGHLIGHT: PASS - The string is highlighted in the DM result.

      f. Search for 推進部 (suishinbu)

      • SEARCH RESULT: PASS - The document is visible.
      • HIGHLIGHT: FAIL - The string is highlighted in the DM result, but, strangely, its last character is also highlighted.

      g. Search for 品川区 (shinagawaku)

      • SEARCH RESULT: PASS - The DM result is found.
      • HIGHLIGHT: FAIL - The document contains this string in its content, but because only the description is displayed, no highlights are shown.
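To pin down what a passing HIGHLIGHT check means in the cases above, here is a minimal Python sketch of a test oracle. It assumes highlights are rendered as `<em>` tags and that each whitespace-separated query term should be highlighted as an exact substring; the function name `expected_highlight` and the tag choice are hypothetical and do not describe Liferay's actual highlighter or markup.

```python
import re


def expected_highlight(text: str, query: str,
                       open_tag: str = "<em>", close_tag: str = "</em>") -> str:
    """Return `text` with every occurrence of each query term wrapped in tags.

    Hypothetical test oracle: the query is split on whitespace (str.split()
    also splits on the full-width space U+3000), and each term is matched as
    an exact substring, so a correct highlight covers exactly the searched
    characters, no more and no fewer.
    """
    terms = query.split()
    if not terms:
        return text
    # Try longer terms first so a term is not shadowed by a shorter prefix of it.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: open_tag + m.group(0) + close_tag, text)


if __name__ == "__main__":
    description = "これは東京都品川区で登録したファイルです"
    print(expected_highlight(description, "品川区"))
    # prints これは東京都<em>品川区</em>で登録したファイルです
    print(expected_highlight("サンプルB", "サンプル"))
    # prints <em>サンプル</em>B
```

Under this oracle, case b would expect all of あいうえお to be wrapped (not just あい), and case f would expect the closing tag to fall exactly after 推進部.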

      Reproduced on master@b7df384c4f71832b3afe5f67d65d3a641d1bbee3
      Reproduced with Remote Elasticsearch 2.4.x
      Reproduced with Solr: https://dev.liferay.com/discover/deployment/-/knowledge_base/7-0/using-solr - Tested with Liferay Solr 5 Search Engine 1.0.0

      • Changing the analyzer of the content, description, subtitle, and title fields to text_ja in schema.xml and reindexing does not fix the issue either.
        This most likely affects other asset types as well.

        Attachments

        1. 1b.png (29 kB)
        2. aiueo.png (521 kB)
        3. Document1.txt (0.0 kB)
        4. Document2.txt (0.0 kB)
        5. Document3.txt (0.0 kB)
        6. nagoya.png (101 kB)
        7. suishin.png (140 kB)
        8. suishinbu.png (146 kB)
        9. 組織情報B.pdf (22 kB)
