PUBLIC - Liferay Portal Community Edition / LPS-74642

Search does not return accurate results or highlights for Japanese content on either Elasticsearch or Solr - DM


    Similar for WCM: See LPS-78119.

    Steps to Reproduce - master/7.0.x

    1. Start Liferay
    2. Set Japanese as the portal's default language: Control Panel - Configuration - Instance Settings - Misc - Default Language
    3. Upload Document1.txt Document2.txt Document3.txt to Documents and Media
    4. Upload 組織情報B.pdf to Documents and Media with the following metadata:
      1. Title: サンプルB
      2. Description: これは東京都品川区で登録したファイルです
    5. Add Language Selector portlet to the default page
    6. Switch to Japanese locale

    Searching & Results

    a. Searched for the string English Japanese

    • SEARCH RESULT: PASS - Document3.txt is displayed
    • HIGHLIGHT: PASS - Both English and Japanese are highlighted as expected in Document3.txt.

    b. Searched for the string あいうえお 日本語 (aiueo nihongo)

    • SEARCH RESULT: PASS - Both Document1.txt and Document2.txt are available in the search results.
    • HIGHLIGHT: FAIL - Partially working. 日本語 is highlighted as expected, but あいうえお is NOT fully highlighted. Strangely, only あい is highlighted.

    c. Searching for partial strings such as あいう (aiu)

    • SEARCH RESULT: FAIL - No search results are returned, even though Document1.txt is expected to contain あいう.
    • HIGHLIGHT: FAIL - Since there are no search results, nothing can be highlighted.

    d. Search for サンプル (sampuru)

    • SEARCH RESULT: PASS - Document サンプルB is visible.
    • HIGHLIGHT: PASS - Only text that says サンプル is highlighted.

    e. Search for 推進 (suishin)

    • SEARCH RESULT: PASS - Document is visible
    • HIGHLIGHT: PASS - The string is highlighted in the DM result.

    f. Search for 推進部 (suishinbu)

    • SEARCH RESULT: PASS - Document is visible
    • HIGHLIGHT: FAIL - The string is highlighted in the DM result, but strangely enough, the last character of the string is highlighted also.

    g. Search for 品川区 (shinagawaku)

    • SEARCH RESULT: PASS - DM result is found.
    • HIGHLIGHT: FAIL - The document contains this string in the actual content, but because the description is the only thing that's displayed, no highlights are present.
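    Case c above (a partial string returning no results) is the kind of behaviour that depends on how the analyzer splits text into tokens. The following is a minimal sketch of that idea only, not Liferay's, Elasticsearch's, or Solr's actual implementation. The function names and the sample document text are hypothetical; the real content of Document1.txt is not shown in this report.

    ```python
    def bigrams(text):
        """Split text into overlapping two-character tokens (bigram-style sketch)."""
        if len(text) < 2:
            return [text] if text else []
        return [text[i:i + 2] for i in range(len(text) - 1)]


    def whole_token_match(query, document):
        """Match only when the query equals a space-delimited token exactly."""
        return query in document.split()


    def bigram_match(query, document):
        """Match when every bigram of the query also occurs in the document."""
        doc_grams = set()
        for token in document.split():
            doc_grams.update(bigrams(token))
        return all(gram in doc_grams for gram in bigrams(query))


    # Hypothetical document text, assumed for illustration only.
    document = "あいうえお 日本語"

    print(whole_token_match("あいう", document))      # partial query vs. whole tokens
    print(whole_token_match("あいうえお", document))  # full query vs. whole tokens
    print(bigram_match("あいう", document))           # partial query vs. bigrams
    ```

    Under these assumptions, an index that keeps あいうえお as a single token cannot match the partial query あいう, while a bigram-based index can. Whether this is what happens in the reported setup would need to be confirmed against the analyzers actually configured.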

    Reproduced on [email protected]
    Reproduced with Remote Elasticsearch 2.4.x
    Reproduced with Solr: https://dev.liferay.com/discover/deployment/-/knowledge_base/7-0/using-solr - Tested with Liferay Solr 5 Search Engine 1.0.0

    • It also does not work if you change the assigned analyzer to text_ja for the fields content, description, subtitle, and title in schema.xml and then reindex.
      Most probably it affects other assets as well.
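    To check how the text_ja field type tokenizes a given string, one option is Solr's field analysis request handler. The sketch below only builds the request URL and does not contact a server. The base URL, the core name "liferay", and the assumption that the /analysis/field handler is enabled on the target Solr instance are all illustrative and not taken from this report.

    ```python
    from urllib.parse import urlencode


    def solr_field_analysis_url(base_url, core, field_type, index_text, query_text):
        """Build a URL for Solr's field analysis handler (assumed at /analysis/field)."""
        params = {
            "analysis.fieldtype": field_type,    # field type to analyze with, e.g. text_ja
            "analysis.fieldvalue": index_text,   # text analyzed as at index time
            "analysis.query": query_text,        # text analyzed as at query time
            "wt": "json",
        }
        return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


    # Hypothetical host and core name; the sample strings come from the steps above.
    url = solr_field_analysis_url(
        "http://localhost:8983/solr",
        "liferay",
        "text_ja",
        "これは東京都品川区で登録したファイルです",
        "品川区",
    )
    print(url)
    ```

    Opening the resulting URL against a running Solr instance should show the tokens produced for both the description and the query 品川区, which makes it easier to see whether the index-time and query-time tokens line up.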


      1. 1b.png
        29 kB
      2. aiueo.png
        521 kB
      3. Document1.txt
        0.0 kB
      4. Document2.txt
        0.0 kB
      5. Document3.txt
        0.0 kB
      6. nagoya.png
        101 kB
      7. suishin.png
        140 kB
      8. suishinbu.png
        146 kB
      9. 組織情報B.pdf
        22 kB

            support-lep@liferay.com SE Support
            tibor.lipusz Tibor Lipusz
            Kiyoshi Lee Kiyoshi Lee