Search does not provide accurate results and highlights for Japanese content on either Elasticsearch or Solr - DM


    Similar for WCM: See LPS-78119.

    Steps to Reproduce - master/7.0.x

    1. Start Liferay
    2. Set Japanese as the portal's default language: Control Panel - Configuration - Instance Settings - Misc - Default Language
    3. Upload Document1.txt, Document2.txt, and Document3.txt to Documents and Media
    4. Upload 組織情報B.pdf to Documents and Media with the following metadata:
      1. Title: サンプルB
      2. Description: これは東京都品川区で登録したファイルです
    5. Add Language Selector portlet to the default page
    6. Switch to Japanese locale

    Searching & Results

    a. Search for the string English Japanese

    • SEARCH RESULT: PASS - Document3.txt is displayed
    • HIGHLIGHT: PASS - Both English and Japanese are highlighted as expected in Document3.txt.

    b. Search for the string あいうえお 日本語 (aiueo nihongo)

    • SEARCH RESULT: PASS - Both Document1.txt and Document2.txt are available in the search results.
    • HIGHLIGHT: FAIL - Partially working. 日本語 is highlighted as expected, but あいうえお is NOT fully highlighted. Strangely, only あい is highlighted.

    c. Search for a partial string such as あいう (aiu)

    • SEARCH RESULT: FAIL - No search results are returned, even though あいう is expected to match content in Document1.txt
    • HIGHLIGHT: FAIL - Since there are no search results, nothing can be highlighted.

    d. Search for サンプル (sampuru)

    • SEARCH RESULT: PASS - Document サンプルB is visible.
    • HIGHLIGHT: PASS - Only text that says サンプル is highlighted.

    e. Search for 推進 (suishin)

    • SEARCH RESULT: PASS - Document is visible
    • HIGHLIGHT: PASS - The string is highlighted in the DM result.

    f. Search for 推進部 (suishinbu)

    • SEARCH RESULT: PASS - Document is visible
    • HIGHLIGHT: FAIL - The string is highlighted in the DM result, but strangely the last character of the string is also highlighted.

    g. Search for 品川区 (shinagawaku)

    • SEARCH RESULT: PASS - DM result is found.
    • HIGHLIGHT: FAIL - The document contains this string in the actual content, but because the description is the only thing that's displayed, no highlights are present.
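
    The PASS/FAIL highlight checks above could be automated roughly as follows. This is a minimal sketch, not Liferay code: the function names and the sample snippet are hypothetical, and it assumes the highlighted fragment wraps matches in <em>...</em> tags (pass different tags if the search engine is configured otherwise).

```python
import re


def highlighted_spans(snippet, open_tag="<em>", close_tag="</em>"):
    """Return the text of every highlighted run in a result snippet."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    return re.findall(pattern, snippet, flags=re.DOTALL)


def is_fully_highlighted(snippet, term, open_tag="<em>", close_tag="</em>"):
    """True if the whole search term sits inside one highlighted run.

    Directly adjacent runs (close tag immediately followed by open tag)
    are merged first, so a term split into consecutive tokens still counts.
    """
    merged = snippet.replace(close_tag + open_tag, "")
    return any(term in span for span in highlighted_spans(merged, open_tag, close_tag))


# Hypothetical snippet modelled on case b: only part of the first term is tagged.
sample = "<em>あい</em>うえお <em>日本語</em>"

for term in ("あいうえお", "日本語"):
    status = "PASS" if is_fully_highlighted(sample, term) else "FAIL"
    print(term, status)
```

    A check like this only covers the highlight column; whether the expected documents appear in the result list still has to be verified separately.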

    Reproduced on [email protected]
    Reproduced with Remote Elasticsearch 2.4.x
    Reproduced with Solr: https://dev.liferay.com/discover/deployment/-/knowledge_base/7-0/using-solr - Tested with Liferay Solr 5 Search Engine 1.0.0

    • It also does not work if you change the assigned analyzer to text_ja for the fields content, description, subtitle, and title in schema.xml and then reindex.
      Most probably it affects other assets as well.
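
    To see how the text_ja field type tokenizes the failing strings, one option is Solr's field analysis request handler. The sketch below only builds the request URL and does not contact a server. It assumes the handler is reachable at /analysis/field on the core; the host, port, core name (liferay), and the function name are placeholders, not values taken from this report.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_field_analysis_url(core_url, field_type, text):
    """Build a field analysis URL for the given field type and sample text.

    core_url is an assumed placeholder such as http://localhost:8983/solr/liferay.
    """
    params = {
        "analysis.fieldtype": field_type,  # field type to analyze with, e.g. text_ja
        "analysis.fieldvalue": text,       # text to run through the index-time analyzer
        "wt": "json",                      # ask for a JSON response
    }
    # urlencode percent-encodes the Japanese text as UTF-8.
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Strings from the failing cases above.
for text in ("あいうえお", "あいう", "推進部"):
    print(build_field_analysis_url("http://localhost:8983/solr/liferay", "text_ja", text))
```

    Opening one of the printed URLs against a running Solr instance should list the tokens produced at each analysis stage, which can then be compared with the highlighted fragments.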


