Affects Version/s: 7.0.0 DXP SP8, 7.0.0 DXP FP56, 7.0.X, 7.1.0 CE GA1, 7.1.X, Master
Component/s: Search Infrastructure
Backported to Branch:Committed
Sprint:Search | S02 Sprint 3
Git Pull Request:
After creating a few Japanese-language tags and assigning them to Collaboration suite assets, search returns assets whose tags only partially match the search term.
Based on the behavior we have observed, tags appear to be stored in a default index field that uses the Standard analyzer, which breaks Japanese Kanji text down into single characters.
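To illustrate the suspected cause, here is a minimal Python sketch. It is not Liferay or search-engine code: the helper names `unigram_tokens` and `matches_any_token` are hypothetical, and the sketch assumes that a query matches a tag when they share at least one single-character token.

```python
def unigram_tokens(text):
    """Split text into single-character tokens, skipping whitespace.

    Simplified stand-in for an analyzer that emits each Kanji
    character as its own token (an assumption for illustration).
    """
    return [ch for ch in text if not ch.isspace()]


def matches_any_token(query, field_value):
    """Return True if the query and the field share at least one token."""
    return bool(set(unigram_tokens(query)) & set(unigram_tokens(field_value)))


tags = ["東京", "出前京丁"]

# Both tags contain the character 京, so both match the query 東京.
hits = [tag for tag in tags if matches_any_token("東京", tag)]
print(hits)  # ['東京', '出前京丁']
```

Under these assumptions, any tag that contains 東 or 京 is treated as a match for 東京.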
The following example lists steps to reproduce in Documents and Media (DM), but the issue occurs in WCM as well.
LPS-84666 involved Categories as well, but it was closed as a Duplicate because that behavior can apparently be handled on this ticket too. Please examine the issue in Categories as well.
Steps to Reproduce
- Unzip vanilla Liferay bundle (clean, no portal-setup.wizard, etc.)
- Initialize Liferay with Japanese as default language
- Restart Server
- After restarting, sign in with email@example.com (you can use English for ease of use, i.e. append /en/ after hostname:port)
- Navigate to Product Menu > Liferay DXP (Site) > Categorization > Tags
- Click Add Tag and add 東京
- Click Add Tag and add 出前京丁
- Navigate to Product Menu > Liferay DXP (Site) > Content > Documents and Media
- Upload a file and assign it the first tag 東京
- Upload another file and assign it the second tag 出前京丁
- Change Display setting to Japanese (i.e. append /ja/ after hostname:port)
- Return to the home page, and in the Search widget (top right corner), search for 東京
Even though we searched for 東京, the results include both the asset tagged 東京 and the asset tagged 出前京丁.
This is problematic because the only thing these two tags have in common is the character 京.
Since we searched for 東京, we should get results for 東京 only.
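The expected result can be pictured with a short Python sketch that compares the whole tag name instead of individual characters. This only illustrates the expected behavior; it is not necessarily how the fix is implemented, and the helper name `exact_tag_hits` is hypothetical.

```python
def exact_tag_hits(query, tags):
    """Return only the tags whose full name equals the query (illustrative)."""
    return [tag for tag in tags if tag == query]


tags = ["東京", "出前京丁"]

# Only the tag that is exactly 東京 is returned; 出前京丁 is not.
hits = exact_tag_hits("東京", tags)
print(hits)  # ['東京']
```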
7.1.x private Commit: 815320372a34faa0ccd0ed1d4989af7d1502c5e6