- Type: Bug
- Status: Closed
- Resolution: No Longer Reproducible
- Affects Version/s: 6.2.5 CE GA6, 6.2.10 EE GA1, 6.2.X EE, 7.0.1 CE GA2, 7.0.0 DXP SP1, Master
- Fix Version/s: Master
- Component/s: Core Infrastructure, Environments, Environments > Databases
- Labels:
- Flagged: Impediment
- Story Points: 1
- Fix Priority: 5
- OS: Mac OS X 10.9
- JDK: Oracle Sun JDK 7
- Application Servers: Apache Tomcat 7.0
- Browsers: Safari 9
- Databases: MySQL 5.7
Background
Besides its three writing systems, Japanese uses some less common characters, especially in given names and family names. These characters are valid Unicode, but UTF-8 encodes them in 4 bytes instead of the usual 3 or fewer, because their code points lie above U+FFFF.
Japanese customers are likely to run into this as their user database grows, which makes this a serious issue.
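To make the 4-byte case concrete, here is a minimal Java sketch that compares the UTF-8 length of a common kanji with that of a kanji outside the Basic Multilingual Plane. The class name `Utf8WidthDemo`, its helper methods, and the choice of U+20BB7 (a variant of 吉 that appears in names) as the sample are assumptions for illustration; the attached character.txt may contain a different character.

```java
import java.nio.charset.StandardCharsets;

class Utf8WidthDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the text contains a code point above U+FFFF,
    // which UTF-8 encodes in 4 bytes.
    static boolean hasFourByteCharacter(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+5409, inside the Basic Multilingual Plane.
        String common = "\u5409";
        // U+20BB7, outside it (illustrative sample, see lead-in).
        String rare = new String(Character.toChars(0x20BB7));

        System.out.println(utf8Length(common) + " bytes, 4-byte: " + hasFourByteCharacter(common));
        System.out.println(utf8Length(rare) + " bytes, 4-byte: " + hasFourByteCharacter(rare));
    }
}
```

A helper like `hasFourByteCharacter` could also be used to confirm that a test name really contains an affected character before following the reproduction steps.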
Issue
Liferay does not properly show text encoded in utf8mb4 (4-byte UTF-8).
Steps to reproduce
- Prepare a new clean SP14 bundle
- Connect it to MySQL or Postgres (Make sure JDBC connector does not force UTF-8)
  jdbc.default.driverClassName=com.mysql.jdbc.Driver
  jdbc.default.url=jdbc:mysql://localhost/portal-gen-utf8mb4?useFastDateParsing=false
  jdbc.default.username=liferay
  jdbc.default.password=password
- Make sure the database is created with encoding utf8mb4 in MySQL or utf8 in Postgres
- Start the bundle
- Create a new user and set its given name or family name to a 4-byte UTF-8 character (see attached character.txt for reference)
- Save the new user
- Page is reloaded
Actual results
The character is shown as ?? everywhere except in the text box. It is saved correctly in the database.
See attached utf8mb4.png screenshot
Expected results
The character is displayed correctly everywhere.
Workaround
- Disable ETag and GZip filters by setting the following properties:
  #
  # The ETag filter is used to generate ETag headers.
  #
  com.liferay.portal.servlet.filters.etag.ETagFilter=false

  #
  # If the user can unzip compressed HTTP content, the GZip filter will
  # zip up the HTTP content before sending it to the user. This will speed up
  # page rendering for users that are on dial up.
  #
  com.liferay.portal.servlet.filters.gzip.GZipFilter=false
- Restart bundle
- Note that the character is now displayed correctly
See attached utf8mb4_filtersdisabled.png screenshot
Some considerations
On the first question, whether the workaround could be put into production: disabling GZipFilter is already part of the deployment checklist, because compression is better handled by the web server. ETagFilter only attempts to save bandwidth. The portal still has to generate all the data for the response and only afterwards decides whether it can send a 304, so when a CDN is in place ETagFilter provides little value.
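The point about ETagFilter can be illustrated with a minimal Java sketch. It is not Liferay's implementation: `ETagSketch`, `etagFor`, and `statusFor` are hypothetical names, and the hash used for the tag is an arbitrary choice. It only shows why the whole body must be rendered before a 304 can be chosen: the tag is derived from the finished bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ETagSketch {

    // Hypothetical ETag: a quoted hex hash of the complete response body.
    static String etagFor(byte[] body) {
        return "\"" + Integer.toHexString(Arrays.hashCode(body)) + "\"";
    }

    // Returns the HTTP status to send. The body must already be fully
    // rendered, because the tag is computed from its bytes.
    static int statusFor(byte[] renderedBody, String ifNoneMatch) {
        String etag = etagFor(renderedBody);
        return etag.equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        byte[] body = "<html>...</html>".getBytes(StandardCharsets.UTF_8);
        String tag = etagFor(body);

        // First request: the client has no cached tag.
        System.out.println(statusFor(body, null));
        // Repeat request: the client sends the tag back in If-None-Match.
        System.out.println(statusFor(body, tag));
    }
}
```

In both calls the server-side rendering cost is the same; only the bytes sent over the wire differ.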
Root issue
On the second question, which part of the code might be causing it: without setting up the environment to confirm, I would first suspect the following code blocks, which end up being called by GZipFilter and ETagFilter, respectively.
https://github.com/liferay/liferay-portal/blob/6.2.x/portal-impl/src/com/liferay/portal/servlet/filters/gzip/GZipResponse.java#L168-L169
https://github.com/liferay/liferay-portal/blob/6.2.x/portal-service/src/com/liferay/portal/kernel/servlet/RestrictedByteBufferCacheServletResponse.java#L115-L116
In both cases the code assumes that a specific encoding has already been set on the response. It may not have been, because AbsoluteRedirectsFilter only ensures that a specific encoding is set for interpreting data on the request.
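The general class of problem can be shown outside the portal. The sketch below is not Liferay code: `ResponseEncodingSketch`, `toBytes`, and `survives` are hypothetical names, and it assumes that an unset response encoding falls back to ISO-8859-1, the Servlet specification's default. It shows that such a fallback loses a 4-byte character while a UTF-8 fallback preserves it. Note that an ISO-8859-1 fallback would also lose 3-byte characters, so this illustrates the suspicion rather than confirming the cause.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingSketch {

    // Encodes text with the named encoding. A null name stands for
    // "no encoding was set on the response" and uses the given fallback.
    static byte[] toBytes(String text, String characterEncoding, Charset fallback) {
        Charset charset =
            (characterEncoding == null) ? fallback : Charset.forName(characterEncoding);
        return text.getBytes(charset);
    }

    // True if the bytes, read back as UTF-8, still equal the original text.
    static boolean survives(String text, byte[] bytes) {
        return text.equals(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Illustrative 4-byte character (U+20BB7); an assumption, not from the ticket.
        String name = new String(Character.toChars(0x20BB7));

        byte[] lossy = toBytes(name, null, StandardCharsets.ISO_8859_1);
        byte[] safe = toBytes(name, null, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 fallback preserved: " + survives(name, lossy));
        System.out.println("UTF-8 fallback preserved: " + survives(name, safe));
    }
}
```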
Some additional resources
http://japanese.stackexchange.com/questions/6872/are-the-4-byte-utf-8-kanji-rare-enough-that-i-can-ignore-them
http://stackoverflow.com/questions/13341918/encoding-issue4-byte-japanese-characters
LPP-20707
LPS-49045