Resolution: No Longer Reproducible
Affects Version/s: 6.2.5 CE GA6, 6.2.10 EE GA1, 6.2.X EE, 7.0.1 CE GA2, 7.0.0 DXP SP1, Master
Fix Version/s: Master
OS: Mac OS X 10.9
JDK: Oracle Sun JDK 7
Application Servers: Apache Tomcat 7.0
Besides having 3 different writing systems, Japanese uses some other non-standard characters, especially for given names and family names. These characters are part of Unicode, but UTF-8 encodes them in 4 bytes instead of the usual 2 or 3 bytes.
Japanese customers are likely to run into this situation as their user database grows, which makes this a serious issue.
Liferay does not properly show text encoded in utf8mb4 (4-byte UTF-8).
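The 3-byte versus 4-byte distinction described above can be checked with a minimal Java sketch. The class name `Utf8Width` and the helper `utf8Length` are hypothetical, and U+20BB7 (a variant of U+5409 that appears in Japanese names) is only an illustrative code point; it is not necessarily the character in the attached character.txt.

```java
import java.nio.charset.StandardCharsets;

class Utf8Width {

    // Returns the number of bytes UTF-8 needs for a single Unicode code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+5409 lies in the Basic Multilingual Plane.
        System.out.println("U+5409  -> " + utf8Length(0x5409) + " bytes");
        // U+20BB7 is a supplementary character (outside the BMP).
        System.out.println("U+20BB7 -> " + utf8Length(0x20BB7) + " bytes");
    }
}
```

Any code point above U+FFFF needs 4 bytes in UTF-8 and is represented in a Java `String` as a surrogate pair, which is why such characters are a useful test case for code paths that handle encodings incorrectly.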
Steps to reproduce
- Prepare a new clean SP14 bundle
- Connect it to MySQL or PostgreSQL (make sure the JDBC connector does not force UTF-8)
- Make sure the database is created with encoding utf8mb4 in MySQL or UTF8 in PostgreSQL
- Start the bundle
- Create a new user and set its given name or family name to a 4-byte UTF-8 character.
(See attached character.txt for reference)
- Save the new user
- Page is reloaded
Actual result
That character is shown as ?? everywhere except in the text box. It is saved correctly in the database.
See attached utf8mb4.png screenshot
Expected result
The character is displayed correctly everywhere.
Workaround
- Disable the ETag and GZip filters by setting the following properties:
- Restart the bundle
- Note that the character is now displayed correctly
See attached utf8mb4_filtersdisabled.png screenshot
On the first question, whether the workaround is something you could put into production: disabling GZipFilter is already part of the deployment checklist, because compression is better handled by the web server. The only thing ETagFilter does is attempt to save bandwidth. The portal still has to generate all the data for the response and only afterwards decides whether it can send a 304, so if you have a CDN, ETagFilter does not provide much value.
On the second question, which part of the code might be causing it: without setting up the environment to know for sure, I would first suspect the following code blocks, which end up being called by GZipFilter and ETagFilter, respectively.
The code in both cases assumes a specific encoding is already set on the response, and it may not be, because AbsoluteRedirectsFilter only ensures that a specific encoding is set for interpreting data on the request.
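To illustrate the suspected failure mode, here is a minimal sketch that is not Liferay's filter code. It assumes the response bytes end up being produced with a charset other than UTF-8, for example ISO-8859-1, which the Servlet specification uses as the default when no response encoding has been set. The class name `ResponseEncodingSketch` and the helper `roundTrip` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingSketch {

    // Encodes the text with the given charset and decodes it again,
    // mimicking a filter that buffers the response using that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Illustrative name containing U+20BB7, a 4-byte UTF-8 character.
        String name = new String(Character.toChars(0x20BB7)) + " Taro";

        // With UTF-8 the text survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));

        // With ISO-8859-1 the unmappable character is replaced by '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));
    }
}
```

If this is what happens inside the filters, the data in the database would stay intact while the rendered page shows question marks, which matches the reported symptom. Whether the filters actually take this path has not been verified here.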
Some additional resources