Details

    • Flagged:
      Impediment
    • Story Points:
      1
    • Fix Priority:
      5
    • OS:
      Mac OS X 10.9
    • JDK:
      Oracle Sun JDK 7
    • Application Servers:
      Apache Tomcat 7.0
    • Browsers:
      Safari 9
    • Databases:
      MySQL 5.7

      Description

      Background
      Besides its three writing systems, Japanese uses some other non-standard characters, especially in given names and family names. These characters can be represented in UTF-8, but they are encoded in 4 bytes instead of the usual 3 bytes or fewer.
      Japanese customers are likely to run into such characters as their user database grows, which makes this a serious issue.
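
      To make the byte widths concrete, here is a minimal Java sketch. The class and method names are hypothetical and not part of Liferay, and U+20BB7 (a variant of U+5409 that appears in names) is chosen only for illustration; it is not necessarily the character in the attached character.txt.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Liferay code.
class Utf8WidthDemo {

    // Number of bytes the given code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the text contains a surrogate pair, i.e. a code point
    // outside the Basic Multilingual Plane, which takes 4 bytes in UTF-8.
    static boolean hasSupplementary(String text) {
        return text.codePointCount(0, text.length()) != text.length();
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x5409));   // BMP kanji: 3
        System.out.println(utf8Length(0x20BB7));  // supplementary kanji: 4
        String name = new String(Character.toChars(0x20BB7));
        System.out.println(hasSupplementary(name)); // true
    }
}
```

      A check like hasSupplementary could be used to find existing user names that would trigger the problem described below.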

      Issue
      Liferay does not properly show text encoded in utf8mb4 (4-byte UTF-8).

      Steps to reproduce

      1. Prepare a new clean SP14 bundle
      2. Connect it to MySQL or PostgreSQL (make sure the JDBC connection does not force UTF-8)
        jdbc.default.driverClassName=com.mysql.jdbc.Driver
        jdbc.default.url=jdbc:mysql://localhost/portal-gen-utf8mb4?useFastDateParsing=false
        jdbc.default.username=liferay
        jdbc.default.password=password
        
      3. Make sure the database is created with encoding utf8mb4 in MySQL or utf8 in PostgreSQL
      4. Start the bundle
      5. Create a new user and set its given name or family name to a 4-byte UTF-8 character.
        (See attached character.txt for reference)
      6. Save the new user
      7. Page is reloaded

      Actual results
      That character is shown as ?? everywhere except in the textbox. It is saved correctly inside the database.
      See attached utf8mb4.png screenshot

      Expected results
      Character is properly shown everywhere

      Workaround

      1. Disable ETag and GZip filters by setting the following properties:
        #
        # The ETag filter is used to generate ETag headers.
        #
        com.liferay.portal.servlet.filters.etag.ETagFilter=false
        
        #
        # If the user can unzip compressed HTTP content, the GZip filter will
        # zip up the HTTP content before sending it to the user. This will speed up
        # page rendering for users that are on dial up.
        #
        com.liferay.portal.servlet.filters.gzip.GZipFilter=false
        
      2. Restart bundle
      3. Note that character is displayed successfully
        See attached utf8mb4_filtersdisabled.png screenshot

      Some considerations
      On the first question, whether the workaround is something you could put into production: disabling GZipFilter is already part of the deployment checklist, because compression is better handled by the web server. The only thing ETagFilter does is attempt to save bandwidth; the portal still has to generate all the data for the response and only afterwards decides whether it can send a 304. With a CDN in place, ETagFilter therefore provides little value.
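
      The point about ETagFilter can be sketched as follows. This is an assumption-laden illustration, not the actual ETagFilter code: the class, the method names and the hash-based tag are all hypothetical. It only shows why the full response body must exist before a 304 can be chosen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of an ETag-style check; not Liferay's implementation.
class ETagSketch {

    // Derives a tag from the complete, already generated response body.
    static String etagFor(byte[] body) {
        return "\"" + Integer.toHexString(Arrays.hashCode(body)) + "\"";
    }

    // Returns 304 if the client's If-None-Match value matches the tag of
    // the freshly generated body, otherwise 200. Either way the body has
    // already been produced, so only bandwidth is saved, not rendering work.
    static int statusFor(byte[] body, String ifNoneMatch) {
        String etag = etagFor(body);
        return etag.equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        byte[] body = "<html>page</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(statusFor(body, null));          // first visit: 200
        System.out.println(statusFor(body, etagFor(body))); // revalidation: 304
    }
}
```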

      Root issue
      On the second question, which part of the code might be causing this: without setting up the environment to confirm it, I would first suspect the following code blocks, which end up being called by GZipFilter and ETagFilter, respectively.

      https://github.com/liferay/liferay-portal/blob/6.2.x/portal-impl/src/com/liferay/portal/servlet/filters/gzip/GZipResponse.java#L168-L169
      https://github.com/liferay/liferay-portal/blob/6.2.x/portal-service/src/com/liferay/portal/kernel/servlet/RestrictedByteBufferCacheServletResponse.java#L115-L116
      The code in both cases assumes that a specific encoding is already set on the response, and it may not be, because AbsoluteRedirectsFilter only ensures that a specific encoding is set for interpreting data on the request.
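
      The suspected failure mode can be illustrated in isolation. The sketch below is not the linked Liferay code; it assumes that, with no encoding set, the response reports ISO-8859-1 (the default that the Servlet API documents for getCharacterEncoding()) and that the body is encoded with whatever name is reported. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of encoding with a fallback charset.
class ResponseEncodingSketch {

    // Stands in for a response wrapper that encodes text using the
    // encoding name the response reports.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // U+20BB7, a 4-byte UTF-8 character chosen for illustration.
        String name = new String(Character.toChars(0x20BB7));

        // Assumed fallback: the character cannot be mapped and is
        // replaced with '?' bytes.
        byte[] fallback = encode(name, "ISO-8859-1");
        System.out.println(new String(fallback, StandardCharsets.ISO_8859_1));

        // With UTF-8 set explicitly the character survives as 4 bytes.
        byte[] utf8 = encode(name, "UTF-8");
        System.out.println(utf8.length); // 4
    }
}
```

      If this is indeed the mechanism, setting the response encoding explicitly before these wrappers are created would be one direction to investigate.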

      Some additional resources
      http://japanese.stackexchange.com/questions/6872/are-the-4-byte-utf-8-kanji-rare-enough-that-i-can-ignore-them
      http://stackoverflow.com/questions/13341918/encoding-issue4-byte-japanese-characters
      LPP-20707
      LPS-49045

People

    • Votes:
      0
    • Watchers:
      4

Dates

    • Days since last comment:
      2 years, 31 weeks, 3 days ago

Packages

    Version Package
    Master