LPS-122358 (PUBLIC - Liferay Portal Community Edition)

Web Content articles with large text are not indexed and an Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="ddmFieldArray.ddmFieldValueText_en_US_String_sortable"] is thrown

Details

    Description

      Important note: After patching your installation, you must run a Web Content reindex.

      This error is written to the log file each time a web content article fails to index. After applying the fix, run a Web Content reindex from the Control Panel so that the articles that were skipped are indexed.


      If you create a web content article with a large text field (one whose UTF-8 encoding exceeds 32766 bytes, for example more than 32766 ASCII characters), an ElasticsearchException is thrown by the Elasticsearch server and shown in the Liferay log:

      2020-10-20 06:43:29.164 ERROR [liferay/search_writer/SYSTEM_ENGINE-4][BulkDocumentRequestExecutorImpl:63] failure in bulk execution:_[1]: index [liferay-20097], type [LiferayDocumentType], id [com.liferay.journal.model.JournalArticle_PORTLET_670212], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="ddmFieldArray.ddmFieldValueText_en_US_String_sortable" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[60, 112, 62, 108, 111, 114, 101, 109, 32, 105, 112, 115, 117, 109, 32, 100, 111, 108, 111, 114, 32, 115, 105, 116, 32, 97, 109, 101, 116, 44]...', original message: bytes can be at most 32766 in length; got 40434]]; nested: ElasticsearchException[Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 40434]];] [Sanitized]

      As a result, the web content article is not indexed correctly.
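
      The log message can be read more easily with a small helper. The following is a minimal sketch, not Liferay code: the class and method names are hypothetical, and the only values taken from the report are the 32766-byte limit and the byte prefix printed in the log. It decodes the prefix of the offending term and checks whether a value's UTF-8 encoding exceeds the limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Liferay or Elasticsearch.
public class ImmenseTermCheck {

    // Maximum term length in bytes, as reported in the log message.
    static final int MAX_TERM_BYTES = 32766;

    // Returns true if the UTF-8 encoding of the value is longer than the limit.
    static boolean exceedsTermLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length > MAX_TERM_BYTES;
    }

    // Decodes the numeric byte prefix that the exception prints for the term.
    static String decodePrefix(int... codes) {
        byte[] bytes = new byte[codes.length];
        for (int i = 0; i < codes.length; i++) {
            bytes[i] = (byte) codes[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prefix copied from the log entry in this report.
        String prefix = decodePrefix(
            60, 112, 62, 108, 111, 114, 101, 109, 32, 105, 112, 115, 117, 109,
            32, 100, 111, 108, 111, 114, 32, 115, 105, 116, 32, 97, 109, 101,
            116, 44);
        System.out.println(prefix); // <p>lorem ipsum dolor sit amet,

        // The limit is in bytes, not characters: 20000 two-byte characters
        // already encode to 40000 bytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append('\u00e9');
        }
        System.out.println(exceedsTermLimit(sb.toString())); // true
    }
}
```

      The decoded prefix shows that the whole HTML value of the field, starting at its opening `<p>` tag, was sent as a single term.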

      This regression was introduced by LPS-119088, which changed how the "sortable" field is indexed in DDMIndexer. See:
      https://github.com/brianchandotcom/liferay-portal/pull/93505/files#diff-c98f2b20dea78a52e65c4ea58c98aa96e4d179f73fea2cbf023566fb2578c70fL490-L495
      (Note: the commits for LPS-119088 were mistakenly committed under the wrong LPS number, LPS-119008.)
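
      One common way to keep a sortable field under the term limit is to index only a shortened copy of the value. The sketch below illustrates that general technique under stated assumptions: it is not taken from the Liferay source and may differ from the actual fix, and the class name, method name, and 255-character cap are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of truncating a sortable value; not Liferay code.
public class SortableValueTruncator {

    // Assumed cap for the sortable copy, chosen for illustration only.
    static final int MAX_SORTABLE_CHARS = 255;

    // Returns the value unchanged if it is short enough, otherwise a prefix.
    static String toSortableValue(String value) {
        if (value == null || value.length() <= MAX_SORTABLE_CHARS) {
            return value;
        }
        int end = MAX_SORTABLE_CHARS;
        // Do not cut a surrogate pair in half.
        if (Character.isHighSurrogate(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("<p>");
        while (sb.length() < 40000) {
            sb.append("lorem ipsum dolor sit amet, ");
        }
        String sortable = toSortableValue(sb.toString());
        System.out.println(sortable.length()); // 255
        System.out.println(sortable.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

      Because a Java `char` encodes to at most three UTF-8 bytes, a 255-character prefix stays far below 32766 bytes. Sorting on a prefix is usually acceptable, since long values rarely differ only after the first few hundred characters.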

      Steps to reproduce

      1. Create a custom structure with an HTML field
      2. Create a web content article using the custom structure
      3. Fill the HTML field with text longer than 32766 characters (you can use the attached LoremIpsum_40000_chars.txt)
      4. Check the log file
        • Expected behavior: No error is written to the log file
        • Wrong behavior: An error trace with the ElasticsearchException "Document contains at least one immense term in field" is written to the log file
      5. Add an Asset Publisher with a dynamic configuration that displays the custom structure
        • Expected behavior: The created web content article is displayed
        • Wrong behavior: The created web content article is not displayed and an exception is written to the log file
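
      If the attachment is not at hand, a comparable test text for step 3 can be generated. This is a minimal sketch: the class and method names are hypothetical, and the output only resembles the attached file in that it is lorem ipsum HTML of at least 40000 characters.

```java
// Hypothetical generator for test content; not part of the original report.
public class LargeTextGenerator {

    // Builds an HTML paragraph of at least minChars characters.
    static String generate(int minChars) {
        String sentence = "lorem ipsum dolor sit amet, ";
        StringBuilder sb = new StringBuilder("<p>");
        while (sb.length() < minChars) {
            sb.append(sentence);
        }
        sb.append("</p>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the text so it can be redirected to a file or pasted
        // into the HTML field.
        System.out.print(generate(40000));
    }
}
```

      The generated text is pure ASCII, so its length in characters equals its UTF-8 length in bytes and it exceeds the 32766-byte limit.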

            People

              dereck.portela Dereck Portela
              jorge.diaz Jorge Diaz

              Dates

                Created:
                Updated:
                Resolved:
                2 years, 30 weeks, 3 days ago

                Packages

                  Version     Package
                  7.3.10      DXP FP1
                  7.3.10.1    DXP SP1
                  7.3.6       CE GA7
                  7.3.X
                  7.4.13      DXP GA1
                  Master