  1. PUBLIC - Liferay Portal Community Edition
  2. LPS-27507

TikaRawMetadataProcessor inefficient in extracting metadata

    Details

      Description

      TikaRawMetadataProcessor uses a ContentHandler backed by a DummyWriter, which discards anything written to it. This could be replaced with a WriteOutContentHandler with a writeLimit of 0. In my tests, using WriteOutContentHandler instead of DummyWriter makes metadata extraction about 10x faster. The current approach also causes OutOfMemoryErrors for large PDF documents.

      I'll attach a patch with the proposed enhancement.
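
      To illustrate why a write limit of 0 can be so much cheaper, here is a minimal sketch using only the JDK's SAX classes. It is not the attached patch and it does not use Tika. The names WriteLimitSketch, MetaHandler and parse are hypothetical, and the handler only imitates the idea behind Tika's WriteOutContentHandler: as soon as the first body text arrives, it throws a SAXException so the parser stops instead of producing text that would be thrown away. It assumes the metadata appears before any text, which may not hold for every document type.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical handler: records <meta> attributes and enforces a character limit. */
    static class MetaHandler extends DefaultHandler {
        final Map<String, String> meta = new LinkedHashMap<>();
        final int writeLimit;          // -1 means "no limit" (accept and discard everything)
        int charsSeen = 0;             // characters accepted so far
        boolean limitReached = false;  // set when we abort the parse on purpose

        MetaHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Metadata is taken from element attributes, not from character data.
            if ("meta".equals(qName)) {
                meta.put(attrs.getValue("name"), attrs.getValue("content"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // With writeLimit == 0, the first non-empty text chunk aborts the parse.
            if (writeLimit >= 0 && charsSeen + length > writeLimit) {
                limitReached = true;
                throw new SAXException("write limit reached");
            }
            charsSeen += length;
        }
    }

    /** Parses the XML and swallows only the exception we raised ourselves. */
    static MetaHandler parse(String xml, int writeLimit) throws Exception {
        MetaHandler handler = new MetaHandler(writeLimit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!handler.limitReached) {
                throw e; // a real parse error, not our deliberate abort
            }
        }
        return handler;
    }
}
```

      Calling parse(xml, -1) walks the whole document and counts every character, which corresponds to the discard-everything approach. Calling parse(xml, 0) returns the same metadata map but stops at the first text chunk. In Tika itself, my understanding is that the equivalent would be constructing WriteOutContentHandler with a limit of 0 and treating the resulting write-limit SAXException as expected rather than as a failure.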

            People

            • Assignee: SE Support (support-lep@liferay.com)
            • Reporter: Samuli Saarinen (samuli)