TikaRawMetadataProcessor uses a ContentHandler backed by a DummyWriter, which discards everything written to it. This could be replaced with a WriteOutContentHandler whose writeLimit is 0. In my tests, using it instead of the DummyWriter made metadata extraction about 10x faster. The current approach also causes OutOfMemoryErrors on large PDF documents.
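
To illustrate why a write limit of 0 can be cheaper than discarding output, here is a rough sketch that does not use Tika at all. It assumes that a write-limited handler aborts parsing with a SAXException as soon as the limit would be exceeded, so the body text is never streamed through. LimitedHandler, parse, and the sample XML are hypothetical names made up for this illustration. They are not Tika's WriteOutContentHandler and not the attached patch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical stand-in for a write-limited handler (not Tika's class). */
    static class LimitedHandler extends DefaultHandler {
        private final int writeLimit;
        private int written = 0;
        boolean limitReached = false;
        String title; // stands in for metadata that is available before the body

        LimitedHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Capture "metadata" from the root element's attribute.
            if ("doc".equals(qName)) {
                title = atts.getValue("title");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Abort instead of consuming text once the limit would be exceeded.
            if (written + length > writeLimit) {
                limitReached = true;
                throw new SAXException("write limit reached");
            }
            written += length;
        }
    }

    /** Parses the XML and treats "limit reached" as a normal early exit. */
    static LimitedHandler parse(String xml, int writeLimit) throws Exception {
        LimitedHandler handler = new LimitedHandler(writeLimit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!handler.limitReached) {
                throw e; // a real parse error, not our early exit
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        LimitedHandler h = parse("<doc title=\"Report\">a very long body</doc>", 0);
        System.out.println("title=" + h.title + ", limitReached=" + h.limitReached);
    }
}
```

In this sketch the attribute is read before any character data arrives, so stopping at the first characters event loses nothing. Whether the same holds for every parser in the real processor is an assumption that the patch would need to confirm.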
I'll attach a patch with the proposed enhancement.