Type: Feature Request
Affects Version/s: 6.1.20 EE GA2, 6.2.0 CE M5
Fix Version/s: None
Component/s: Documents & Media
Environment: OS X, Linux
Enterprise Requirement: Enterprise Requirement
Some of our clients' uploaded documents run to several thousand pages. Since upgrading to 6.1.20 EE GA2 and enabling the document library preview functionality, we have noticed a severe impact on the performance of our production instance.
We have tried the ImageMagick resource limit properties available in the Control Panel (very useful to have BTW - thx!), but it appears that ImageMagick itself is not the problem. Ghostscript is.
We are running on an m1.large EC2 Amazon Linux AMI instance. This has 2 CPUs, 7.5GB of RAM and 8GB of disk space. We have allocated 4GB of RAM to our LR instance. I should also mention that we migrated our Document Library over to the S3Store implementation as part of our upgrade release. We also have OpenOffice installed and enabled to facilitate document conversions.
After our initial upgrade release last week we had several server crashes. We now believe these were caused by the system being overloaded while playing "catch up" on the document preview backlog: as clients began to navigate around the upgraded system, they triggered all of these previews.
This has now somewhat subsided, but we are still seeing large files have a substantial impact on our system.
We ended up creating a separate, dedicated 60GB mounted drive for ImageMagick/Ghostscript file management. Without it, our primary drive repeatedly filled up and emptied again as previews were generated. We have tried tweaking each of the ImageMagick resource limit options. But when faced with one of these large files (for example, a very large spreadsheet that is converted to a PDF via OpenOffice), the only constraint that finally ends the whole process is the separate filesystem becoming full. It doesn't matter if you limit ImageMagick to 1GB of disk space: Ghostscript is its own process, and the parameters ImageMagick passes to it are not tweakable on our side.
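One constraint that does not depend on ImageMagick's own resource settings is a wall-clock limit on the external process. The following is only a minimal sketch of that idea, assuming Java 9+ and a hypothetical `BoundedProcessRunner` class (it is not part of Liferay, ImageMagick or Ghostscript). It runs an external command and, if the time budget is exceeded, kills the command together with any child processes it spawned, such as a Ghostscript process started by convert.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only.
public class BoundedProcessRunner {

    /**
     * Runs an external command with a wall-clock time budget.
     * Returns true only if the command finished in time with exit code 0.
     */
    public static boolean run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {

        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout and discard both, so a chatty child
        // cannot block on a full pipe (Redirect.DISCARD needs Java 9+).
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);

        Process process = builder.start();

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Kill descendants first: killing only the parent could
            // leave a spawned child process running on its own.
            process.descendants().forEach(ProcessHandle::destroyForcibly);
            process.destroyForcibly();
            process.waitFor();
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

A caller could treat a `false` result as "no preview available" for that file instead of letting the conversion run until the disk fills. The timeout value itself would need tuning per environment.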
Call to convert:
Call to GhostScript:
Whenever one of these kicks off, the available RAM on the system shrinks to ~50MB and our load average spikes as CPU usage increases dramatically.
From a memory usage perspective - I profiled this on a test server (using pmap) and found that a large amount of memory, ~2GB, is assigned to the forked Java process that initiates preview generation.
From a CPU perspective - the vast majority of the CPU load increase is generated by the Ghostscript process.
I then tried setting the dl.file.entry.preview.fork.process.enabled property to false to see if that had any impact. It did not: the same amount of RAM ended up being used, although this time I was not able to trace it to a specific source using pmap.
We are also having to restart OpenOffice several times a day because it now keeps crashing on us. This was not a problem in the previous environment.
The preview functionality for the Document Library is a really cool new feature. We love it and our clients have already begun to express interest and appreciation for it. So we want to keep it enabled. The main reason for filing this ticket is to make you aware that all is not smooth sailing with this functionality and to suggest that additional resource constraints may be necessary to ensure stability on a production system.
Suggestions:
- A pre-processor that evaluates whether incoming files are suitable for preview generation and blacklists those that exceed some kind of threshold, so that they do not constantly thrash the system
- Ghostscript resource limit configuration similar to what was added to the Control Panel for ImageMagick
- A separate preview generation process that can be run on a schedule from a separate server still pointing at the same DB, thereby offloading the impact from the main LR instance
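To make the first suggestion more concrete, here is a rough sketch of what such a pre-processor gate might look like. Everything in it is an assumption for illustration: the `PreviewGate` class, its method names and the use of file size as the threshold are hypothetical and do not correspond to any existing Liferay API. A page-count threshold would probably be more accurate, but it would need a PDF parser.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gate that decides whether a preview should be attempted.
public class PreviewGate {

    private final long maxBytes;

    // IDs of file versions that must not be sent to preview generation again.
    private final Set<String> blacklist = ConcurrentHashMap.newKeySet();

    public PreviewGate(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns true if a preview should be attempted for this file version. */
    public boolean shouldGenerate(String fileVersionId, long sizeBytes) {
        if (blacklist.contains(fileVersionId)) {
            return false;
        }
        if (sizeBytes > maxBytes) {
            // Over the threshold: remember it so it is never retried.
            blacklist.add(fileVersionId);
            return false;
        }
        return true;
    }

    /** Records a failed or timed-out attempt so that it is not retried. */
    public void markFailed(String fileVersionId) {
        blacklist.add(fileVersionId);
    }

    public boolean isBlacklisted(String fileVersionId) {
        return blacklist.contains(fileVersionId);
    }
}
```

The point of the blacklist is that a file which has already exhausted the system once is not picked up again every time someone navigates to it. In a real implementation the blacklist would presumably need to be persisted rather than held in memory.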
Steps to reproduce:
- Install a fresh 6.1.20 EE GA2 instance
- Install OpenOffice
- Install ImageMagick and Ghostscript
- Enable OpenOffice within the LR Control Panel
- Enable ImageMagick and Ghostscript within the LR Control Panel
- Create a Site
- Manage Content for that Site and add a new document, ideally a PDF file that has several thousand pages (I'm sorry that I can't provide an example PDF file at this time - the information contained in our example files is confidential)
- Once the file has uploaded, navigate to it to guarantee that a preview is generated
- Watch as your system takes a full frontal assault while the forked Java process, convert and Ghostscript take on this preview generation