Type: Regression Bug
Affects Version/s: 6.1.X EE, 6.2.X EE, 7.0.0 M5
Steps to reproduce:
1. Install Liferay 6.1.30 EE GA3 with the latest patches (currently portal-35-6130.zip)
2. Create a portal-ext.properties file with this line added:
index.search.writer.max.queue.size = 1
3. Start the portal
4. Deploy one of the attached portlets:
- OR use the attached create300users.war: deploy it as a portlet (it appears under the "sample" portlet category), add it to the main page, and click the ActionURL to start it.
Expected result: 300 users are created without any errors.
Actual result: user creation starts, but an error occurs:
Reproduced with the customer's patches (since liferay-hotfix-1505-6130.zip)
Reproduced with the latest patches for 6.1 EE GA3
Reproduced on 6.1.x (ca05ec51cf697019c59445c396bf5bac58364164)
Reproduced on 6.2.x (392e4d66a78316ca4dd1935192e68b9596aae90f)
Could not reproduce on master, but the related code is unchanged there, so if a CallerRunsPolicy were forced to run, the same thing should happen.
AbstractSearchEngineConfigurator registers a RejectedExecutionHandler on the search writer destination when the property index.search.writer.max.queue.size is set (which is the case since liferay-hotfix-1505-6130), so that when the queue is full, indexing is done on the current thread instead of on an executor thread in the background.
Unfortunately, the Runnables that ParallelDestination produces are made for running on an executor thread and call CentralizedThreadLocal.clearShortLivedThreadLocals() in a finally block. This deletes the callback lists that TransactionCommitCallbackUtil manages.
On the next transaction boundary (when TransactionCommitCallbackUtil.popCallbackList() is called) the transaction handling explodes:
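The failure mechanism described above can be reproduced with plain JDK classes. This is only an illustrative sketch, not Liferay code: the `CALLBACKS` thread local stands in for TransactionCallbackThreadLocal's callback list, and the task's finally block stands in for CentralizedThreadLocal.clearShortLivedThreadLocals(). With a bounded queue and CallerRunsPolicy, the task runs on the submitting thread and wipes that thread's state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsLeak {

    // Hypothetical stand-in for the caller's transaction commit
    // callback list (TransactionCallbackThreadLocal in Liferay)
    static final ThreadLocal<List<String>> CALLBACKS =
        ThreadLocal.withInitial(ArrayList::new);

    public static void main(String[] args) throws Exception {
        // Queue of size 1 + CallerRunsPolicy: once the queue is full,
        // the submitting (caller) thread runs the task itself
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.CallerRunsPolicy());

        Runnable indexTask = () -> {
            try {
                Thread.sleep(50); // simulate indexing work
            }
            catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            finally {
                // The Runnable assumes it is on an executor thread and
                // clears short-lived state, like ParallelDestination does
                CALLBACKS.remove();
            }
        };

        // The caller has registered a commit callback before submitting
        CALLBACKS.get().add("afterCommitCallback");

        // Saturate the pool so CallerRunsPolicy fires on this thread
        for (int i = 0; i < 4; i++) {
            executor.execute(indexTask);
        }

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);

        // The caller's callback list was wiped by the task's finally block
        System.out.println("callbacks left: " + CALLBACKS.get().size());
    }
}
```

In Liferay the wiped list is not merely empty but popped out from under the transaction interceptor, which is why the next popCallbackList() call fails.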
The solution I could come up with is to make TransactionCallbackThreadLocal a regular ThreadLocal instead of an AutoResetThreadLocal, so it is not cleared by the CentralizedThreadLocal.clearShortLivedThreadLocals() call.
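The effect of that change can be sketched with plain JDK classes. This is a simplified model, not Liferay code: `SHORT_LIVED` mimics the registry that clearShortLivedThreadLocals() iterates, and the two thread locals contrast the pre-fix (auto-reset) and post-fix (plain) behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class FixSketch {

    // Hypothetical mimic of CentralizedThreadLocal: only locals
    // registered here are wiped by clearShortLivedThreadLocals()
    static final List<ThreadLocal<?>> SHORT_LIVED = new ArrayList<>();

    static void clearShortLivedThreadLocals() {
        for (ThreadLocal<?> threadLocal : SHORT_LIVED) {
            threadLocal.remove();
        }
    }

    // Before the fix: the callback list is an auto-reset (short-lived)
    // thread local, so the indexing Runnable's cleanup wipes it
    static final ThreadLocal<List<String>> autoReset =
        ThreadLocal.withInitial(ArrayList::new);

    // After the fix: a plain ThreadLocal, never registered as
    // short-lived, so it survives the cleanup
    static final ThreadLocal<List<String>> plain =
        ThreadLocal.withInitial(ArrayList::new);

    public static void main(String[] args) {
        SHORT_LIVED.add(autoReset);

        autoReset.get().add("afterCommitCallback");
        plain.get().add("afterCommitCallback");

        // What the indexing Runnable does in its finally block
        clearShortLivedThreadLocals();

        System.out.println(
            "auto-reset survived: " + !autoReset.get().isEmpty());
        System.out.println(
            "plain survived: " + !plain.get().isEmpty());
    }
}
```

The trade-off is that a plain ThreadLocal must be cleaned up explicitly at the end of the transaction (which popCallbackList() already does), instead of relying on the blanket reset.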
As a workaround for this LPS, the max queue size can be increased so that the problematic code path is never executed:
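A minimal portal-ext.properties fragment for this workaround (the value shown is only illustrative; it needs to be large enough that the search writer queue never fills up under your load):

```properties
# Raise the queue bound so the RejectedExecutionHandler
# (and thus the caller-runs behavior) never fires
index.search.writer.max.queue.size=10000
```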
It also helps to batch Lucene writes instead of committing each one on its own thread, which makes disk writes less frequent:
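Assuming the standard Lucene commit properties from the 6.x portal.properties, a hedged example of such batching (values are illustrative; by default commits happen per write):

```properties
# Commit Lucene writes in batches of documents and/or on a
# time interval (milliseconds) instead of on every write
lucene.commit.batch.size=1000
lucene.commit.time.interval=5000
```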