Affects Version/s: 7.0.X, Master
A specific combination of incoming poller/receive and poller/send requests may cause a thread deadlock in the poller infrastructure. The situation is the same for both master and 7.0.x.
There is a deadlock window for threads running PollerServlet, which spans two instants in time:
- When the thread acquires the lock for SynchronousPollerChannelListener.getNotificationEvents(). The call happens here
- When the thread acquires the lock for ChannelImpl.getNotificationEvents(). This happens before releasing the previous lock, here.
If, for some reason, the poller/receive request takes some time to be handled, so that the first lock has been acquired but not yet the second, a poller/send request arriving at the server for the same companyId/userId pair can acquire the second lock while registering the new SynchronousChannelListener.
In that case we are stuck, because this second thread can never release the ChannelImpl lock: it executes notifyChannelListeners() before releasing it, and there it tries to call a synchronized method on the SynchronousChannelListener that was registered by the poller/receive thread. That method shares its monitor with the method first locked by the poller/receive thread. Meanwhile, the first thread will eventually try to acquire the second lock (the one on the ChannelImpl object), which is already held by the second thread. Each thread now waits for a lock the other holds: a classic deadlock.
Because the poller/receive thread makes a wait(timeout) call between the two lock acquisitions, the window is widened considerably. It is true that inside wait() the thread releases the monitor of the synchronized method, but that does not prevent the other thread from acquiring the second lock, which the first thread will try to acquire once wait() exits. Releasing the monitor only gives the second thread a chance to call other synchronized methods in SynchronousChannelListener to notify the waiting thread. If that does not happen, wait() exits on timeout and the deadlock occurs.
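The lock ordering described above can be illustrated with a minimal, self-contained sketch. It is not the portal code: the Channel and Listener classes are hypothetical stand-ins for ChannelImpl and the synchronous channel listener, the method bodies are simplified assumptions, and the two latches play the role of the breakpoints used to force the timing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class PollerDeadlockSketch {

    // The latches stand in for the breakpoints used to force the timing.
    private final CountDownLatch waitExited = new CountDownLatch(1);
    private final CountDownLatch channelLocked = new CountDownLatch(1);

    // Hypothetical stand-in for ChannelImpl.
    class Channel {
        synchronized String getNotificationEvents() {
            return "events";
        }

        // poller/send path: registers the listener and notifies it while
        // still holding the channel monitor.
        synchronized void registerChannelListener(Listener listener)
                throws InterruptedException {
            channelLocked.countDown();
            waitExited.await(); // let the receive thread's wait() time out first
            listener.notificationEventsAvailable(); // blocks: listener monitor is held
        }
    }

    // Hypothetical stand-in for the synchronous channel listener.
    class Listener {
        // poller/receive path: first lock (listener), then second lock (channel).
        synchronized String getNotificationEvents(Channel channel, long timeoutMillis)
                throws InterruptedException {
            wait(timeoutMillis); // the monitor is released only while waiting
            waitExited.countDown(); // the listener monitor is held again here
            channelLocked.await(); // make sure the send thread owns the channel monitor
            return channel.getNotificationEvents(); // blocks: channel monitor is held
        }

        synchronized void notificationEventsAvailable() {
            notifyAll();
        }
    }

    interface Task {
        void run() throws Exception;
    }

    // Daemon threads, so the deadlocked pair does not keep the JVM alive.
    static Thread startDaemon(String name, Task task) {
        Thread thread = new Thread(() -> {
            try {
                task.run();
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    // Returns true if the JVM reports both threads as deadlocked on monitors.
    boolean reproduce() throws InterruptedException {
        Channel channel = new Channel();
        Listener listener = new Listener();
        Thread receive = startDaemon("poller-receive",
                () -> listener.getNotificationEvents(channel, 100));
        Thread send = startDaemon("poller-send",
                () -> channel.registerChannelListener(listener));

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threadBean.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, receive.getId())
                    && contains(ids, send.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(
                "deadlock detected: " + new PollerDeadlockSketch().reproduce());
    }
}
```

In this sketch the receive thread holds the listener monitor and blocks on the channel monitor, while the send thread holds the channel monitor and blocks on the listener monitor, which is the cycle described above. A thread dump taken at that point should show both threads in the BLOCKED state.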
Consider also that:
- Send and receive timers are configurable and self-adjusting in the frontend logic
- There can be a front end or load balancer that processes incoming requests, introducing extra delays
- The same account can be used to log in from different computers
Given these factors, the time separation between a poller/receive and a poller/send request for the same user is essentially unpredictable.
As a result, even though we reproduced this situation artificially (i.e. using breakpoints to introduce delays), the deadlock window represents a real risk in production systems, so it is something we should fix.