Scheduled jobs can be re-triggered if there was a loss of connection between nodes in a cluster
Shitian "Shelton" Zhang November 3, 2016 at 12:47 AM
PASSED Manual Testing using the following steps:
Set up a cluster environment.
Put the jsp in webapps/ROOT.
In the first node, access the jsp.
Assert that the job is triggered in the first node.
Wait and check the time.
Shut down the first node.
Assert that the job is triggered in the second node.
Check the time.
Reproduced on:
Tomcat 8.0.32 + MySQL 5.6. Portal ee-7.0.x GIT ID: 9c2b1cf2e9988f282b15662fbb447f618801ab32.
The job gets fired immediately.
Fixed on:
Tomcat 8.0.32 + MySQL 5.6. Portal ee-7.0.x GIT ID: 6a1002bf5a0b1908908e6bb764d2a37ed1b3dc87.
The job does not get fired until the expected time.
Fixed
Details
Assignee: Shitian "Shelton" Zhang (Deactivated)
Reporter: Mariano Alvaro
Branch Version/s: 7.0.x, 6.2.x
Backported to Branch: Committed
Fix Priority: 3
7.0 Fix Pack Version: 2
Story Points: 1.5
Priority: Medium
Created June 28, 2016 at 12:06 AM
Updated June 26, 2023 at 12:06 AM
Resolved August 19, 2016 at 9:59 PM
Steps to Reproduce:
In a clustered environment, start up the first node (for now, the master node).
Start up the second node (the slave node).
Deploy on both nodes a simple scheduled-job test portlet set to trigger every 10 minutes.
Set the log level to DEBUG for com.liferay.portal.scheduler.ClusterSchedulerEngine so you can detect which node is currently executing jobs.
Wait 10 minutes to check that the job gets correctly fired in the master node (optional; this step only verifies that the configuration is correct).
Before the job's next execution, simulate a loss of connection between both nodes.
Check in the logs that the slave node is now also master and is going to execute jobs.
Before the job's next execution, restore the connection between both nodes.
After a while, one of the nodes will be established as slave and will indicate in the logs that it is no longer going to execute jobs.
Wait until the master executes the job (for the first or second time, depending on whether you skipped the optional step).
Before the job is executed again, simulate a loss of connection between both nodes.
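The DEBUG logging mentioned in the steps above can be enabled with a Log4j category override; in Liferay 6.2/7.0 this typically goes in a portal-log4j-ext.xml file (a sketch — the exact file location depends on your setup):

```xml
<?xml version="1.0"?>
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
	<!-- Log which node currently owns scheduled job execution. -->
	<category name="com.liferay.portal.scheduler.ClusterSchedulerEngine">
		<priority value="DEBUG" />
	</category>
</log4j:configuration>
```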
Expected Results:
No job should be executed at this point until the expected time.
Actual Results:
The job is immediately fired on one of the nodes (the one that was the slave before the connection was broken for the second time), regardless of the expected time.
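The expected behavior amounts to correct misfire handling: when a node takes over the scheduler, the next fire time should be derived from the trigger's schedule rather than firing the job immediately. A minimal sketch of that calculation using only java.time (a hypothetical helper for illustration, not Liferay's actual implementation):

```java
import java.time.Duration;
import java.time.Instant;

public class NextFireTime {

	// Returns the first scheduled fire time strictly after "now",
	// stepping forward from the last execution by the trigger interval.
	// A node that takes over should wait until this instant instead of
	// firing immediately (the buggy behavior reported here).
	static Instant nextFireTime(
		Instant lastFire, Duration interval, Instant now) {

		Instant next = lastFire.plus(interval);

		while (!next.isAfter(now)) {
			next = next.plus(interval);
		}

		return next;
	}

	public static void main(String[] args) {
		Instant lastFire = Instant.parse("2016-06-28T00:00:00Z");
		Duration interval = Duration.ofMinutes(10);

		// The new master takes over 12 minutes after the last execution.
		Instant now = Instant.parse("2016-06-28T00:12:00Z");

		// Correct next fire time is 00:20, not "now".
		System.out.println(nextFireTime(lastFire, interval, now));
		// -> 2016-06-28T00:20:00Z
	}
}
```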