Details
Bug
Resolution: Done
Major
5.1.0.CR1
None
Description
When the CacheViewsManager running on the merge coordinator receives the merged view, it immediately sends a StateTransferControlCommand{type=RECOVER_VIEW} to all the other members of the cluster to discover their installed views.
Sometimes the RECOVER_VIEW command reaches a node from the other partition before the merged view is installed on that node, and the message is dropped. The coordinator will eventually retry sending the message and succeed the second time, but the retry time can be up to 50 seconds with the default JGroups configuration.
I've discussed several possible workarounds with Bela:
1. Wait a short amount of time before sending the RECOVER_VIEW command.
2. Send the RECOVER_VIEW command as unicasts.
3. Use the RSVP flag for the RECOVER_VIEW command (once we upgrade to JGroups 3.1).
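To make the race and workaround 1 concrete, here is a minimal toy model in plain Java. It is only a sketch: it does not use the Infinispan or JGroups APIs, the class and method names (RecoverViewRaceSketch, Node, deliveryTime) are hypothetical, and the 200 ms install time and 500 ms initial delay are made-up values. The only figure taken from this issue is the retry delay of up to 50 seconds. Time is simulated with plain numbers, so nothing actually sleeps.

```java
public class RecoverViewRaceSketch {

    /** Toy receiver: drops RECOVER_VIEW until the merged view is installed locally. */
    static final class Node {
        private final long viewInstalledAtMillis;

        Node(long viewInstalledAtMillis) {
            this.viewInstalledAtMillis = viewInstalledAtMillis;
        }

        /** True if a command arriving at the given logical time is accepted. */
        boolean accepts(long arrivalMillis) {
            return arrivalMillis >= viewInstalledAtMillis;
        }
    }

    /**
     * Logical time (ms after the coordinator received the merged view) at which
     * the command is first accepted, or -1 if every attempt was dropped.
     * Network latency is ignored: send time equals arrival time.
     */
    static long deliveryTime(Node node, long initialDelayMillis,
                             long retryIntervalMillis, int maxAttempts) {
        long sendTime = initialDelayMillis;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (node.accepts(sendTime)) {
                return sendTime;
            }
            // The command was dropped; the next chance is the retransmission.
            sendTime += retryIntervalMillis;
        }
        return -1;
    }

    public static void main(String[] args) {
        Node slowNode = new Node(200); // hypothetical: installs the merged view after 200 ms

        // Current behaviour: send immediately, first attempt is dropped.
        System.out.println(deliveryTime(slowNode, 0, 50_000, 3));   // prints 50000

        // Workaround 1: wait a short time before the first send.
        System.out.println(deliveryTime(slowNode, 500, 50_000, 3)); // prints 500
    }
}
```

In this model a short initial delay only helps if it is longer than the time the slowest node needs to install the merged view, which is why it is a workaround rather than a fix; workarounds 2 and 3 aim to shorten the retransmission instead.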