- Bug
- Resolution: Done
- Major
- None
- None
A node remains in the cluster topology even if it never enters the RUNNING state.
Starting a cache goes through the following call chain:
1. CacheDelegate.start
2. ComponentRegistry.start
3. AbstractComponentRegistry.start
4. AbstractComponentRegistry.internalStart
5. AbstractComponentRegistry.handleLifecycleTransitionFailure
The last start method in this chain, AbstractComponentRegistry.internalStart, executes the @Start methods of the components. If one of those methods throws an exception, the failure is handled in handleLifecycleTransitionFailure and the node enters the FAILED state.
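The lifecycle behaviour described here can be sketched roughly as follows. This is a simplified, hypothetical stand-in and not the actual AbstractComponentRegistry code: the class `LifecycleSketch`, its `State` enum, and the use of `Runnable` in place of reflective @Start invocation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a component registry lifecycle.
// It is NOT Infinispan's implementation; names are illustrative only.
class LifecycleSketch {
    enum State { INSTANTIATED, STARTING, RUNNING, FAILED }

    // Stand-ins for the components' @Start methods.
    private final List<Runnable> startMethods = new ArrayList<>();
    private State state = State.INSTANTIATED;

    void registerStartMethod(Runnable startMethod) {
        startMethods.add(startMethod);
    }

    State getState() {
        return state;
    }

    // Runs every registered start method; any exception moves the
    // registry to FAILED instead of RUNNING.
    void start() {
        state = State.STARTING;
        try {
            for (Runnable startMethod : startMethods) {
                startMethod.run();
            }
            state = State.RUNNING;
        } catch (RuntimeException e) {
            handleLifecycleTransitionFailure(e);
        }
    }

    // Marks the registry as FAILED and propagates the original error.
    private void handleLifecycleTransitionFailure(RuntimeException cause) {
        state = State.FAILED;
        throw cause;
    }

    public static void main(String[] args) {
        LifecycleSketch registry = new LifecycleSketch();
        registry.registerStartMethod(() -> {
            throw new IllegalStateException("rehash failed");
        });
        try {
            registry.start();
        } catch (IllegalStateException e) {
            System.out.println("State after failed start: " + registry.getState());
        }
    }
}
```

The point of the sketch is only that a failing start method leaves the registry in FAILED; nothing in this path touches cluster membership.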
The problem is that, in distributed mode, the node is added to the cluster topology before the rehash takes place. If an exception is thrown during the rehash, the join still completes successfully. The join runs these steps:
1. Broadcast new consistent hash.
2. Get state.
3. Invalidate state. (This runs in a finally block, so it occurs even if get state fails.)
4. Complete join. (This also runs in a finally block, so it occurs even if get state or invalidation fails.)
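A minimal sketch of how the four steps above interact, assuming the nested try/finally structure implied by the parenthetical notes. The class `JoinSketch`, its `log` list, and the `failDuringGetState` flag are hypothetical and exist only to make the control flow observable; the real join task is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the join sequence; not the actual join task.
class JoinSketch {
    final List<String> log = new ArrayList<>();
    boolean joinCompleted = false;
    private final boolean failDuringGetState;

    JoinSketch(boolean failDuringGetState) {
        this.failDuringGetState = failDuringGetState;
    }

    void join() {
        // Step 1: the node becomes visible to the cluster here.
        log.add("broadcast new consistent hash");
        try {
            try {
                // Step 2: may fail during the rehash.
                log.add("get state");
                if (failDuringGetState) {
                    throw new IllegalStateException("state transfer failed");
                }
            } finally {
                // Step 3: runs even if step 2 threw.
                log.add("invalidate state");
            }
        } finally {
            // Step 4: runs even if step 2 or step 3 threw, so the join
            // is reported as complete despite the failure.
            joinCompleted = true;
            log.add("complete join");
        }
    }

    public static void main(String[] args) {
        JoinSketch join = new JoinSketch(true);
        try {
            join.join();
        } catch (IllegalStateException e) {
            System.out.println("join threw: " + e.getMessage());
        }
        System.out.println("joinCompleted = " + join.joinCompleted);
        System.out.println(join.log);
    }
}
```

Because step 4 sits in the outermost finally block, `joinCompleted` ends up true even though the exception still propagates to the caller.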
There needs to be a way to remove a node from the topology if it enters the FAILED state. Alternatively, the node could be added to the topology only once it enters the RUNNING state.
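The two options can be compared with a small sketch. Everything here is hypothetical: `TopologyFixSketch`, its `members` set, and both method names are invented for illustration. A real fix would have to go through the cluster's membership and consistent hash mechanisms rather than a local set.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the two proposed remedies; not a real fix.
class TopologyFixSketch {
    enum State { RUNNING, FAILED }

    // Stand-in for the cluster topology (member names only).
    private final Set<String> members = new LinkedHashSet<>();

    boolean contains(String node) {
        return members.contains(node);
    }

    // Option 1: add the node early (as today), but remove it again
    // if startup fails and the node ends up in FAILED.
    State joinEarlyAndEvictOnFailure(String node, Runnable startWork) {
        members.add(node);
        try {
            startWork.run();
            return State.RUNNING;
        } catch (RuntimeException e) {
            members.remove(node);
            return State.FAILED;
        }
    }

    // Option 2: add the node to the topology only after startup
    // has succeeded, i.e. once it is RUNNING.
    State joinOnlyWhenRunning(String node, Runnable startWork) {
        try {
            startWork.run();
        } catch (RuntimeException e) {
            return State.FAILED;
        }
        members.add(node);
        return State.RUNNING;
    }

    public static void main(String[] args) {
        TopologyFixSketch topology = new TopologyFixSketch();
        Runnable failingStart = () -> {
            throw new IllegalStateException("rehash failed");
        };
        System.out.println(topology.joinEarlyAndEvictOnFailure("nodeA", failingStart));
        System.out.println(topology.joinOnlyWhenRunning("nodeB", failingStart));
        System.out.println("nodeA in topology: " + topology.contains("nodeA"));
        System.out.println("nodeB in topology: " + topology.contains("nodeB"));
    }
}
```

In both variants a node that fails to start is absent from the topology afterwards; they differ in whether other members ever see the node before it is RUNNING.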