Details
Type: Feature Request
Resolution: Done
Priority: Major
Description
With TCPPING.initial_hosts=A and three nodes A, B and C, the following can happen:
- A is started. The view is {A}
- B is started, finds A and joins the cluster. The view is {A,B}
- C is started, finds A and joins the cluster. The view is now {A,B,C}
- A is killed. The view is now {B,C}
- C is restarted
--> C won't find A, and therefore cannot ask B to admit it to the cluster (C doesn't know about B)
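The sequence above can be reproduced with a toy model. This is only a sketch: StaticDiscoveryModel, discover and liveViews are hypothetical names made up for illustration, not JGroups classes or APIs, and the model assumes that a live initial host answers a discovery request with its current view.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of discovery against a static host list; hypothetical, not JGroups code. */
public class StaticDiscoveryModel {

    /**
     * Asks each configured initial host for members.
     * liveViews maps every running node to the view it currently holds;
     * a host that is missing from the map is down and does not respond.
     */
    public static Set<String> discover(List<String> initialHosts,
                                       Map<String, List<String>> liveViews) {
        Set<String> found = new LinkedHashSet<>();
        for (String host : initialHosts) {
            List<String> view = liveViews.get(host); // null: host is down
            if (view != null) {
                found.addAll(view);                  // learn about all members in its view
            }
        }
        return found;
    }
}
```

While A is running, discover(List.of("A"), liveViews) returns A's whole view, so a joiner learns about B as well. Once A is removed from liveViews, the same call returns an empty set even though B is still running, which is the situation C ends up in after its restart.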
SOLUTION:
- If every member persisted its discovery results on disk, C would still know about A and B after its restart, and could therefore discover B
- This could be implemented directly in TCPPING, or as a separate, generic protocol, which might also be useful for other discovery protocols such as S3_PING or TCPGOSSIP (when the GossipRouter is down)
- The disk cache would have to include an expiry mechanism, so that the file doesn't grow forever and stale results aren't returned forever
- This mechanism would not help, though, if C came up for the first time (no file on disk yet) while A wasn't running.
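A minimal sketch of such a disk cache with expiry follows. Everything in it is an assumption made for illustration: the class name DiscoveryCache, the one-entry-per-line file format (address, tab, last-seen timestamp in milliseconds) and the record/lookup methods are hypothetical and do not correspond to any existing JGroups protocol or API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical disk-backed cache of discovered member addresses; not a JGroups class. */
public class DiscoveryCache {
    private final Path file;         // one "address<TAB>lastSeenMillis" entry per line
    private final long expiryMillis; // entries older than this are dropped

    public DiscoveryCache(Path file, long expiryMillis) {
        this.file = file;
        this.expiryMillis = expiryMillis;
    }

    /** Marks the given members as seen at time 'now' and rewrites the file. */
    public synchronized void record(List<String> members, long now) throws IOException {
        Map<String, Long> entries = load(now); // expired entries are dropped here
        for (String member : members) {
            entries.put(member, now);
        }
        save(entries);
    }

    /** Returns the cached members that have not expired at time 'now'. */
    public synchronized List<String> lookup(long now) throws IOException {
        return new ArrayList<>(load(now).keySet());
    }

    private Map<String, Long> load(long now) throws IOException {
        Map<String, Long> entries = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return entries; // first start: nothing has been cached yet
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip malformed lines
            }
            long lastSeen;
            try {
                lastSeen = Long.parseLong(parts[1]);
            } catch (NumberFormatException e) {
                continue; // skip lines with an unreadable timestamp
            }
            if (now - lastSeen <= expiryMillis) {
                entries.put(parts[0], lastSeen);
            }
        }
        return entries;
    }

    private void save(Map<String, Long> entries) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8); // creates or truncates the file
    }
}
```

In this sketch a member would call record(...) with the current view whenever discovery succeeds or the view changes, and fall back to lookup(...) when none of the configured initial hosts respond. Because expired entries are dropped on every load and the file is rewritten on every record, the file stays bounded by the number of recently seen members. On a first start the file is missing and lookup returns an empty list, which matches the limitation noted in the last bullet.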