When we use TCPPING.initial_hosts=A, and have nodes A, B and C, then the following can happen:

A is started. The view is {A}
B is started, finds A and joins the cluster. The view is {A,B}
C is started, finds A and joins the cluster. The view is now {A,B,C}
A is killed. The view is now {B,C}
C is restarted
--> C won't find A, and therefore cannot contact B to join the cluster (C doesn't know about B)
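
The sequence above can be replayed with a small toy model. This is only a sketch and does not use JGroups code: the `start`, `kill` and `run_scenario` helpers are hypothetical, and the model assumes that a node which finds none of its initial hosts forms its own singleton view.

```python
def start(node, views, initial_hosts):
    """Start `node`: join the view of the first reachable initial host,
    otherwise form a singleton view (assumed behaviour of this toy model)."""
    for host in initial_hosts:
        for view in views:
            if host in view:
                view.append(node)
                return view
    view = [node]
    views.append(view)
    return view


def kill(node, views):
    """Remove `node` from every view and drop views that became empty."""
    for view in views:
        if node in view:
            view.remove(node)
    views[:] = [v for v in views if v]


def run_scenario():
    """Replay the steps from the description with initial_hosts = [A]."""
    initial_hosts = ["A"]
    views = []
    start("A", views, initial_hosts)   # view is [A]
    start("B", views, initial_hosts)   # view is [A, B]
    start("C", views, initial_hosts)   # view is [A, B, C]
    kill("A", views)                   # view is [B, C]
    kill("C", views)                   # C goes down for its restart
    start("C", views, initial_hosts)   # C probes A only, finds nobody
    return views


if __name__ == "__main__":
    print(run_scenario())
```

In this model the final state is two separate views, one holding B and one holding C, because C only ever probes A.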

SOLUTION:

If every member persisted its discovery results on disk, C would still know about A and B, and could therefore discover B
This could be implemented directly in TCPPING, or as a generic, separate protocol (which might also be useful for other discovery protocols such as S3_PING, or TCPGOSSIP when the GossipRouter is down)
The disk cache would have to include an expiry mechanism, so that the file doesn't grow forever and stale results aren't returned forever
This mechanism would not work, though, if C came up for the first time (no file on disk yet) and A wasn't running.
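
A minimal sketch of such a disk cache with expiry follows. It is an illustration only, not an existing JGroups protocol: the `DiscoveryCache` class, its JSON file layout and the `max_age` parameter are all assumptions made for this example.

```python
import json
import os
import time


class DiscoveryCache:
    """Hypothetical on-disk cache of discovered members with expiry."""

    def __init__(self, path, max_age=3600.0, clock=time.time):
        self.path = path        # file holding {member: last_seen_timestamp}
        self.max_age = max_age  # seconds after which an entry is stale
        self.clock = clock      # injectable so expiry can be tested

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return {}  # first start: no (usable) file on disk yet

    def _fresh(self, entries, now):
        # Dropping stale entries keeps the file from growing forever.
        return {m: ts for m, ts in entries.items() if now - ts <= self.max_age}

    def record(self, members):
        """Merge freshly discovered members and refresh their timestamps."""
        now = self.clock()
        entries = self._fresh(self._load(), now)
        for member in members:
            entries[member] = now
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(entries, f)
        os.replace(tmp, self.path)  # swap the new file into place

    def candidates(self, initial_hosts):
        """Hosts to probe: the static list plus unexpired cached members."""
        cached = self._fresh(self._load(), self.clock())
        return list(dict.fromkeys(list(initial_hosts) + sorted(cached)))
```

With this sketch, C would call `record(["A", "B"])` after its first successful discovery; on restart, `candidates(["A"])` returns both A and B, so C can reach B even though A is gone. On a first start with no file, `candidates` falls back to the static list alone, which is the limitation noted above.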