Bug
Resolution: Obsolete
Major
None
8.0.0.Alpha2, 7.2.3.Final
None
The AsyncCacheWriter modification queue is not sent with state transfer when the store is shared. A joiner can then read a stale version of an entry from the shared store if that entry has an update pending in the modification queue but is no longer in memory (because it was either removed explicitly or evicted).
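The race can be reproduced with a toy model. This is not Infinispan code: ToyOwner, the plain HashMap standing in for the shared store, and the evict() and outboundState() helpers are all hypothetical, and state transfer is reduced to copying the in-memory map without the queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy model of one old owner in front of a shared store (not Infinispan code). */
class ToyOwner {
    final Map<String, String> sharedStore;                              // read by every node
    final Map<String, String> memory = new HashMap<>();                 // in-memory data container
    final Queue<Map.Entry<String, String>> queue = new ArrayDeque<>();  // async modification queue

    ToyOwner(Map<String, String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    void write(String key, String value) {
        memory.put(key, value);
        queue.add(Map.entry(key, value)); // the store write is deferred
    }

    /** Eviction drops the entry from memory but leaves the queued write in place. */
    void evict(String key) {
        memory.remove(key);
    }

    /** Stand-in for the async writer draining its queue into the shared store. */
    void flush() {
        Map.Entry<String, String> e;
        while ((e = queue.poll()) != null) {
            sharedStore.put(e.getKey(), e.getValue());
        }
    }

    /** State transfer copies the in-memory entries only; the queue is not sent. */
    Map<String, String> outboundState() {
        return new HashMap<>(memory);
    }

    /** A joiner looks in the transferred state first, then falls back to the shared store. */
    static String joinerRead(Map<String, String> transferred, Map<String, String> store, String key) {
        String value = transferred.get(key);
        return value != null ? value : store.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("k", "v1");           // old value already persisted
        ToyOwner owner = new ToyOwner(store);
        owner.write("k", "v2");         // new value only queued
        owner.evict("k");               // evicted before the queue was flushed
        System.out.println(joinerRead(owner.outboundState(), store, "k")); // prints v1 (stale)
    }
}
```

Once flush() has run, the same read returns v2, which is why waiting for the queue to drain closes the window.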
One possible fix would be to delay the end of state transfer until the modifications that have not reached the joiner (i.e., those with an old topology id) are actually written to the shared store.
AsyncCacheWriter could have a new method, CompletableFuture<Object> getSyncFuture(): when the returned future completes, every modification queued in AsyncCacheWriter at the time of the call is guaranteed to have been written to the store OR overwritten by a newer update. The state provider would call this method when starting an outbound transfer and wait on the future before sending the last state chunk.
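A minimal sketch of the proposed getSyncFuture() contract, assuming a write-behind queue that coalesces updates per key. SketchAsyncWriter, its sequence counters and the explicit flush() method (standing in for the background flush thread) are hypothetical; only the getSyncFuture() signature comes from the proposal. The shared store is a plain map for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical write-behind writer with the proposed getSyncFuture().
 * Not Infinispan's AsyncCacheWriter: flush() stands in for the background thread.
 */
class SketchAsyncWriter {
    private final Map<String, String> sharedStore;
    private final Map<String, String> pending = new HashMap<>(); // newest queued value per key
    private final Object flushLock = new Object();               // serializes flushes
    private final TreeMap<Long, CompletableFuture<Object>> waiters = new TreeMap<>();
    private long enqueuedSeq = 0; // number of store() calls so far
    private long flushedSeq = 0;  // store() calls 1..flushedSeq are written or overwritten

    SketchAsyncWriter(Map<String, String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    synchronized void store(String key, String value) {
        pending.put(key, value); // overwrites an older queued update for the same key
        enqueuedSeq++;
    }

    /** Completes once every store() issued before this call is written or overwritten. */
    synchronized CompletableFuture<Object> getSyncFuture() {
        long target = enqueuedSeq;
        if (flushedSeq >= target) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(target, t -> new CompletableFuture<>());
    }

    /** One iteration of the (hypothetical) background flush loop. */
    void flush() {
        synchronized (flushLock) {
            Map<String, String> batch;
            long batchSeq;
            synchronized (this) {
                batch = new HashMap<>(pending);
                pending.clear();
                batchSeq = enqueuedSeq; // every store() up to here is in the batch
            }
            sharedStore.putAll(batch); // slow I/O, done without holding the writer's monitor
            Map<Long, CompletableFuture<Object>> ready;
            synchronized (this) {
                flushedSeq = batchSeq;
                NavigableMap<Long, CompletableFuture<Object>> head = waiters.headMap(flushedSeq, true);
                ready = new HashMap<>(head);
                head.clear();
            }
            ready.values().forEach(f -> f.complete(null)); // complete outside the monitor
        }
    }
}
```

Coalescing is what makes "overwritten by a newer update" sufficient: once a newer value for a key is queued, the older one never needs to reach the store. Flushes are serialized so that a later batch can never be reported as durable before an earlier one.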
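To avoid concurrency issues (e.g. a store() call racing with getSyncFuture() and queuing an old-topology modification that the returned future does not cover), we may also have to hold StateTransferLock's exclusive topology lock while calling getSyncFuture(), and its shared topology lock while calling AsyncCacheWriter.store().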
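The proposed lock usage might look roughly like the following. A ReentrantReadWriteLock stands in for StateTransferLock's shared/exclusive topology lock, and the SyncableWriter interface, StateProviderSketch class, chunk list and send callback are all hypothetical; a real state provider sends chunks asynchronously rather than blocking on join().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical view of an async writer exposing the proposed getSyncFuture(). */
interface SyncableWriter {
    void store(String key, String value);
    CompletableFuture<Object> getSyncFuture();
}

/** Hypothetical state provider; the read/write lock stands in for StateTransferLock. */
class StateProviderSketch {
    private final ReentrantReadWriteLock topologyLock = new ReentrantReadWriteLock();
    private final SyncableWriter writer;

    StateProviderSketch(SyncableWriter writer) {
        this.writer = writer;
    }

    /** Write path: hold the shared topology lock while enqueuing the modification. */
    void write(String key, String value) {
        topologyLock.readLock().lock();
        try {
            writer.store(key, value);
        } finally {
            topologyLock.readLock().unlock();
        }
    }

    /** Outbound transfer: capture the sync future under the exclusive topology lock. */
    void transferState(List<String> chunks, Consumer<String> send) {
        if (chunks.isEmpty()) {
            throw new IllegalArgumentException("state transfer needs at least one chunk");
        }
        CompletableFuture<Object> syncFuture;
        topologyLock.writeLock().lock();
        try {
            // No store() can be in flight while the exclusive lock is held, so the
            // future covers every modification enqueued before this point.
            syncFuture = writer.getSyncFuture();
        } finally {
            topologyLock.writeLock().unlock();
        }
        for (int i = 0; i < chunks.size() - 1; i++) {
            send.accept(chunks.get(i));
        }
        syncFuture.join(); // wait until the queued modifications reach the shared store
        send.accept(chunks.get(chunks.size() - 1)); // the last chunk ends state transfer
    }
}
```

Only the last chunk is held back, so the wait overlaps with sending the rest of the state instead of delaying the whole transfer.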