Details

Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: 9.2.1.Final, 9.3.0.Alpha1
Component/s: Core
Labels: None

Description

This is related to ~~ISPN-9104~~, but it applies to any commands in a REPL_SYNC cache.

We have a topology id check to avoid running commands from an older topology. However, if the cluster splits cleanly in two, both partitions rebalance and install a topology with the same id. After the partitions merge, NAKACK2 retransmits commands that were broadcast in one partition to the nodes in the other partition. Until the post-merge cache topology update is received, those commands carry a topology id that passes the check, so they are executed.
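To illustrate why the check does not help here, the following is a minimal sketch, not Infinispan's actual code. It assumes a simplified check that rejects only commands whose topology id is lower than the receiving node's current one; the class and method names (`TopologyIdCheckSketch`, `passesCheck`) and the topology id values are hypothetical.

```java
public class TopologyIdCheckSketch {

    // Hypothetical, simplified check: a command is rejected only if it was
    // sent in a topology older than the one installed on the receiving node.
    static boolean passesCheck(int nodeTopologyId, int commandTopologyId) {
        return commandTopologyId >= nodeTopologyId;
    }

    public static void main(String[] args) {
        // Assumed ids: after the split, partitions [AB] and [CD] each
        // rebalance independently and both install topology 6.
        int topologyInAB = 6;
        int topologyOnC = 6;

        // A command broadcast inside [AB] is retransmitted to C after the merge.
        // C has not yet installed the post-merge topology, so the ids match.
        System.out.println(passesCheck(topologyOnC, topologyInAB)); // prints "true"

        // Once C installs the post-merge topology, the same command is rejected.
        topologyOnC = 7;
        System.out.println(passesCheck(topologyOnC, topologyInAB)); // prints "false"
    }
}
```

The sketch only shows that equal ids in two independent partitions are indistinguishable to an id comparison; it says nothing about how the real check is implemented.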

The worst scenario is in a transactional cache. Node A in partition [AB] could broadcast a lock acquisition command (LockControlCommand in a pessimistic cache, or PrepareCommand in an optimistic cache), wait for the responses, and then broadcast a lock release command (1-phase PrepareCommand, CommitCommand, or TxCompletionNotificationCommand). In partition [AB], the TxCompletionNotificationCommand is only sent after all the nodes have confirmed that they acquired the lock. When partitions [AB] and [CD] merge, C and D receive both commands, but there is no guarantee that they will process them in the right order. If the lock release command runs first, it does nothing; the lock acquisition command then acquires the lock, and no other command will ever release it.
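The reordering can be sketched with a toy lock table. This is an illustration under stated assumptions, not Infinispan's lock manager: `LockReorderSketch`, `LockTable`, `replay`, the key `"k"`, and the transaction id `"tx1"` are all hypothetical, and release is modelled as a no-op when the transaction does not hold the lock.

```java
import java.util.HashMap;
import java.util.Map;

public class LockReorderSketch {

    // Hypothetical per-node lock table: key -> id of the owning transaction.
    static class LockTable {
        private final Map<String, String> owners = new HashMap<>();

        // Models the lock acquisition command.
        void acquire(String key, String tx) {
            owners.putIfAbsent(key, tx);
        }

        // Models the lock release command; does nothing if tx is not the owner.
        void release(String key, String tx) {
            owners.remove(key, tx);
        }

        String owner(String key) {
            return owners.get(key);
        }
    }

    // Replays the two retransmitted commands on a node from the other
    // partition and returns the lock owner afterwards (null if unlocked).
    static String replay(boolean releaseFirst) {
        LockTable node = new LockTable();
        if (releaseFirst) {
            node.release("k", "tx1"); // no-op: nothing is locked yet
            node.acquire("k", "tx1"); // lock is taken and never released
        } else {
            node.acquire("k", "tx1");
            node.release("k", "tx1");
        }
        return node.owner("k");
    }

    public static void main(String[] args) {
        System.out.println("in order:  " + replay(false)); // prints "in order:  null"
        System.out.println("reordered: " + replay(true));  // prints "reordered: tx1"
    }
}
```

In the reordered case the key stays locked by a transaction that has already completed on its originator, which is the leak described above.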

Issue Links

is related to

ISPN-9104 Majority partition nodes can process minority topology updates after merge (Closed)

relates to

ISPN-10391 Possible loss of (pessimistic) lock if the lock owner is expelled from the cluster and merged later (To Do)

People

Assignee: Unassigned

Reporter: Dan Berindei (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 2018/04/27 2:09 AM

Updated: 2023/05/25 1:37 PM

Resolved: 2023/05/25 1:37 PM