Infinispan / ISPN-9104

Majority partition nodes can process minority topology updates after merge

    Description

      After a merge, NAKACK2 resends some broadcast messages that were originally sent in a partition to the members of the merged cluster that weren't in that partition.

      We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong coordinator, but unfortunately the check only runs after resetLocalTopologyBeforeRebalance() has been called. If the update's topology id is higher than the current topology id, that call can install a "reset" topology to prepare for rebalance.
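
      The ordering problem can be illustrated with a minimal sketch. It is a simplified model, not the actual LocalTopologyManagerImpl code: the class TopologyUpdateSketch and all of its methods are hypothetical, and the "validate first" variant is only one possible ordering, not necessarily the fix that was applied. The reset id (update id minus one) mirrors the log below, where an update with topologyId=21 leads to a fake topology with id=20.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of topology-update handling on one node.
// These are NOT Infinispan classes; only the ordering of the two steps matters.
final class TopologyUpdateSketch {
    private final String expectedCoordinator; // coordinator of the merged view
    private int currentTopologyId;
    final List<String> installed = new ArrayList<>(); // topologies installed so far

    TopologyUpdateSketch(String expectedCoordinator, int currentTopologyId) {
        this.expectedCoordinator = expectedCoordinator;
        this.currentTopologyId = currentTopologyId;
    }

    // Ordering described in the report: reset first, sender check second.
    void handleResetFirst(String sender, int topologyId) {
        resetBeforeRebalance(sender, topologyId);
        if (!sender.equals(expectedCoordinator)) {
            return; // update ignored, but the reset topology is already installed
        }
        install(sender, topologyId);
    }

    // Alternative ordering (an assumption): reject the sender before any reset.
    void handleValidateFirst(String sender, int topologyId) {
        if (!sender.equals(expectedCoordinator)) {
            return; // nothing was installed for the stale update
        }
        resetBeforeRebalance(sender, topologyId);
        install(sender, topologyId);
    }

    // Installs a "reset" topology when the update's id is ahead of the local id.
    private void resetBeforeRebalance(String sender, int topologyId) {
        if (topologyId > currentTopologyId) {
            currentTopologyId = topologyId - 1;
            installed.add("reset:" + currentTopologyId + " from " + sender);
        }
    }

    private void install(String sender, int topologyId) {
        currentTopologyId = topologyId;
        installed.add("update:" + topologyId + " from " + sender);
    }
}
```

      With Test-NodeA as the expected coordinator, passing a stale update from Test-NodeD to handleResetFirst still records a reset topology, while handleValidateFirst records nothing.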

      In the reset topology all segments are owned by the minority partition nodes, so the majority partition nodes that install it invalidate all their entries before conflict resolution even starts.
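
      A rough sketch of why installing that topology wipes a majority node: the helper below is hypothetical (SegmentDiffSketch is not an Infinispan class) and assumes a replicated cache in which every member owns every segment and a non-member owns none. The removed segments are simply the old owned set minus the new one, which matches the "removed segments: {0-255}" line in the log below.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not Infinispan code: models segment ownership in a
// replicated cache and the diff computed when a new topology is installed.
final class SegmentDiffSketch {
    // Assumption: in a replicated cache a member owns all segments, a non-member none.
    static Set<Integer> ownedSegments(List<String> members, String node, int numSegments) {
        Set<Integer> owned = new TreeSet<>();
        if (members.contains(node)) {
            for (int segment = 0; segment < numSegments; segment++) {
                owned.add(segment);
            }
        }
        return owned;
    }

    // Segments owned before but not after; their entries get invalidated.
    static Set<Integer> removedSegments(Set<Integer> oldSegments, Set<Integer> newSegments) {
        Set<Integer> removed = new TreeSet<>(oldSegments);
        removed.removeAll(newSegments);
        return removed;
    }
}
```

      For Test-NodeA, going from members [Test-NodeA, Test-NodeB, Test-NodeC] to [Test-NodeD, Test-NodeE] with 256 segments leaves no owned segments, so all 256 end up in the removed set.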

      This causes random failures in the conflict resolution tests, e.g. MergePolicyPreferredAlwaysTest:

      19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368, Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2) [Test-NodeD-49504, Test-NodeE-55304]
      19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE, sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null, phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], throwable=null, viewId=7} [sender=Test-NodeD-49504]
      19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20, phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2, 59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
      19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments: RangeSet(256)
      19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed segments: {0-255}
      19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
      19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@48a2a9d5]
      19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [JGroupsTransport] Test-NodeA-50368 sending request 232 to all: SingleRpcCommand{cacheName='___defaultcache', command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=null, metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS, topologyId=24}}
      19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=DURING SPLIT} from container
      
      19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache conflict detected {Test-NodeA-50368=NullCacheEntry{}, Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{}, Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
      19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries [ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}]
      19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}
      
      19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC, 5N]
      java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}. VersionMap: {Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null, Test-NodeD-49504=null, Test-NodeB-27290=null}
      	at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
      	at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
      	at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
      	at org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113) ~[test-classes/:?]
      	at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138) ~[test-classes/:?]
      

            People

              dberinde@redhat.com Dan Berindei (Inactive)