  Infinispan / ISPN-9032

OperationsDuringMergeConflictTest.testPartitionMergePolicy random failures

Details

    Description

      The test modifies a key in both partitions and checks that each partition still sees its own value for a while after receiving the merged cluster view.

      It installs a BlockStateResponseCommandHandler to block the conflict resolution responses and keep the pre-merge values on the non-preferred topology's owners. However, it does not block the installation of the CONFLICT_RESOLUTION cache topology. During conflict resolution the read CH is the preferred topology's read CH, so a node from the non-preferred topology is no longer a read owner and fetches the value remotely from the preferred topology's owners:

      17:12:07,088 INFO  (testng-Test:[]) [JGroupsTransport] ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[Test-NodeI-48950|10] (4) [Test-NodeI-48950, Test-NodeJ-6577, Test-NodeK-2872, Test-NodeL-23686], 2 subgroups: [Test-NodeI-48950|8] (2) [Test-NodeI-48950, Test-NodeJ-6577], [Test-NodeK-2872|9] (2) [Test-NodeK-2872, Test-NodeL-23686]
      17:12:07,134 TRACE (stateTransferExecutor-thread-Test-NodeI-p50536-t2:[Merge-10]) [DefaultConflictManager] Cache ___defaultcache attempting to receive all replicas for segment 0 with topology CacheTopology{id=20, rebalanceId=7, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134]}, pendingCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, unionCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, phase=CONFLICT_RESOLUTION, actualMembers=[Test-NodeK-2872, Test-NodeL-23686, Test-NodeI-48950, Test-NodeJ-6577], persistentUUIDs=[40b58191-37f1-4fa4-8e4b-f1a7c17bfa59, 208d742d-ecc9-4792-9dd5-f74d5666f5a2, d8914096-a5a4-4e1e-95a0-27c50fee9999, 7464e58b-5e23-4c1b-909e-4e53055f12ca]}
      17:12:07,212 TRACE (testng-Test:[]) [BaseDistributionInterceptor] Perform remote get for key MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}. currentTopologyId=20, owners=[Test-NodeK-2872, Test-NodeL-23686]
      17:12:07,212 TRACE (testng-Test:[]) [RpcManagerImpl] Test-NodeI-48950 invoking ClusteredGetCommand{key=MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}, flags=[]} to recipient list [Test-NodeK-2872, Test-NodeL-23686] with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=org.infinispan.interceptors.impl.BaseRpcInterceptor$1@241e783a, responseMode=WAIT_FOR_VALID_RESPONSE}
      17:12:07,225 TRACE (jgroups-4,Test-NodeI-48950:[]) [CommandAwareRpcDispatcher] Got acceptable response: Responses{
      Test-NodeK-2872: value=SuccessfulResponse{responseValue=ImmortalCacheValue {value=B}} , received=true, suspected=false
      17:12:07,226 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.OperationsDuringMergeConflictTest.testPartitionMergePolicy
      java.lang.AssertionError: Key=MagicKey{13A4/DDFB0E77/56@Test-NodeI-48950}, Value=A, Cache Index=0, Topology=CacheTopology{id=20, rebalanceId=7, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeK-2872: 134+122, Test-NodeL-23686: 122+134, Test-NodeI-48950: 0+256, Test-NodeJ-6577: 0+256]}, phase=CONFLICT_RESOLUTION, actualMembers=[Test-NodeK-2872, Test-NodeL-23686, Test-NodeI-48950, Test-NodeJ-6577], persistentUUIDs=[40b58191-37f1-4fa4-8e4b-f1a7c17bfa59, 208d742d-ecc9-4792-9dd5-f74d5666f5a2, d8914096-a5a4-4e1e-95a0-27c50fee9999, 7464e58b-5e23-4c1b-909e-4e53055f12ca]} expected:<A> but was:<B>
      	at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.8.8.jar:?]
      	at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364) ~[testng-6.8.8.jar:?]
      	at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80) ~[testng-6.8.8.jar:?]
      	at org.infinispan.conflict.impl.BaseMergePolicyTest.assertCacheGet(BaseMergePolicyTest.java:160) ~[test-classes/:?]
      	at org.infinispan.conflict.impl.OperationsDuringMergeConflictTest.performMerge(OperationsDuringMergeConflictTest.java:121) ~[test-classes/:?]
      	at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:116) ~[test-classes/:?]
      

      https://ci.infinispan.org/job/Infinispan/job/master/543/testReport/org.infinispan.conflict.impl/OperationsDuringMergeConflictTest/history/

      Maybe we should use the union CH for reading during conflict resolution instead?
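
      To make the difference concrete, here is a minimal toy model of the read path, not Infinispan code. The class ReadChSketch, its read method, and the shortened node names are all hypothetical; the only facts taken from the log above are the owner lists (currentCH owners K and L, unionCH owners K, L, I and J) and the values A and B. It assumes a reader that is a read owner reads locally, and any other reader asks the first read owner.

```java
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Infinispan APIs.
public class ReadChSketch {
    // Pre-merge values each node still holds locally while state responses are blocked.
    // Partition [NodeI, NodeJ] wrote A, partition [NodeK, NodeL] wrote B.
    static final Map<String, String> LOCAL_VALUES = Map.of(
            "NodeI", "A", "NodeJ", "A",
            "NodeK", "B", "NodeL", "B");

    // Owner lists as they appear in the CONFLICT_RESOLUTION topology from the log.
    static final List<String> CURRENT_CH_OWNERS = List.of("NodeK", "NodeL");
    static final List<String> UNION_CH_OWNERS = List.of("NodeK", "NodeL", "NodeI", "NodeJ");

    // Assumed read rule: a read owner reads its local value,
    // any other node performs a remote get on the first read owner.
    static String read(String reader, List<String> readOwners) {
        if (readOwners.contains(reader)) {
            return LOCAL_VALUES.get(reader);
        }
        return LOCAL_VALUES.get(readOwners.get(0));
    }

    public static void main(String[] args) {
        // Reading with the preferred topology's CH: NodeI goes remote and sees B.
        System.out.println("currentCH read on NodeI: " + read("NodeI", CURRENT_CH_OWNERS));
        // Reading with the union CH: NodeI is an owner and still sees its own A.
        System.out.println("unionCH read on NodeI:   " + read("NodeI", UNION_CH_OWNERS));
    }
}
```

      Under these assumptions, the currentCH read reproduces the failing assertion (expected A but was B), while the unionCH read keeps each partition's own value visible until conflict resolution completes. Whether the real read path would behave this way with the union CH would still need to be checked against the actual interceptors.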

      Speaking of which, PreferAvailabilityStrategy sets the unionCH field in the conflict resolution CacheTopology, but it is ignored because CacheTopologyControlCommand doesn't have a field for it. We should set unionCH=null in PreferAvailabilityStrategy to make things clearer. We should also try to reduce the number of times we log the cache topology during conflict resolution (even if it's only at trace level).

          People

            remerson@redhat.com Ryan Emerson
            dberinde@redhat.com Dan Berindei (Inactive)