ISPN-10365: PreferAvailabilityStrategy assertion failure



    Description

      This scenario happens unintentionally in RebalancePolicyJmxTest: the test only waits for the default cache to finish rebalancing before killing the coordinator and never checks the CONFIG cache:

      • A and B are running, rebalancing is disabled, then C and D join
      • Rebalancing is re-enabled (via the rebalancingEnabled JMX attribute; see the sketch after this list), but B and A are stopped before the rebalance finishes
      • C has already seen the rebalance finish (topology id=9, NO_REBALANCE), while D is still in the READ_OLD_WRITE_ALL phase (topology id=6)
      • C becomes coordinator and should recover using its own topology, but instead hits an assertion failure and never installs a stable topology
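
      A minimal, self-contained sketch of the JMX toggle used in the second step. The ObjectName below is illustrative only (the real name depends on the configured JMX domain and cache manager name); the rebalancingEnabled attribute is the one named in the stack trace further down:

      import java.lang.management.ManagementFactory;
      import javax.management.Attribute;
      import javax.management.MBeanServer;
      import javax.management.ObjectName;

      public class RebalanceToggleSketch {
          public static void main(String[] args) throws Exception {
              MBeanServer server = ManagementFactory.getPlatformMBeanServer();
              // Illustrative ObjectName; adjust domain and cache manager name as needed.
              ObjectName topologyManager = new ObjectName(
                    "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\",component=LocalTopologyManager");

              // Suspend rebalancing before C and D join.
              server.setAttribute(topologyManager, new Attribute("rebalancingEnabled", false));

              // Re-enable rebalancing; this is the same setter that later fails with an
              // MBeanException once the CONFIG cache no longer has a stable topology.
              server.setAttribute(topologyManager, new Attribute("rebalancingEnabled", true));
          }
      }

      The merge on the new coordinator C then looks like this: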
      16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeC-27509(rack-id=r2)]: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
      16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeD-62603(rack-id=r2)]: CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
      16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG, resolveConflicts=false, newMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], possibleOwners=[Test-NodeD-62603(rack-id=r2), Test-NodeC-27509(rack-id=r2)], preferredTopology=CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}, mergeTopologyId=10
      16:48:48,454 WARN  (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] ISPN000517: Ignoring cache topology from [Test-NodeC-27509(rack-id=r2)] during merge: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}
      16:48:48,454 DEBUG (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] ISPN000521: Cache org.infinispan.CONFIG recovered after merge with topology = CacheTopology{id=10, phase=NO_REBALANCE, rebalanceId=4, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=null, unionCH=null, actualMembers=[], persistentUUIDs=[]}, availability mode null
      16:48:48,454 FATAL (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] [Context=org.infinispan.CONFIG] ISPN000313: Lost data because of abrupt leavers [Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)]
      16:48:48,455 ERROR (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [LimitedExecutor] Exception in task
      java.lang.AssertionError: null
      	at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:217) ~[classes/:?]
      	at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:647) ~[classes/:?]
      	at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:500) ~[classes/:?]
      	at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [classes/:?]
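
      The recovered topology above keeps the preferred topology's currentCH, which is owned only by the killed nodes A and B, so after pruning the abrupt leavers no actual members remain (the ISPN000521 line shows actualMembers=[]). A toy reconstruction of that arithmetic, not Infinispan's actual code (the condition of the assertion at PreferAvailabilityStrategy.java:217 is not visible in the log):

      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      public class MergeRecoveryToyModel {
          public static void main(String[] args) {
              // currentCH owners of the preferred topology (id=6): the two nodes just killed.
              List<String> currentOwners = Arrays.asList("Test-NodeA", "Test-NodeB");
              // Members that survived into the merge view.
              List<String> survivingMembers = Arrays.asList("Test-NodeC", "Test-NodeD");

              // Recovering with that currentCH and dropping the leavers
              // leaves an empty member list, as in the ISPN000521 line above.
              List<String> actualMembers = new ArrayList<>(currentOwners);
              actualMembers.retainAll(survivingMembers);
              System.out.println("actualMembers = " + actualMembers); // []

              // A sanity check of roughly this shape would then throw AssertionError
              // (run with -ea to enable assertions).
              assert !actualMembers.isEmpty() : "recovered topology has no members";
          }
      }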
      

      Eventually the missing stable topology makes the test fail:

      16:48:49,349 DEBUG (testng-Test:[null]) [ClusterCacheStatus] ISPN000519: Updating stable topology for cache org.infinispan.CONFIG, topology null
      16:48:49,349 WARN  (testng-Test:[null]) [CacheTopologyControlCommand] ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=null, type=POLICY_ENABLE, sender=Test-NodeC-27509(rack-id=r2), joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=5}
      java.lang.NullPointerException: null
      	at org.infinispan.topology.CacheTopologyControlCommand.<init>(CacheTopologyControlCommand.java:147) ~[classes/:?]
      	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastStableTopologyUpdate(ClusterTopologyManagerImpl.java:659) ~[classes/:?]
      	at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:806) ~[classes/:?]
      	at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) ~[?:?]
      	at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:702) ~[classes/:?]
      	at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:682) ~[classes/:?]
      	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:215) ~[classes/:?]
      	at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:163) [classes/:?]
      	at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44) [classes/:?]
      	at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:752) [classes/:?]
      	at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) [classes/:?]
      	at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) [classes/:?]
      16:48:49,355 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer[DIST_SYNC]
      javax.management.MBeanException: Error invoking setter for attribute rebalancingEnabled
      	at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:358) ~[classes/:?]
      	at org.infinispan.jmx.ResourceDMBean.setAttribute(ResourceDMBean.java:216) ~[classes/:?]
      	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.setAttribute(DefaultMBeanServerInterceptor.java:736) ~[?:?]
      	at com.sun.jmx.mbeanserver.JmxMBeanServer.setAttribute(JmxMBeanServer.java:739) ~[?:?]
      	at org.infinispan.statetransfer.RebalancePolicyJmxTest.doTest(RebalancePolicyJmxTest.java:163) ~[test-classes/:?]
      	at org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer(RebalancePolicyJmxTest.java:44) ~[test-classes/:?]
      Caused by: java.lang.reflect.InvocationTargetException
      	at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?]
      	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
      	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
      	at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?]
      	at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?]
      	... 28 more
      Caused by: org.infinispan.commons.CacheException: Unsuccessful local response
      	at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:757) ~[classes/:?]
      	at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) ~[classes/:?]
      	at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) ~[classes/:?]
      	at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?]
      	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
      	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
      	at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?]
      	at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?]
      	... 28 more
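
      The NullPointerException is consistent with the command constructor dereferencing the stable topology it is asked to broadcast, which was never installed because of the earlier AssertionError. A self-contained toy model of that failure mode, using stand-in classes rather than Infinispan's:

      import java.util.List;

      public class StableTopologyNpeToyModel {
          // Stand-in for a cache topology; not Infinispan's class.
          static final class Topology {
              final int id;
              final List<String> members;
              Topology(int id, List<String> members) { this.id = id; this.members = members; }
          }

          // Stand-in for a stable-topology-update command: its constructor reads fields
          // from the topology it is given, so a null topology fails immediately.
          static final class StableTopologyUpdateCommand {
              final int topologyId;
              StableTopologyUpdateCommand(Topology stable) {
                  this.topologyId = stable.id; // NullPointerException when stable == null
              }
          }

          public static void main(String[] args) {
              Topology stable = null; // never installed because of the AssertionError above
              new StableTopologyUpdateCommand(stable); // throws NPE, like the log above
          }
      }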
      

      Assignee / Reporter: Dan Berindei (dberinde@redhat.com)