Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-8731

Write command times out waiting for wrong topology

    XMLWordPrintable

Details

    Description

      MultipleCacheManagersTest.waitForClusterToForm(new String[]{}) doesn't actually wait for the rebalance to finish everywhere: https://github.com/infinispan/infinispan/pull/5705/files#diff-f9f535214b949a7c75454dd3fde54d46R304

      Most tests should work regardless of whether we wait for the rebalance to finish or not, it's just a way to reduce the number of test failures when the command retry doesn't work. Waiting for the rebalance to finish is only required when the test needs to install new topologies in a specific order.

      However, some recent test failures in CI show that the effectiveness of MultipleCacheManagersTest.waitForClusterToForm(new String[]{}) does make a difference. Most likely there is a bug in the retry logic during the latter phases of a rebalance, and the retry-specific tests do not cover all the scenarios.

      00:23:20,668 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}, availability mode = AVAILABLE
      00:23:20,669 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide stable topology for cache ___defaultcache, topology = CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
      00:23:20,669 DEBUG (transport-thread-ClusteredScriptingTest-NodeI-p220-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
      00:23:20,669 DEBUG (transport-thread-ClusteredScriptingTest-NodeJ-p234-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredScriptingTest-NodeI-40041: 134, ClusteredScriptingTest-NodeJ-9982: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041, ClusteredScriptingTest-NodeJ-9982], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5, d2200930-5793-4b74-beea-e001a2b414c1]}
      00:23:35,673 ERROR (timeout-thread-ClusteredScriptingTest-NodeI-p219-t1:[]) [InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand, writing keys [/macbeth.txt0]
      org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
      	at org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      00:23:35,673 DEBUG (testng-ClusteredScriptingTest:[]) [DefaultCacheManager] Stopping cache manager ISPN on ClusteredScriptingTest-NodeJ-9982
      00:23:35,682 DEBUG (remote-thread-ClusteredScriptingTest-NodeI-p218-t5:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[ClusteredScriptingTest-NodeI-40041: 256]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredScriptingTest-NodeI-40041], persistentUUIDs=[91a47fef-fad6-479c-9ec9-12e9350dc6d5]}, availability mode = AVAILABLE
      00:23:35,699 ERROR (testng-ClusteredScriptingTest:[]) [TestSuiteProgress] Test failed: org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag([REPL_SYNC])
      org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
      	at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.scripting.utils.ScriptingUtils.loadData(ScriptingUtils.java:29) ~[test-classes/:?]
      	at org.infinispan.scripting.ClusteredScriptingTest$6.call(ClusteredScriptingTest.java:144) ~[test-classes/:?]
      	at org.infinispan.test.TestingUtil.withCacheManagers(TestingUtil.java:1522) ~[infinispan-core-9.2.0-SNAPSHOT-tests.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.scripting.ClusteredScriptingTest.testDistributedMapReduceStreamWithFlag(ClusteredScriptingTest.java:137) ~[test-classes/:?]
      
      18:04:17,097 DEBUG (transport-thread-ClusteredCacheTest-NodeB-p3689-t1:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655: 134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362, b623ac7a-3c77-471d-88de-e151da823c0c]}
      18:04:17,097 INFO  (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test starting: org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
      18:04:17,097 DEBUG (transport-thread-ClusteredCacheTest-NodeA-p3679-t5:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Updating local topology for cache ___defaultcache: CacheTopology{id=5, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[ClusteredCacheTest-NodeA-53655: 134, ClusteredCacheTest-NodeB-27398: 122]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655, ClusteredCacheTest-NodeB-27398], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362, b623ac7a-3c77-471d-88de-e151da823c0c]}
      18:04:32,100 ERROR (timeout-thread-ClusteredCacheTest-NodeA-p3678-t1:[]) [InvocationContextInterceptor] ISPN000136: Error executing command PutKeyValueCommand, writing keys [WrappedByteArray{bytes=[B0x010201054E617669..[9], hashCode=-1707624030}]
      org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
      	at org.infinispan.interceptors.impl.BaseStateTransferInterceptor$CancellableRetry.run(BaseStateTransferInterceptor.java:333) [infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      18:04:32,101 ERROR (testng-ClusteredCacheTest:[]) [TestSuiteProgress] Test failed: org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner
      org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 6
      	at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1636) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1284) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1750) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:217) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:358) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:671) ~[infinispan-core-9.2.0-SNAPSHOT.jar:9.2.0-SNAPSHOT]
      	at org.infinispan.query.blackbox.ClusteredCacheTest.prepareTestData(ClusteredCacheTest.java:140) ~[test-classes/:?]
      	at org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFrom(ClusteredCacheTest.java:334) ~[test-classes/:?]
      	at org.infinispan.query.blackbox.ClusteredCacheTest.testConditionalRemoveFromNonOwner(ClusteredCacheTest.java:298) ~[test-classes/:?]
      18:04:32,214 DEBUG (testng-ClusteredCacheTest:[]) [DefaultCacheManager] Stopping cache manager ISPN on ClusteredCacheTest-NodeB-27398
      18:04:32,479 DEBUG (remote-thread-ClusteredCacheTest-NodeA-p3677-t6:[]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache ___defaultcache, topology = CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[ClusteredCacheTest-NodeA-53655: 256]}, pendingCH=null, unionCH=null, actualMembers=[ClusteredCacheTest-NodeA-53655], persistentUUIDs=[5f5b2dd6-c570-4c6c-9c2d-92aa5fcd2362]}, availability mode = AVAILABLE
      

      Attachments

        Issue Links

          Activity

            People

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: