Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-9536

Distributed stream iteration timeout during join

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 9.4.0.Final
    • 9.4.0.CR3
    • Core
    • None

      LocalStreamManagerImpl checks in several places that the cache status is RUNNING, and if not it suspects all local segments with this log message:

      [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
      

      This check is incorrect, because a node may receive distributed streaming requests before it is running: the initial state transfer could finish before all the components in the cache finish starting.

      If the check fails because the cache is still starting, the originator of the distributed stream iteration will assume that the remote cache has a newer topology. Because there is no newer topology, the originator blocks for DistributedCacheTimeout.timeout() and then fails.

      10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG]) [LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55, Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957, Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737], persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae, 5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff, 87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
      10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
      10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
      10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG]) [DefaultCacheManager] Cache org.infinispan.CONFIG started
      
      10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl] Received response from Test-NodeE-27687 with a completed response true for id Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} suspected segments.
      10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} lost segments for identifier Test-NodeD-257370
      10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
      10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl] Waiting for topology 10 to be installed, current topology is 9
      10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager to FAILED
      org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type EmbeddedCounterManager
      ...
      Caused by: java.util.concurrent.TimeoutException
      	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_171]
      	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_171]
      	at org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330) ~[classes/:?]
      	at org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229) ~[classes/:?]
      	at org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406) ~[classes/:?]
      	at org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300) ~[classes/:?]
      	at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
      	at org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90) ~[classes/:?]
      	at org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89) ~[classes/:?]
      

            dberinde@redhat.com Dan Berindei (Inactive)
            dberinde@redhat.com Dan Berindei (Inactive)
            Diego Lovison Diego Lovison
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: