Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-2612

Problem broadcasting CH_UPDATE command

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Do
    • Blocker
    • 5.2.0.CR2
    • 5.2.0.Beta5
    • Core
    • None

    Description

      Infinispan 5.2.0.Beta5
      JGroups 3.2.4.Final

      Steps to reproduce (I'm using two virtual interfaces test1, test2)

      1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
      2. wait 10 sec
      3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true

      After 5 seconds there should be this timeout exception:

      19:42:14,146 WARN  [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1}
      java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
      	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
      	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
      	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
      	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
      	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
      	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
      	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
      	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
      	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
      	at org.jgroups.JChannel.up(JChannel.java:703)
      	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
      	at org.jgroups.protocols.RSVP.up(RSVP.java:172)
      	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
      	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
      	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
      	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
      	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
      	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
      	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
      	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
      	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
      	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
      	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
      	at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
      	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
      	at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
      	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
      	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
      	at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
      	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
      	at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
      	at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	... 3 more
      Caused by: org.jgroups.TimeoutException: TimeoutException
      	at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
      	at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
      	at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
      	at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
      	at org.jgroups.protocols.RSVP.down(RSVP.java:118)
      	at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
      	at org.jgroups.JChannel.down(JChannel.java:718)
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
      	at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
      	at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
      	at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
      	at org.jgroups.blocks.Request.execute(Request.java:83)
      	at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
      	at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
      	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
      	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
      	... 8 more
      
      

      Analysis:

      These are the messages sent after view change:

      test1                     test2
       <--- JOIN ---- 
       ---- REBALANCE_START ---> 
       <--- StateRequestCommand ---- 
       ---- StateResponseCommand ---> 
       <--- REBALANCE_CONFIRM ---- 
       ---- CH_UPDATE --->
      

      The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
      (https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/RSVP.java#L121)
      contained local address.

      Attachments

        1. test.zip
          5 kB
        2. session-cluster.xml
          5 kB
        3. test_TRACE.zip
          57 kB

        Activity

          People

            dberinde@redhat.com Dan Berindei (Inactive)
            mlinhard Michal Linhard (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: