Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-6721

SuspectException when stopping and starting nodes in an embedded cluster using invalidation

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Major
    • 9.0.0.Final
    • 8.2.2.Final
    • Core
    • None

    Description

      We run a cluster of four app servers on Tomcat with embedded Infinispan for Hibernate L2. When we bring down one of the app servers it shuts down that cache and then exits. On other nodes we frequently, although not consistently, receive SuspectExceptions from other nodes stating that the cache isn't running on the shutting down node. Similarly when starting a new app server we get the same.

      We are using synchronous invalidation.

      This feels like a race condition during startup and shutdown of the caches. Does that sound likely?

      I note in this wiki article https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-Infinispan there is a note "TODO Create an issue in JIRA to ignore suspect exceptions.". I'm not sure if this is related!

      There are two stacktraces below. I note that the shutdown example fails in JGroupsTransport.java:798 which is when the response was suspected. While the startup example fails in JGroupsTransport.java:795, which is a CacheNotFoundResponse response.

      Here is a shutdown exception:

      May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
      SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
      org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
              at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
              at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
              at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
              at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
              at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
              at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
              at org.jgroups.JChannel.up(JChannel.java:738)
              at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
              at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
              at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
              at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
              at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
              at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
              at org.jgroups.stack.Protocol.up(Protocol.java:417)
              at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
              at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
              at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
              at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
              at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
              at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
              at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
              at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
              at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
              at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
              at org.jgroups.protocols.Discovery.up(Discovery.java:296)
              at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
              at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      And a startup exception:

      org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
              at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
              at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
              at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
              at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
              at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
              at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
              at org.jgroups.JChannel.up(JChannel.java:738)
              at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
              at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
              at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
              at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
              at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
              at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
              at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
              at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
              at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
              at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
              at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
              at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
              at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
              at org.jgroups.protocols.Discovery.up(Discovery.java:296)
              at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
              at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              karlvr Karl von Randow (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: