Details

Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 10.1.0.CR1, 10.1.0.Final
Affects Version/s: 10.0.0.Final
Component/s: Clustering
Labels: None

Git Pull Request:
https://github.com/wildfly/wildfly/pull/8699

Description

Seen in failover tests: HA singleton deployment scenarios, jvmkill failover type, random election policy.

The election that should follow the kill of any node goes wrong (the killed node does not have to be the cluster coordinator or the singleton provider):

Timeline:

perf18 was killed around 03:40:07
perf19 was elected, ~~JBEAP-2254~~ occurred, perf19 left the cluster
right after that perf20 was elected, ~~JBEAP-2254~~ occurred, perf20 left the cluster
right after that perf21 was elected and logged these errors (perf21 was the only node in the cluster at that time):

[JBossINF] 03:40:07,816 INFO  [org.wildfly.clustering.server] (notification-thread--p2-t1) WFLYCLSV0001: This node will now operate as the singleton provider of the jboss.deployment.unit."clusterbench-ee7-singleton-jbossall.ear".FIRST_MODULE_USE service
[JBossINF] 03:40:07,821 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p15-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator: org.infinispan.commons.CacheException: Unsuccessful response received from node perf20: CacheNotFoundResponse
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:480)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:348)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:286)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:617)
[JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[JBossINF] 	at java.lang.Thread.run(Thread.java:745)
...
[JBossINF] 03:40:07,905 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,ee,perf21) ISPN000094: Received new cluster view for channel server: [perf21|6] (1) [perf21]
[JBossINF] 03:40:07,904 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p14-t15) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node perf20
[JBossINF] 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
[JBossINF] 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:471)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:348)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:286)
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:617)
[JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[JBossINF] 	at java.lang.Thread.run(Thread.java:745)
[JBossINF] Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node perf20
[JBossINF] 	at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:763)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$174(JGroupsTransport.java:599)
[JBossINF] 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
[JBossINF] 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
[JBossINF] 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[JBossINF] 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
[JBossINF] 	at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.futureDone(SingleResponseFuture.java:30)
[JBossINF] 	at org.jgroups.blocks.Request.checkCompletion(Request.java:169)
[JBossINF] 	at org.jgroups.blocks.UnicastRequest.viewChange(UnicastRequest.java:164)
[JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:331)
[JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:242)
[JBossINF] 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:684)
[JBossINF] 	at org.jgroups.JChannel.up(JChannel.java:738)
[JBossINF] 	at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:123)
[JBossINF] 	at org.jgroups.stack.Protocol.up(Protocol.java:374)
[JBossINF] 	at org.jgroups.protocols.FORK.up(FORK.java:118)
[JBossINF] 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
[JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
[JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
[JBossINF] 	at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:735)
[JBossINF] 	at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:244)
[JBossINF] 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:925)
[JBossINF] 	at org.jgroups.stack.Protocol.up(Protocol.java:412)
[JBossINF] 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
[JBossINF] 	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
[JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
[JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
[JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
[JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
[JBossINF] 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
[JBossINF] 	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
[JBossINF] 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
[JBossINF] 	at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
[JBossINF] 	at org.jgroups.protocols.Discovery.up(Discovery.java:295)
[JBossINF] 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1577)
[JBossINF] 	at org.jgroups.protocols.TP$3.run(TP.java:1511)
[JBossINF] 	... 3 more
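
When triaging a console log like the excerpt above, it can help to list every ISPN000196 failure together with the node it names. The following is a minimal sketch, not part of WildFly or Infinispan: the names `LOG_LINE`, `REMOTE_NODE` and `recovery_failures` are hypothetical, and the regular expressions assume the line layout seen in this excerpt (timestamp, level, category, thread, message code).

```python
import re

# Assumed layout, taken from the excerpt:
# "03:40:07,821 ERROR [category] (thread) ISPN000196: message"
LOG_LINE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<category>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"(?P<code>[A-Z]+\d+):\s+(?P<message>.*)"
)

# The remote node appears as "from node X" or "on node X" in these messages.
REMOTE_NODE = re.compile(r"(?:from|on) node ([\w.-]+)")


def recovery_failures(lines, code="ISPN000196"):
    """Yield (time, node) for each log line carrying the given message code.

    node is None when the message does not name a remote node.
    """
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None or m.group("code") != code:
            continue  # stack-trace frames and other messages are skipped
        node = REMOTE_NODE.search(m.group("message"))
        yield m.group("time"), (node.group(1) if node else None)


# Three lines copied from the excerpt above.
sample = [
    "[JBossINF] 03:40:07,821 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] "
    "(transport-thread--p15-t5) ISPN000196: Failed to recover cluster state after the "
    "current node became the coordinator: org.infinispan.commons.CacheException: "
    "Unsuccessful response received from node perf20: CacheNotFoundResponse",
    "[JBossINF] 03:40:07,905 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] "
    "(Incoming-8,ee,perf21) ISPN000094: Received new cluster view for channel server: "
    "[perf21|6] (1) [perf21]",
    "[JBossINF] 03:40:07,904 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] "
    "(transport-thread--p14-t15) ISPN000196: Failed to recover cluster state after the "
    "current node became the coordinator: java.util.concurrent.ExecutionException: "
    "org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node perf20",
]

for time, node in recovery_failures(sample):
    print(time, node)
```

For these three lines the sketch reports both failures against perf20, which matches the timeline: perf20 had already left the cluster when perf21 tried to recover the cluster state.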

Link:
http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-singleton-deployment-jvmkill-random-election-policy/9/console-perf21/

Issue Links

clones: JBEAP-3416 [7.1] Failed to recover cluster state after the current node became the coordinator (Verified)

is blocked by: WFLY-6126 Upgrade Infinispan to 8.1.2.Final (Closed)

Failed to recover cluster state after the current node became the coordinator
