Type: Bug
Status: Resolved
Priority: Major
Resolution: Done
Affects Version/s: 7.3.0.redhat-61
Fix Version/s: None
Component/s: None
Labels: None
Environment: Red Hat JBoss Fuse 6.1.0 GA

Steps to Reproduce:
A large Fuse fabric installation runs on a collection of virtual machines. After an outage at the VM networking level, the customer reports that the ensemble did not recover normal operation, and a complete restart of the installation was required.
While I can't reproduce the customer's exact problem, I can reproduce what I believe is a very similar one. All that is needed is to create a 3-node ensemble on virtual machines, suspend one of the VMs for some time, and then resume it.
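For context on why this scenario should be survivable: a ZooKeeper ensemble needs a strict majority of its servers to form a quorum, so in a 3-node ensemble the two VMs that stay up should keep serving, and the suspended node would be expected to rejoin as a follower once resumed. The arithmetic is sketched here in Java; the class and method names (QuorumMath, quorumSize, toleratedFailures) are hypothetical and are not part of Fabric8 or ZooKeeper.

```java
/** Hypothetical helper: majority-quorum arithmetic for a ZooKeeper-style ensemble. */
class QuorumMath {
    // Votes needed for a strict majority of an ensemble of n servers.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can be unreachable while a majority still remains.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int n = 3; // the ensemble size used in the reproduction above
        System.out.println("ensemble=" + n
                + " quorum=" + quorumSize(n)
                + " tolerated=" + toleratedFailures(n));
        // prints: ensemble=3 quorum=2 tolerated=1
    }
}
```

With three servers, losing one (the suspended VM) still leaves a quorum of two, so the failure described in this report is not explained by quorum loss alone.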
If I run container-list on the machine that was suspended, it fails completely: the command does not exist. This is the expected result for a container that does not consider itself part of a fabric. However, the VM and the container are live, and there is network connectivity between the VMs.
Looking at the logs for the resumed container, I see a whole slew of ZooKeeper-related network exceptions. "java.lang.IllegalStateException: Client has been stopped" seems particularly relevant here. It looks as if there are connection-related problems from which ZooKeeper never recovers.
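The "Client has been stopped" failure suggests that once the client object is shut down, every later call on it fails, and nothing creates a replacement. A minimal Java model of that failure mode follows, together with one possible mitigation (replacing a stopped client with a fresh one). This is an assumption-laden illustration only: StoppableClient, RecreatingHolder and RecoveryDemo are hypothetical classes, not the Apache Curator or Fabric8 API, and this report does not establish how the issue was actually fixed.

```java
import java.util.function.Supplier;

/** Hypothetical model (not the Curator API): once closed, it rejects every call. */
class StoppableClient {
    private volatile boolean started = true;

    void close() {
        started = false;
    }

    boolean isStarted() {
        return started;
    }

    String getData(String path) {
        // Mirrors the logged failure mode: a stopped client never comes back by itself.
        if (!started) {
            throw new IllegalStateException("Client has been stopped");
        }
        return "data@" + path;
    }
}

/** Hypothetical holder that hands out a usable client, replacing a stopped one. */
class RecreatingHolder {
    private final Supplier<StoppableClient> factory;
    private StoppableClient current;
    private int created;

    RecreatingHolder(Supplier<StoppableClient> factory) {
        this.factory = factory;
        this.current = factory.get();
        this.created = 1;
    }

    // Returns the current client, building a fresh one if the old one was stopped.
    synchronized StoppableClient client() {
        if (!current.isStarted()) {
            current = factory.get();
            created++;
        }
        return current;
    }

    synchronized int createdCount() {
        return created;
    }
}

class RecoveryDemo {
    public static void main(String[] args) {
        RecreatingHolder holder = new RecreatingHolder(StoppableClient::new);
        StoppableClient first = holder.client();
        first.close(); // simulate the client being shut down during the outage

        try {
            first.getData("/fabric");
        } catch (IllegalStateException e) {
            System.out.println("old client: " + e.getMessage());
        }

        System.out.println("new client: " + holder.client().getData("/fabric"));
        System.out.println("clients created: " + holder.createdCount());
    }
}
```

Running RecoveryDemo prints "old client: Client has been stopped", then "new client: data@/fabric" and "clients created: 2". The point of the model is that a closed client is treated as disposable; code that keeps reusing the same stopped instance, as the log excerpts suggest, will fail indefinitely.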
per.server.quorum.LearnerHandler 562 | 53 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-379 | Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)[:1.7.0_55]
at java.net.SocketInputStream.read(SocketInputStream.java:152)[:1.7.0_55

orum.QuorumCnxManager$RecvWorker 762 | 53 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-379 | Connection broken for id 1, my id = 2, error =
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)[:1.7.0_55]
at java.net.SocketInputStream.read(SocketInputStream.java:122)[:1.7.0_55]

2014-05-15 18:56:02,521 | ERROR | ZooKeeperGroup-0 | ConnectionState | g.apache.curator.ConnectionState 194 | 53 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-379 | Connection timed out for connection string (lars:2182,toot:2181,zoot:2181) and timeout (15000) / elapsed (15004)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:456)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at ...

2014-05-15 18:57:12,460 | WARN | 0:0:0:0:0:0:2181 | Learner | zookeeper.server.quorum.Follower 89 | 53 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-379 | Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)[:1.7.0_55]
at java.net.SocketInputStream.read(SocketInputStream.java:122)[:1.7.0_55]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)[:1.7.0_55]

2014-05-15 18:57:12,520 | INFO | 0:0:0:0:0:0:2181 | Learner | zookeeper.server.quorum.Follower 166 | 53 - io.fabric8.fabric-zookeeper - 1.0.0.redhat-379 | shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]

2014-05-15 19:05:31,695 | WARN | pool-61-thread-1 | GitDataStore | abric8.git.internal.GitDataStore 1208 | 85 - io.fabric8.fabric-git - 1.0.0.redhat-379 | Failed to perform a pull java.lang.IllegalStateException: Client has been stopped
java.lang.IllegalStateException: Client has been stopped
at com.google.common.base.Preconditions.checkState(Preconditions.java:150)[83:com.google.guava:15.0.0]
at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:320)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:105)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:252)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:239)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:39)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]
at io.fabric8.zookeeper.utils.ZooKeeperUtils.setData(ZooKeeperUtils.java:204)[53:io.fabric8.fabric-zookeeper:1.0.0.redhat-379]

Is related to:
FABRIC-1227 Non-Ensemble Fabric Server IllegalArgumentException - A HostProvider may not be empty! (Resolved)