Red Hat Fuse: ENTESB-2932

Brokers are stopped after a ZooKeeper connection timeout and reconnect


Details


      1. Create a fabric environment with brokers:

      container-create-ssh --jvm-opts "-Xms1024M -Xmx2048M -XX:PermSize=128M -XX:MaxPermSize=512M " --host host --user fuse --password fuse  mesh-broker-01
      mq-create --no-ssl --assign-container mesh-broker-01   --group broker-a --network broker-a --networks-username admin --networks-password admin broker-a01
      
      container-create-ssh --jvm-opts "-Xms1024M -Xmx2048M -XX:PermSize=128M -XX:MaxPermSize=512M " --host host --user fuse --password fuse  mesh-broker-02
      mq-create --no-ssl --assign-container mesh-broker-02   --group broker-a --network broker-a --networks-username admin --networks-password admin broker-a02
      

      2. Suspend the mesh-broker-01 JVM (signal 19 is SIGSTOP on x86 Linux):

      pkill -19 -f mesh-broker-01

      3. Wait 60 seconds (the ZooKeeper session timeout in the log below is 60000 ms).
      4. Resume the JVM (signal 18 is SIGCONT on x86 Linux):

      pkill -18 -f mesh-broker-01

      5. Repeat steps 2-4 until the broker stops or the container stays disconnected.

      container-list 
      [id]            [version]  [type]  [connected]  [profiles]                     [provision status]
      mesh-broker-01  1.0        karaf   yes          mq-broker-broker-a.broker-a01  success           
                                                      fabric-client                                    
      mesh-broker-02  1.0        karaf   no           mq-broker-broker-a.broker-a02  success           
                                                      fabric-client                                    
      root*           1.0        karaf   yes          fabric                         success           
                                                      jboss-fuse-full                                  
                                                      fabric-ensemble-0001-1                           
      server1         1.0        karaf   yes          default                        success           
                                                      fabric-ensemble-0001-2                           
      server2         1.0        karaf   yes          default                        success           
                                                      fabric-ensemble-0001-3                        
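Steps 2 to 5 can be scripted. The following is a minimal sketch that is not part of the original report. The function name `suspend_resume_loop`, the cycle count, and the pause after resuming are assumptions. Signals are given by name (`-STOP`, `-CONT`), which is equivalent to `-19`/`-18` on x86 Linux. Call it from an interactive shell. `pkill -f` matches every process whose command line contains the pattern, excluding only `pkill` itself, so a wrapper script that carries `mesh-broker-01` on its own command line would suspend itself.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-5: repeatedly suspend (SIGSTOP) and resume (SIGCONT)
# the container JVM so that its ZooKeeper session times out.
# Assumptions (not from the report): function name, cycle count, and the
# pause after SIGCONT are illustrative.
set -euo pipefail

# suspend_resume_loop PATTERN PAUSE_SECONDS CYCLES [KILL_CMD]
#   PATTERN        matched against full command lines, as with `pkill -f`
#   PAUSE_SECONDS  how long the process stays suspended (step 3 uses 60)
#   CYCLES         number of suspend/resume cycles to run
#   KILL_CMD       command used to send signals; defaults to pkill
suspend_resume_loop() {
  local pattern="$1" pause="$2" cycles="$3" kill_cmd="${4:-pkill}"
  local i
  for ((i = 1; i <= cycles; i++)); do
    echo "cycle $i: suspending processes matching '$pattern'"
    # Equivalent to `pkill -19 -f PATTERN` on x86 Linux.
    "$kill_cmd" -STOP -f "$pattern" || { echo "no process matches '$pattern'"; return 1; }
    sleep "$pause"
    echo "cycle $i: resuming processes matching '$pattern'"
    # Equivalent to `pkill -18 -f PATTERN` on x86 Linux.
    "$kill_cmd" -CONT -f "$pattern" || { echo "no process matches '$pattern'"; return 1; }
    # Assumption: wait again so the client has time to reconnect before
    # the next suspension.
    sleep "$pause"
  done
}
```

For example, `suspend_resume_loop mesh-broker-01 60 5`, then run `container-list` in the Fabric console to see whether the container is still reported as connected.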
      

    Description

      A broker deployed in a container is stopped after a ZooKeeper connection timeout. Hawtio and the CLI may then report the container as disconnected, even though it is still possible to connect to the container with the container-connect command.

      org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
      	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:199)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-100]
      	at io.fabric8.groups.internal.ZooKeeperGroup.refresh(ZooKeeperGroup.java:385)[65:io.fabric8.fabric-groups:1.2.0.redhat-100]
      	at io.fabric8.groups.internal.RefreshOperation.invoke(RefreshOperation.java:32)[65:io.fabric8.fabric-groups:1.2.0.redhat-100]
      	at io.fabric8.groups.internal.ZooKeeperGroup.mainLoop(ZooKeeperGroup.java:510)[65:io.fabric8.fabric-groups:1.2.0.redhat-100]
      	at io.fabric8.groups.internal.ZooKeeperGroup.access$200(ZooKeeperGroup.java:65)[65:io.fabric8.fabric-groups:1.2.0.redhat-100]
      	at io.fabric8.groups.internal.ZooKeeperGroup$4.run(ZooKeeperGroup.java:156)[65:io.fabric8.fabric-groups:1.2.0.redhat-100]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75]
      	at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
      2015-03-26 05:08:01,188 | WARN  | ZooKeeperGroup-0 | ConnectionState                  | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | Connection attempt unsuccessful after 61585 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
      2015-03-26 05:08:01,189 | INFO  | ZooKeeperGroup-0 | ZooKeeper                        | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | Initiating client connection, connectString=172.16.72.12:2182,172.16.72.42:2181,172.16.72.26:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@76554e1e
      2015-03-26 05:08:01,209 | INFO  | up-0-EventThread | ConnectionStateManager           | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | State change: RECONNECTED
      2015-03-26 05:08:01,210 | INFO  | onStateManager-0 | GitDataStoreImpl                 | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Shared Counter (Re)connected, doing a pull
      2015-03-26 05:08:01,216 | INFO  | ad-1-EventThread | ConnectionStateManager           | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | State change: LOST
      2015-03-26 05:08:01,216 | WARN  | ad-1-EventThread | ConnectionState                  | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | Session expired event received
      2015-03-26 05:08:01,216 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Disconnected from the group
      2015-03-26 05:08:01,219 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Broker broker-a01 is now a slave, stopping the broker.
      2015-03-26 05:08:01,219 | INFO  | ool-160-thread-1 | BrokerService                    | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Apache ActiveMQ 5.11.0.redhat-620100 (broker-a01, ID:fuseqe5-46442-1427360345781-0:2) is shutting down
      2015-03-26 05:08:01,220 | INFO  | ool-160-thread-1 | OsgiFabricDiscoveryAgent         | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | closing tracker
      2015-03-26 05:08:01,221 | INFO  | ool-160-thread-1 | NetworkConnector                 | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Network Connector DiscoveryNetworkConnector:fabric-broker-a:BrokerService[broker-a01] stopped
      2015-03-26 05:08:01,227 | INFO  | ad-1-EventThread | ZooKeeper                        | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | Session: 0x34c554bbc840001 closed
      2015-03-26 05:08:01,228 | INFO  | ool-160-thread-1 | TransportConnector               | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Connector openwire stopped
      2015-03-26 05:08:01,229 | INFO  | ad-1-EventThread | ZooKeeper                        | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | Initiating client connection, connectString=172.16.72.12:2182,172.16.72.42:2181,172.16.72.26:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@76554e1e
      2015-03-26 05:08:01,229 | INFO  | ool-160-thread-1 | TransportConnector               | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Connector mqtt stopped
      2015-03-26 05:08:01,230 | INFO  | ool-160-thread-1 | TransportConnector               | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Connector amqp stopped
      2015-03-26 05:08:01,235 | INFO  | ool-160-thread-1 | TransportConnector               | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Connector stomp stopped
      2015-03-26 05:08:01,240 | INFO  | ad-1-EventThread | ConnectionStateManager           | ?                                   ? | 71 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-100 | State change: RECONNECTED
      2015-03-26 05:08:01,247 | INFO  | ool-160-thread-1 | ContextHandler                   | ?                                   ? | 93 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.16.v20140903 | stopped o.e.j.s.ServletContextHandler{/,null}
      2015-03-26 05:08:01,251 | INFO  | onStateManager-0 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Performing a pull on remote URL: http://172.16.72.12:8181/git/fabric/
      2015-03-26 05:08:01,268 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Broker broker-a01 is now the master, starting the broker.
      2015-03-26 05:08:01,270 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Broker broker-a01 is being started.
      2015-03-26 05:08:01,328 | INFO  | ool-160-thread-1 | TransportConnector               | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Connector ws stopped
      2015-03-26 05:08:01,345 | INFO  | ool-160-thread-1 | PListStoreImpl                   | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | PListStore:[/home/fuse/containers/mesh-broker-01/fabric8-karaf-1.2.0.redhat-100/databroker-a01/broker-a01/tmp_storage] stopped
      2015-03-26 05:08:01,345 | INFO  | ool-160-thread-1 | KahaDBStore                      | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Stopping async queue tasks
      2015-03-26 05:08:01,346 | INFO  | ool-160-thread-1 | KahaDBStore                      | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Stopping async topic tasks
      2015-03-26 05:08:01,346 | INFO  | ool-160-thread-1 | KahaDBStore                      | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Stopped KahaDB
      2015-03-26 05:08:01,543 | INFO  | onStateManager-0 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Pull result: [localUpdate=false,remoteUpdate=false,versions=[1.0, master],error=null]
      2015-03-26 05:08:01,545 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Disconnected from the group
      2015-03-26 05:08:01,546 | INFO  | onStateManager-0 | GitDataStoreImpl                 | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Shared Counter (Re)connected, doing a pull
      2015-03-26 05:08:01,548 | INFO  | ool-195-thread-1 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Performing a pull on remote URL: http://172.16.72.12:8181/git/fabric/
      2015-03-26 05:08:01,560 | INFO  | ZooKeeperGroup-0 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Reconnected to the group
      2015-03-26 05:08:01,585 | INFO  | ool-195-thread-1 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Pull result: [localUpdate=false,remoteUpdate=false,versions=[1.0, master],error=null]
      2015-03-26 05:08:01,589 | INFO  | ool-195-thread-1 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Performing a pull on remote URL: http://172.16.72.12:8181/git/fabric/
      2015-03-26 05:08:01,621 | INFO  | ool-195-thread-1 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Pull result: [localUpdate=false,remoteUpdate=false,versions=[1.0, master],error=null]
      2015-03-26 05:08:01,631 | INFO  | onStateManager-0 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Performing a pull on remote URL: http://172.16.72.12:8181/git/fabric/
      2015-03-26 05:08:01,669 | INFO  | onStateManager-0 | DefaultPullPushPolicy            | ?                                   ? | 66 - io.fabric8.fabric-git - 1.2.0.redhat-100 | Pull result: [localUpdate=false,remoteUpdate=false,versions=[1.0, master],error=null]
      2015-03-26 05:08:01,778 | INFO  | ool-160-thread-1 | BrokerService                    | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Apache ActiveMQ 5.11.0.redhat-620100 (broker-a01, ID:fuseqe5-46442-1427360345781-0:2) uptime 5 minutes
      2015-03-26 05:08:01,779 | INFO  | ool-160-thread-1 | BrokerService                    | ?                                   ? | 178 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-620100 | Apache ActiveMQ 5.11.0.redhat-620100 (broker-a01, ID:fuseqe5-46442-1427360345781-0:2) is shutdown
      2015-03-26 05:08:01,779 | INFO  | ool-160-thread-1 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Broker 'broker-a01' shut down, giving up being master
      2015-03-26 05:08:01,801 | INFO  | ool-160-thread-1 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Disconnected from the group
      2015-03-26 05:08:01,802 | INFO  | ool-160-thread-1 | ActiveMQServiceFactory           | ?                                   ? | 175 - io.fabric8.mq.mq-fabric - 1.2.0.redhat-100 | Lost zookeeper service for broker broker-a01, stopping the broker.
      2015-03-26 05:08:01,811 | INFO  | pool-12-thread-1 | HttpServiceFactoryImpl           | ?                                   ? | 100 - org.ops4j.pax.web.pax-web-runtime - 3.2.0 | Unbinding bundle: [io.fabric8.fabric-git-server [102]]
      2015-03-26 05:24:04,792 | INFO  | 9]-nio2-thread-1 | ServerSession                    | ?                                   ? | 35 - org.apache.sshd.core - 0.12.0.redhat-002 | Server session created from /172.16.72.12:58984
      2015-03-26 05:24:04,801 | INFO  | 9]-nio2-thread-1 | SimpleGeneratorHostKeyProvider   | ?                                   ? | 35 - org.apache.sshd.core - 0.12.0.redhat-002 | Generating host key...
      2015-03-26 05:24:09,993 | INFO  | 9]-nio2-thread-1 | ServerSession                    | ?                                   ? | 35 - org.apache.sshd.core - 0.12.0.redhat-002 | Kex: server->client aes128-ctr hmac-sha1 none
      2015-03-26 05:24:09,993 | INFO  | 9]-nio2-thread-1 | ServerSession                    | ?                                   ? | 35 - org.apache.sshd.core - 0.12.0.redhat-002 | Kex: client->server aes128-ctr hmac-sha1 none
      2015-03-26 05:24:11,611 | INFO  | 9]-nio2-thread-3 | ServerUserAuthService            | ?                                   ? | 35 - org.apache.sshd.core - 0.12.0.redhat-002 | Session admin@/172.16.72.12:58984 authenticated
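The key sequence in the log above is how ZooKeeper connection-state changes (RECONNECTED, LOST, session expired) interleave with the `ActiveMQServiceFactory` master/slave decisions. Below is a minimal sketch for extracting only those lines from a container log. It assumes the pipe-separated layout shown above. The function name `trace_broker_events` is hypothetical, and this report does not give the log file location.

```shell
#!/usr/bin/env bash
# Sketch: print ZooKeeper connection-state events and broker master/slave
# decisions from a Fuse container log, in log order.
# Assumption: each line has the layout
#   timestamp | level | thread | logger | location | bundle | message
# and the message itself contains no '|'.
set -euo pipefail

trace_broker_events() {
  # Split on '|' together with any surrounding spaces, so $4 is the bare
  # logger name and $7 is the message text.
  awk -F' *[|] *' '
    $4 ~ /^(ConnectionState|ConnectionStateManager|ActiveMQServiceFactory)$/ {
      print $1 "  " $4 ": " $7
    }
  '
}
```

Usage: `trace_broker_events < path/to/container.log` (the path is a placeholder). Stack-trace lines and other loggers, such as `BrokerService` and `DefaultPullPushPolicy`, are filtered out.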
      
      


            People

              hchirino Hiram Chirino
              mmelko@redhat.com Matej Melko
              Votes: 0
              Watchers: 5
