Red Hat Fuse / ENTESB-3231

[ER5] Odd behavior with brokers (and group membership) and container restarts


Details


      Steps to reproduce (an expanded console session is sketched below):

      1. fabric-create
      2. container-create-ssh {broker1, broker2} - each on its own OpenStack node
      3. mq-create --assign-container {broker1, broker2} --group masterslave brokers
      4. container-stop <current group master broker>

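      For reference, a minimal sketch of the full console session (host names, user, and password are placeholders; the exact container-create-ssh options may vary by environment):

       JBossFuse:admin@root> fabric:create --clean
       JBossFuse:admin@root> fabric:container-create-ssh --host <node1> --user <user> --password <password> broker1
       JBossFuse:admin@root> fabric:container-create-ssh --host <node2> --user <user> --password <password> broker2
       JBossFuse:admin@root> fabric:mq-create --assign-container broker1,broker2 --group masterslave brokers
       # stop whichever container currently holds the master, start it again,
       # and repeat until group membership degrades:
       JBossFuse:admin@root> fabric:container-stop broker1
       JBossFuse:admin@root> fabric:container-start broker1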

    Description

      I'm using a master/slave configuration with Fabric: 2 SSH containers (1 master broker, 1 slave broker). After a random number of restarts the brokers start to lose group membership ("Disconnected from group" appears in the logs).

      Meanwhile, you can see the following in the cluster-list output (after ~20 restarts of broker1 and broker2):

      JBossFuse:admin@root> cluster-list | grep -A 1 amq/masterslave
       [cluster]                                                               [masters]  [slaves]
 amq/masterslave
         brokers                                                              broker1    broker2, broker2, broker2, broker1, broker2, broker1, broker2, broker1, broker2, broker1, broker1, broker2, broker1, broker1, broker2, broker2, broker2, broker2, broker2, broker1, broker1  tcp://172.16.72.94:58099, mqtt://172.16.72.94:35563, amqp://172.16.72.94:33087, stomp://172.16.72.94:60064
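      The duplicated [slaves] entries suggest stale members piling up in the group registry. As a sketch (assuming the fabric-zookeeper shell commands are available and the group lives under the usual Fabric clusters path, which may differ between versions), the raw entries can be inspected with:

       JBossFuse:admin@root> zk:list -r /fabric/registry/clusters/amq/masterslave
       JBossFuse:admin@root> zk:get /fabric/registry/clusters/amq/masterslave/<entry>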
      

      There are more odd outputs like this one (a snapshot after a few more restarts - no master at all):

      JBossFuse:admin@root> cluster-list | grep -A 1 amq/masterslave
       [cluster]                                                               [masters]  [slaves]
       amq/masterslave                                                                            
       brokers                                                              -          broker2, broker2, broker2, broker2, broker2, broker2, broker2, broker2, broker2, broker1, broker2, broker2, broker1, broker2, broker2, broker2, broker2  -  
      

      In the broker logs you can very often see:

      java.lang.RuntimeException
               at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$4.groupEvent(ActiveMQServiceFactory.java:699)[174:io.fabric8.mq.mq-fabric:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup$6.apply(ZooKeeperGroup.java:402)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup$6.apply(ZooKeeperGroup.java:398)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-117]
               at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)[56:com.google.guava:17.0.0]
               at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)[71:io.fabric8.fabric-zookeeper:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup.callListeners(ZooKeeperGroup.java:396)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at io.fabric8.groups.internal.EventOperation.invoke(EventOperation.java:34)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup.mainLoop(ZooKeeperGroup.java:510)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup.access$200(ZooKeeperGroup.java:65)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at io.fabric8.groups.internal.ZooKeeperGroup$4.run(ZooKeeperGroup.java:156)[65:io.fabric8.fabric-groups:1.2.0.redhat-117]
               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75]
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75]
               at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
       Caused by: java.lang.NullPointerException
               at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.registerConnectors(ActiveMQServiceFactory.java:551)[174:io.fabric8.mq.mq-fabric:1.2.0.redhat-117]
               at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration.access$1400(ActiveMQServiceFactory.java:317)[174:io.fabric8.mq.mq-fabric:1.2.0.redhat-117]
               at io.fabric8.mq.fabric.ActiveMQServiceFactory$ClusteredConfiguration$4.groupEvent(ActiveMQServiceFactory.java:683)[174:io.fabric8.mq.mq-fabric:1.2.0.redhat-117]
               ... 13 more
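      The NullPointerException comes out of registerConnectors() inside the groupEvent callback, i.e. the group notification seems to arrive while the broker is not (or is no longer) fully started. To gauge how often this happens, the linked logs can be grepped for the strings from the trace above, e.g.:

       $ grep -c 'registerConnectors' fuse.log
       $ grep -c 'java.lang.NullPointerException' fuse2.log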
      

      Once the broker finally starts without problems, there are two possible outcomes:
      1. everything works OK and the broker is part of the specified broker group
      2. the broker works OK but is not part of the group (a quick check is sketched below)
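      To tell the two outcomes apart after a restart, cluster-list can be checked again, as above:

       JBossFuse:admin@root> cluster-list | grep -A 1 amq/masterslave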

      Full logs are accessible here:

      broker1: http://file.brq.redhat.com/~avano/fuse.log
      broker2: http://file.brq.redhat.com/~avano/fuse2.log
      

      (Note how the broker tries to start, fails, tries to start again, and so on, and then how many consecutive "Broker is a slave" and "disconnected from the group" messages appear at the end of the broker2 log.)
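      A quick way to count those trailing messages without reading the whole file (the exact log wording may differ slightly from the quotes above):

       $ tail -n 500 fuse2.log | grep -ic 'broker is a slave'
       $ tail -n 500 fuse2.log | grep -ic 'disconnected from the group'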


            People

              Assignee: Hiram Chirino (hchirino)
              Reporter: Andrej Vano (avano@redhat.com)
              Votes: 0
              Watchers: 2
