  1. Red Hat Fuse
  2. ENTESB-9944

Fabric: master component failover doesn't happen in case of memory/GC issues.


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • jboss-fuse-6.3
    • Fabric8 v1
    • None
      • The customer issue is described in the Description section.
      • The sample application is attached as MasterComponentTest.zip.
      • Perform the following steps:
        container-create-child root abc
        container-create-child root pqr
        profile-create --parent feature-camel testmaster
        profile-edit --feature camel-jetty9 testmaster
        profile-edit --bundle mvn:com.mycompany/camel-master-test/1.0 testmaster
        container-add-profile abc testmaster
        container-add-profile pqr testmaster
        container-edit-jvm-options abc "-Xmx150m"
        container-edit-jvm-options pqr "-Xmx150m"
        container-stop abc pqr
        container-start abc pqr
        # Check which container is master and which is slave for the application.
        cluster-list | grep -i failover
        
      • Multiple GC runs can be triggered with the following command:
        for ((a=1; a <= 50; a++)); do jcmd <process_id> GC.run; sleep 2; echo $a; done
        
      • The code contains a class, TestClass.java, that can be used to provoke an OutOfMemoryError (OOM). This class is currently not used in the route.
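
The GC loop in the steps above can also be scripted so that the duration of each call is recorded. The following is a minimal sketch, not part of the attached application: the function name `force_gc` and its parameters are hypothetical, and the measured wall-clock time of each `jcmd` call only approximates the actual GC pause.

```python
import subprocess
import time
from typing import Callable, List, Optional


def force_gc(pid: str, rounds: int = 50, pause_s: float = 2.0,
             run: Optional[Callable[[List[str]], object]] = None,
             sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Invoke `jcmd <pid> GC.run` repeatedly; return each call's duration in seconds.

    `run` and `sleep` are injectable (hypothetical hooks) so the loop can be
    exercised without a real JVM.
    """
    if run is None:
        # Default: execute jcmd and raise if it exits with a non-zero status.
        run = lambda cmd: subprocess.run(cmd, check=True)
    durations = []
    for _ in range(rounds):
        started = time.monotonic()
        run(["jcmd", pid, "GC.run"])
        durations.append(time.monotonic() - started)
        sleep(pause_s)  # same 2-second spacing as the shell loop by default
    return durations
```

Calling `force_gc("<process_id>")` with the container's process ID mirrors the shell loop (50 rounds, 2 seconds apart).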

    Description

      The customer has the following query:

      What we experience is that when a runtime runs into memory issues, the GC tries to free up memory, blocking all other active threads and taking roughly 100% CPU on the GC thread. As the runtime kept trying to allocate more and more memory, the GC was unable to free any and blocked all the threads in the runtime's virtual machine. I would have thought that ZooKeeper would notice this, as it no longer received a signal that the master: routes on that runtime were alive. I would have expected ZooKeeper to signal the other 2 runtimes to take over the master: component, so that processing could continue.

      What I observe is that, in case of a memory issue, the master component does not switch nodes; it remains on the same master node. I tested with an OOM and by invoking multiple/parallel full GCs.
      Do you have any suggestions here? I told the customer that we should find the root cause of the memory issue, which should resolve the problem. But can ZooKeeper also detect such memory-related events (for example, if one thread watches memory statistics)?
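
One way to reason about the observation above: ZooKeeper removes an ephemeral node only when the owning client's session expires, that is, when the server hears nothing from the client for longer than the session timeout. The sketch below is a simplified model under stated assumptions: it assumes the master component relies on session-bound ephemeral nodes, that the client heartbeats as soon as the JVM runs again, and it uses a hypothetical 30-second timeout and hypothetical pause lengths rather than values taken from this environment.

```python
from typing import Iterable, Optional, Tuple

Pause = Tuple[int, int]  # (start_ms, duration_ms) of one stop-the-world pause


def longest_silence_ms(pauses: Iterable[Pause]) -> int:
    """Longest stretch during which the JVM is paused without any gap.

    Back-to-back or overlapping pauses are merged, because the client
    cannot send a heartbeat between them.
    """
    longest = 0
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for start, duration in sorted(pauses):
        end = start + duration
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)      # extends the current silent run
        else:
            run_start, run_end = start, end  # a gap: heartbeats resume
        longest = max(longest, run_end - run_start)
    return longest


def session_expires(pauses: Iterable[Pause], session_timeout_ms: int) -> bool:
    """True if some uninterrupted silence reaches the session timeout."""
    return longest_silence_ms(pauses) >= session_timeout_ms


SESSION_TIMEOUT_MS = 30_000  # hypothetical value, for illustration only

# 50 forced GCs, each assumed to pause the JVM for 1.5 s, started every 3.5 s.
repeated_gc = [(i * 3_500, 1_500) for i in range(50)]
print(session_expires(repeated_gc, SESSION_TIMEOUT_MS))

# A single assumed 45 s pause, longer than the timeout.
print(session_expires([(0, 45_000)], SESSION_TIMEOUT_MS))
```

In this model, many short pauses never add up to an expiry as long as the JVM gets to run in between, which would be consistent with the master staying on the same node during repeated full GCs.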

      JBossFuse:karaf@root> cluster-list |grep -i failover
      camel/master/FailoverDemo                                                                                                                      
         FailoverDemo                                                            abc        pqr       -                                              
      JBossFuse:karaf@root>
      
      JBossFuse:karaf@root> zk:get /fabric/registry/clusters/camel/master/FailoverDemo/00000000006
      {"id":"FailoverDemo","container":"abc","uuid":"d8b99f8a-0cb2-4e11-b8fb-e91bcf5ee1ba","consumer":"jetty:http://127.0.0.1:8189/master","started":true}
      JBossFuse:karaf@root> zk:get /fabric/registry/clusters/camel/master/FailoverDemo/00000000007
      {"id":"FailoverDemo","container":"pqr","uuid":"8b4f5a26-f791-426c-84cc-ec9649279903","consumer":"jetty:http://127.0.0.1:8189/master","started":false}
      JBossFuse:karaf@root>
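
The two registry entries above can be compared programmatically. The following minimal sketch parses the JSON payloads exactly as printed by `zk:get` and reports the container whose record has `"started": true`. Treating that flag as the marker of the active master is an assumption, although it matches the `cluster-list` output; the helper name `find_master` is hypothetical.

```python
import json
from typing import Iterable, Optional


def find_master(payloads: Iterable[str]) -> Optional[str]:
    """Return the container of the first record with started == true, else None."""
    for raw in payloads:
        record = json.loads(raw)
        if record.get("started"):
            return record.get("container")
    return None


# Payloads copied from the zk:get output for nodes 00000000006 and 00000000007.
members = [
    '{"id":"FailoverDemo","container":"abc","uuid":"d8b99f8a-0cb2-4e11-b8fb-e91bcf5ee1ba",'
    '"consumer":"jetty:http://127.0.0.1:8189/master","started":true}',
    '{"id":"FailoverDemo","container":"pqr","uuid":"8b4f5a26-f791-426c-84cc-ec9649279903",'
    '"consumer":"jetty:http://127.0.0.1:8189/master","started":false}',
]

print(find_master(members))  # prints "abc"
```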
      

          People

            ggrzybek Grzegorz Grzybek
            rhn-support-cpandey Chandra Shekhar Pandey (Inactive)