JBoss Core Services / JBCS-1100

mod_cluster never removes hung JVM that has requests routed to it


      An easy way to reproduce a hung JVM that is never removed is to suspend it:

      kill -STOP $PID
      

      If a backend JVM is entirely hung (socket still listening, but no requests ever processed and no STATUS MCMPs ever sent), mod_cluster currently does not handle it well: traffic is never routed away from the bad instance, and the instance is never removed from the balancer.

      In this state, requests consistently time out, but the timeouts do not put the balancer member into an error state, so requests continue to be sent to it. Periodic pings may be attempted and will fail, but that does not stop requests to the problem instance. After 60 ping failures the node could be removed, but the logic here is problematic: any attempted request (which still times out) results in the failure count being reset:

                  if (elected == oldelected) {
      ...
                  } else
                      ou->mess.num_failure_idle = 0;
      

      So, at a minimum, continually failing request attempts should not reset the ping failure count and thereby prevent node removal. We may also consider preventing any requests to a JVM while its pings are failing.
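
To make the counter interaction concrete, here is a minimal, self-contained simulation. It is only a sketch and not mod_cluster's actual code: `node_state`, `on_ping_failure`, `on_request`, `is_routable`, and `rounds_until_removed` are hypothetical names, and only `num_failure_idle` and the threshold of 60 come from the report above. It assumes one timed-out request and one failed ping per round against a hung node.

```c
#include <stdbool.h>

/* The report mentions removal after 60 ping failures. */
#define MAX_PING_FAILURES 60

/* Hypothetical stand-in for per-node state; not mod_cluster's real structure. */
typedef struct {
    int  num_failure_idle;  /* consecutive failed pings */
    bool removed;           /* true once the node is dropped from the balancer */
} node_state;

/* A periodic ping to the node failed. */
static void on_ping_failure(node_state *n) {
    if (++n->num_failure_idle >= MAX_PING_FAILURES)
        n->removed = true;
}

/*
 * A request was routed to the node.
 * reset_on_any_request == true models the reported behavior: the counter
 * is cleared even though the request timed out.
 * reset_on_any_request == false models the suggested change: only a
 * successful request clears the counter.
 */
static void on_request(node_state *n, bool succeeded, bool reset_on_any_request) {
    if (reset_on_any_request || succeeded)
        n->num_failure_idle = 0;
}

/* Second suggestion from the report: do not route to a node whose pings are failing. */
static bool is_routable(const node_state *n) {
    return !n->removed && n->num_failure_idle == 0;
}

/*
 * Simulate a hung node: each round, one request times out and one ping fails.
 * Returns the round in which the node is removed, or -1 if it never is.
 */
static int rounds_until_removed(bool reset_on_any_request, int max_rounds) {
    node_state n = {0, false};
    for (int round = 1; round <= max_rounds; round++) {
        on_request(&n, false, reset_on_any_request);
        on_ping_failure(&n);
        if (n.removed)
            return round;
    }
    return -1;
}
```

Under these assumptions, `rounds_until_removed(true, 1000)` never reaches the threshold because each timed-out request clears the counter before the next ping failure, while `rounds_until_removed(false, 1000)` removes the node at round 60.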

            rhn-engineering-jclere Jean-Frederic Clere
            rhn-support-aogburn Aaron Ogburn
            Paul Lodge Paul Lodge