Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: httpd 2.4.37 SP8 CR1
Affects Version/s: httpd 2.4.37 SP6 GA
Component/s: mod_cluster, mod_cluster-native, mod_proxy_cluster
Labels:
None

Blocked:
False
Ready:
False
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
QE Test Coverage:
+
Release Note Text:
Undefined
Target Release:
httpd 2.4.37 SP8 GA
Steps to Reproduce:

An easy way to reproduce a hung JVM that is never removed is to suspend it:

kill -STOP $PID
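The same suspension can be scripted from C with POSIX `kill(2)` and `SIGSTOP`. The sketch below stops a throwaway child process rather than a JVM, and the helper name `child_is_stopped_after_sigstop()` is hypothetical. It only shows that a stopped process stays alive (so any listening socket it holds stays open) while doing no work.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork an idle child, suspend it with SIGSTOP (the C
 * equivalent of `kill -STOP $PID`), and report whether the kernel sees it
 * as stopped. A stopped process keeps its file descriptors open. */
bool child_is_stopped_after_sigstop(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;               /* fork failed */
    if (pid == 0) {
        for (;;)
            pause();                /* child: idle until signalled */
    }

    bool stopped = false;
    int status = 0;
    /* WUNTRACED makes waitpid() report a stop, not only termination. */
    if (kill(pid, SIGSTOP) == 0 && waitpid(pid, &status, WUNTRACED) == pid)
        stopped = WIFSTOPPED(status);

    /* Clean up: SIGKILL terminates even a stopped process; then reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return stopped;
}
```

Against a real backend, `kill -CONT $PID` resumes the JVM after the test.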
Market:

SFDC Cases Links:
SFDC Cases Counter:

If a backend JVM is entirely hung (its socket is still listening, but it never processes a request or sends a STATUS MCMP), mod_cluster currently handles it poorly: traffic is never routed away from the bad instance, and the instance is never removed from the balancer.

In this state every request times out, but a timeout does not put the balancer member into an error state, so requests keep going to it. Periodic pings may be attempted and will fail, but that does not stop requests to the problem instance either. After 60 ping failures the node could be removed, but the logic here is flawed: any attempted request (which still times out) resets the failure count:

            if (elected == oldelected) {
...
            } else
                ou->mess.num_failure_idle = 0;
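To make the effect of that reset concrete, here is a small hypothetical model of the branch in C. It is not mod_cluster's code: `struct node_state`, `check_node_current()` and `checks_until_removal()` are invented for illustration. It assumes, from the excerpt, that `elected` changes whenever a request is routed to the node, and that each check of an idle node records one failed ping against the 60-failure threshold.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PING_FAILURES 60    /* threshold cited in this issue */

/* Hypothetical model of the per-node fields used in the excerpt. */
struct node_state {
    unsigned long elected;      /* times the node was chosen for a request */
    unsigned long oldelected;   /* value of elected at the previous check  */
    int num_failure_idle;       /* consecutive failed idle pings           */
};

/* One periodic check with the current logic, for a hung node whose pings
 * always fail. Returns true once the node would be removed. */
static bool check_node_current(struct node_state *n)
{
    if (n->elected == n->oldelected) {
        n->num_failure_idle++;      /* idle: ping attempted and failed */
    } else {
        /* A request was routed here (and timed out): counter is reset. */
        n->num_failure_idle = 0;
    }
    n->oldelected = n->elected;
    return n->num_failure_idle >= MAX_PING_FAILURES;
}

/* Run up to `checks` checks against a hung node that receives one request
 * every `request_every` checks (0 = never). Returns the check at which
 * the node is removed, or -1 if it never is. */
int checks_until_removal(int checks, int request_every)
{
    assert(request_every >= 0);
    struct node_state n = {0, 0, 0};
    for (int i = 1; i <= checks; i++) {
        if (request_every > 0 && i % request_every == 0)
            n.elected++;            /* a request is routed to the hung node */
        if (check_node_current(&n))
            return i;
    }
    return -1;
}
```

Under these assumptions, `checks_until_removal(1000, 0)` returns 60, while `checks_until_removal(1000, 30)`, with one timed-out request every 30 checks, returns -1: the node is never removed.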

So, at a minimum, continually failing request attempts should not reset the ping failure count and thereby prevent node removal. We may also consider blocking all requests to a JVM while its pings are failing.
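Both suggestions can be sketched against the same kind of hypothetical model. None of these names exist in mod_cluster; the `succeeded` counter (requests that actually completed) is an assumed addition, and blocking election after a single failed ping is only one possible policy, not necessarily what the actual fix does.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PING_FAILURES 60    /* threshold cited in this issue */

/* Hypothetical per-node state for the proposed behaviour. */
struct fixed_node_state {
    unsigned long elected;      /* requests routed to the node             */
    unsigned long succeeded;    /* requests that actually completed        */
    unsigned long oldsucceeded; /* value of succeeded at the previous check */
    int num_failure_idle;       /* consecutive checks with no sign of life */
};

/* Suggestion 1: only a completed request or a successful ping resets the
 * counter; request attempts that time out no longer do. */
static bool check_node_fixed(struct fixed_node_state *n, bool ping_ok)
{
    if (ping_ok || n->succeeded != n->oldsucceeded)
        n->num_failure_idle = 0;
    else
        n->num_failure_idle++;
    n->oldsucceeded = n->succeeded;
    return n->num_failure_idle >= MAX_PING_FAILURES;
}

/* Suggestion 2: do not elect a node while its pings are failing. */
static bool node_eligible(const struct fixed_node_state *n)
{
    return n->num_failure_idle == 0;
}

/* Simulate a hung node (every ping fails, no request completes) that would
 * be offered a request every `request_every` checks. Stores the number of
 * requests actually routed in *routed; returns the removal check or -1. */
int checks_until_removal_fixed(int checks, int request_every,
                               unsigned long *routed)
{
    assert(request_every >= 0 && routed);
    struct fixed_node_state n = {0, 0, 0, 0};
    int removed_at = -1;
    for (int i = 1; i <= checks; i++) {
        if (request_every > 0 && i % request_every == 0 && node_eligible(&n))
            n.elected++;        /* routed, but times out: succeeded unchanged */
        if (check_node_fixed(&n, false)) {
            removed_at = i;
            break;
        }
    }
    *routed = n.elected;
    return removed_at;
}
```

With every ping failing, `checks_until_removal_fixed(1000, 30, &routed)` returns 60 with `routed == 0`: the hung node is removed on schedule and, because it becomes ineligible after its first failed ping, it receives no further requests.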

is cloned by

MODCLUSTER-732 mod_cluster never removes hung JVM that has requests routed to it

Resolved

Assignee: Jean-Frederic Clere

Reporter: Aaron Ogburn

Tester: Paul Lodge

Votes: 0

Watchers: 4

Created: 2021/04/15 6:33 PM

Updated: 2024/02/12 6:07 PM

Resolved: 2021/05/05 9:21 AM
