Details
- Bug
- Resolution: Won't Do
- Blocker
- None
- 1.3.3.Final
- None
Description
Setup
- 3 tomcats
- 2 load balancing groups
- 1 request every 3 seconds (no load at all)
- shutdown and kill of various nodes
- SIGSEGV occurs no later than the third kill/start iteration
SIGSEGV
#if AP_MODULE_MAGIC_AT_LEAST(20101223,1)
    /* Here that is tricky the worker needs shared memory but we don't and CONFIG will reset it */
    helper->index = 0; /* mark it removed */
    worker->s = helper->shared;
crash---> memcpy(worker->s, stat, sizeof(proxy_worker_shared));
#else
    worker->id = 0; /* mark it removed */
#endif
Behavior
957 helper = (proxy_cluster_helper *) worker->context;
961 if (helper) {
962 i = helper->count_active;
963 }
968 if (i == 0) {
971 proxy_worker_shared *stat = worker->s;
972 proxy_cluster_helper *helper = (proxy_cluster_helper *) worker->context;
At this point, helper->shared points to a proxy_worker_shared structure that appears to be properly filled.
999 if (worker->cp->pool) {
1000 apr_pool_destroy(worker->cp->pool);
1001 worker->cp->pool = NULL;
1002 }
Regardless of whether the block above is present or not (I also tried moving it after line 1010),
helper->shared is suddenly NULL.
1008 helper->index = 0;
1009 worker->s = helper->shared;
The assignment above makes worker->s point to NULL.
1010 memcpy(worker->s, stat, sizeof(proxy_worker_shared));
And here we go: the memcpy writes through a NULL pointer and the process dies with SIGSEGV.
IMHO, another thread has already cleared that memory and nulled the pointer, because the crash never happens when
I run 1 process with 1 thread.
The workaround that prevents the core looks like this:
if (helper->shared) {
    worker->s = helper->shared;
    memcpy(worker->s, stat, sizeof(proxy_worker_shared));
}
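Note that the NULL check is still a check-then-use sequence: if another thread really does clear helper->shared, it could do so between the test and the memcpy. Below is a minimal sketch of one way to close that window, by taking the same lock on both the clearing path and the copy path. It uses stand-in types and a POSIX mutex so that it compiles on its own. The reduced struct layouts, the lock field, helper_clear_shared, and worker_copy_stat are all hypothetical, not mod_cluster code, and it is an assumption that a thread-level lock is enough here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for proxy_worker_shared (hypothetical, reduced layout). */
typedef struct {
    int index;
    char name[32];
} shared_t;

/* Stand-in for proxy_cluster_helper, plus an assumed per-helper lock. */
typedef struct {
    shared_t *shared;
    pthread_mutex_t lock;
} helper_t;

/* Stand-in for proxy_worker. */
typedef struct {
    shared_t *s;
    helper_t *context;
} worker_t;

/* Clearing path: null the pointer only while holding the lock. */
void helper_clear_shared(helper_t *h)
{
    assert(h != NULL);
    pthread_mutex_lock(&h->lock);
    h->shared = NULL;
    pthread_mutex_unlock(&h->lock);
}

/* Removal path: the check and the copy happen under the same lock.
 * Returns 1 if stat was copied, 0 if shared was already gone. */
int worker_copy_stat(worker_t *w, const shared_t *stat)
{
    int copied = 0;
    helper_t *h;

    assert(w != NULL && w->context != NULL && stat != NULL);
    h = w->context;

    pthread_mutex_lock(&h->lock);
    if (h->shared != NULL) {
        w->s = h->shared;
        memcpy(w->s, stat, sizeof(shared_t));
        copied = 1;
    }
    pthread_mutex_unlock(&h->lock);
    return copied;
}
```

In the real module the lock would presumably be an APR mutex rather than a raw pthread mutex, and this only helps if every code path that resets helper->shared takes it. It also does not explain why the pointer is reset in the first place, which is still the open question.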
How do we fix it?
Any ideas? rhn-engineering-jclere