Details
Bug - Resolution: Obsolete - Major - 2.7 - None
Description
We are using JGroups as a notification system between webapps running inside Tomcat or WebLogic Server. In our current test platform all cluster nodes are on the same host, and most of them are in the same container (Tomcat). Some web applications may have several connections to the cluster.
We use UDP multicast on a LAN; the configuration is nearly the default one.
The system seems to work fine, but we regularly have cluster stability issues. Typically a lot of SUSPECT messages are exchanged, a lot of "GMS: address ..." items are logged on standard output, and the number of viewAccepted events increases dramatically.
As an example, here is the number of viewAccepted events per day (grep -c viewAccepted */logout.log):
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56
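To make the bad days stand out, the counts above can be compared against their median. The following is only a rough sketch: the function names and the factor of 10 are illustrative assumptions, not part of JGroups or of our setup, and the input is simply the `grep -c` output pasted as text.

```python
from statistics import median

# Output of `grep -c viewAccepted */logout.log`, copied from the listing above.
GREP_OUTPUT = """\
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56
"""


def parse_grep_counts(text):
    """Parse `file:count` lines (the format printed by grep -c) into a dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so file names containing ':' still work.
        name, _, value = line.rpartition(":")
        counts[name] = int(value)
    return counts


def flag_spikes(counts, factor=10):
    """Return the files whose count exceeds `factor` times the median count.

    The factor is an arbitrary threshold chosen for illustration.
    """
    baseline = max(median(counts.values()), 1)  # avoid a zero baseline
    return [name for name, n in counts.items() if n > factor * baseline]


if __name__ == "__main__":
    for name in flag_spikes(parse_grep_counts(GREP_OUTPUT)):
        print(name)
```

With the figures above the median is 32, so this flags 2009-03-29, 2009-04-02 and 2009-04-13.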
We ran several test campaigns, sending and receiving messages over a 2 or 3 day period and checking for message loss, and everything went fine until the problem appeared again. Our system administrator detected no network issue.
Another typical problem is that members send NOT_MEMBER messages, causing stacks to shut down (or should I say channels to close?): [ Received NOT_MEMBER event from null I'm being shunned; exiting]. The shun option is not set (nor is the Channel auto-reconnect option), yet in some cases the stack starts up again (CloserThread - reconnecting to group ...) and in other cases it does not. Please note that when the stack does not start up again automatically, it is impossible to connect to the channel manually (we always receive ChannelClosedException).
Typically:
[sip@bipro tmusadmin]$ grep -c NOT_MEMBER jgroup.log*
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
A suggestion would be greatly appreciated.
Sorry for the size of the logs!