JGroups / JGRP-957

Intermittent cluster stability issues


Details

    • Bug
    • Resolution: Obsolete
    • Major
    • 2.8
    • 2.7
    • None

    Description

      We are using JGroups as a notification system between web applications running inside Tomcat or WebLogic Server. On our current test platform all cluster nodes are on the same host, and most of them are in the same container (Tomcat). Some web applications may hold several connections to the cluster.
      We use UDP multicast on a LAN, with a configuration close to the default one.

      The system seems to work fine, but we regularly run into cluster stability issues. Typically, a lot of SUSPECT messages are exchanged, many "GMS: address ..." entries are logged to standard output, and the number of view-accepted events increases dramatically.

      For example, here is the number of viewAccepted events per daily log file (grep -c viewAccepted */logout.log):
      logout.log.2009-03-25:6
      logout.log.2009-03-26:51
      logout.log.2009-03-27:49
      logout.log.2009-03-28:0
      logout.log.2009-03-29:2290
      logout.log.2009-03-30:64
      logout.log.2009-03-31:55
      logout.log.2009-04-01:15
      logout.log.2009-04-02:433
      logout.log.2009-04-03:32
      logout.log.2009-04-04:4
      logout.log.2009-04-05:5
      logout.log.2009-04-06:38
      logout.log.2009-04-07:26
      logout.log.2009-04-08:30
      logout.log.2009-04-09:19
      logout.log.2009-04-10:32
      logout.log.2009-04-11:5
      logout.log.2009-04-12:7
      logout.log.2009-04-13:2236
      logout.log.2009-04-14:56
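
      The daily counts above range from 0 to more than 2000, and a few days stand out sharply. As a minimal sketch (not part of JGroups; the class ViewChurn, the method flagSpikes and the "10 times the median" threshold are hypothetical choices for illustration only), such spike days can be picked out of the grep output automatically, so that only the corresponding logs need to be examined in detail:

      ```java
      import java.time.LocalDate;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      public class ViewChurn {

          /** Median of the values (mean of the two middle values when the count is even). */
          static double median(int[] values) {
              int[] sorted = values.clone();
              Arrays.sort(sorted);
              int n = sorted.length;
              return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
          }

          /**
           * Returns, in insertion order, the keys whose count is strictly greater than
           * factor * median. The threshold is an arbitrary heuristic, not a JGroups setting.
           */
          static List<String> flagSpikes(Map<String, Integer> counts, double factor) {
              List<String> spikes = new ArrayList<>();
              if (counts.isEmpty()) {
                  return spikes; // no data, nothing to flag
              }
              int[] values = counts.values().stream().mapToInt(Integer::intValue).toArray();
              double threshold = factor * median(values);
              for (Map.Entry<String, Integer> e : counts.entrySet()) {
                  if (e.getValue() > threshold) {
                      spikes.add(e.getKey());
                  }
              }
              return spikes;
          }

          public static void main(String[] args) {
              // Daily viewAccepted counts copied from the grep output above,
              // one entry per day starting at 2009-03-25.
              int[] values = {6, 51, 49, 0, 2290, 64, 55, 15, 433, 32, 4,
                              5, 38, 26, 30, 19, 32, 5, 7, 2236, 56};
              Map<String, Integer> counts = new LinkedHashMap<>();
              LocalDate day = LocalDate.of(2009, 3, 25);
              for (int v : values) {
                  counts.put(day.toString(), v);
                  day = day.plusDays(1);
              }
              System.out.println(flagSpikes(counts, 10));
          }
      }
      ```

      With the counts from this report the median is 32, so the threshold is 320 and the sketch prints [2009-03-29, 2009-04-02, 2009-04-13]. Two of these days, 2009-04-02 and 2009-04-13, also have the highest NOT_MEMBER counts in the jgroup.log excerpt further down.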

      We ran several test campaigns, sending and receiving messages over a 2 or 3 day period and checking for message loss, and everything worked correctly until the problems appeared again. Our system administrator detected no network issues.

      Another typical problem is that members send NOT_MEMBER messages, causing stacks to shut down (or should I say, channels to close?). [ Received NOT_MEMBER event from null I'm being shunned; exiting]. The shun option is not set (nor is the Channel auto-reconnect option), and yet in some cases the stack starts up again (CloserThread - reconnecting to group ...) and in other cases it does not. Please note that when the stack does not restart automatically, it is impossible to reconnect to the channel manually (we always receive ChannelClosedException).

      A typical example:
      [sip@bipro tmusadmin]$ grep -c NOT_MEMBER jgroup.log*
      jgroup.log:0
      jgroup.log.2009-03-30:3
      jgroup.log.2009-03-31:0
      jgroup.log.2009-04-01:0
      jgroup.log.2009-04-02:1370
      jgroup.log.2009-04-07:0
      jgroup.log.2009-04-10:0
      jgroup.log.2009-04-11:11
      jgroup.log.2009-04-12:9
      jgroup.log.2009-04-13:587
      jgroup.log.2009-04-14:0

      A suggestion would be greatly appreciated.

      Sorry for the size of the logs!


          People

            rhn-engineering-bban Bela Ban
            ac1789 a C (Inactive)