  1. JGroups
  2. JGRP-1846

RELAY2: delay shutting down bridge

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 3.5
    • 3.5
    • None

      A simple test that starts two sites with two nodes each and shuts the nodes down in order shows a 1-second delay when shutting down the last node in the first site (B):

          public void testCoordinatorShutdown() throws Exception {
             a=createNode(LON, "A", LON_CLUSTER, null);
             b=createNode(LON, "B", LON_CLUSTER, null);
             x=createNode(SFO, "X", SFO_CLUSTER, null);
             y=createNode(SFO, "Y", SFO_CLUSTER, null);
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, a, b);
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, x, y);
             waitForBridgeView(2, 20000, 100, a, x);
      
             a.close();
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, b);
      
             b.close(); // last node in the first site: this close() shows the 1-second delay
             waitForBridgeView(1, 20000, 100, x);
      
             x.close();
      
             y.close();
          }
      

      And the relevant logs:

      13:51:30,017 DEBUG (Timer-2,sfo-cluster,X:) [GMS] _X:sfo: installing view [_A:lon|1] (2) [_A:lon, _X:sfo]
      13:51:30,028 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|2] (1) [_X:sfo]
      13:51:30,046 TRACE (Timer-2,lon-cluster,B:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
      13:51:31,046 TRACE (Timer-2,global,_B:lon:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
      13:51:31,099 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|3] (2) [_X:sfo, _B:lon]
      

      Note that although this happens on a background timer thread, the shutdown is still delayed because TP.destroy() waits at least 500 ms for all timer threads to finish (TimeScheduler3.stopRunning()). Perhaps that should change as well, so that timer threads are interrupted and finish immediately.
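
      To illustrate the difference between waiting for timer threads and interrupting them, here is a minimal sketch using only java.util.concurrent. It is not the JGroups implementation: the class TimerShutdownSketch and its methods are hypothetical names, and a plain single-thread executor stands in for the timer. The sleeping task plays the role of a timer task that is still pending at shutdown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the JGroups code base.
public class TimerShutdownSketch {

    /** Starts one worker that sleeps for taskMillis and returns once the task is running. */
    static ExecutorService startSleepingTask(long taskMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(taskMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and let the task end right away.
                Thread.currentThread().interrupt();
            }
        });
        started.await();
        return pool;
    }

    /** Stops without interrupting: waits up to maxWaitMillis for the task. Returns elapsed ms. */
    static long stopGracefully(long taskMillis, long maxWaitMillis) throws InterruptedException {
        ExecutorService pool = startSleepingTask(taskMillis);
        long start = System.nanoTime();
        pool.shutdown(); // no new tasks; the running task keeps going
        pool.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS);
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdownNow(); // cleanup after the measurement
        return elapsed;
    }

    /** Stops by interrupting the worker first, then waits. Returns elapsed ms. */
    static long stopWithInterrupt(long taskMillis, long maxWaitMillis) throws InterruptedException {
        ExecutorService pool = startSleepingTask(taskMillis);
        long start = System.nanoTime();
        pool.shutdownNow(); // interrupts the sleeping worker
        pool.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("graceful stop took ms:    " + stopGracefully(1000, 500));
        System.out.println("interrupting stop took ms: " + stopWithInterrupt(1000, 500));
    }
}
```

      With a 1000 ms task and a 500 ms wait limit, the graceful variant blocks for roughly the whole wait limit, while the interrupting variant should return almost immediately. Whether interrupting is safe for every timer task in a real transport is a separate question that this sketch does not address.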

            rhn-engineering-bban Bela Ban
            dberinde@redhat.com Dan Berindei (Inactive)