Uploaded image for project: 'JGroups'
  1. JGroups
  2. JGRP-1449

SEQUENCER race leads to lost messages

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 3.1
    • 3.0.9
    • None

      I'm seeing an issue where SEQUENCER is dropping messages, with logs like this:

      2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03

      It looks as though there's some sort of race where:

      • a member ("CFS-B-pisces-cfs02") is becoming coordinator, but SEQUENCER hasn't yet seen the view change
      • meanwhile some other member ("CFS-A-pisces-cfs03") has been told of the view change, and so SEQUENCER there forwards messages to the new coordinator
      • but because the new coordinator doesn't yet know that it is coordinator, we hit the problem above.

      The messages don't ever get retransmitted; so they're simply lost.

      Here's some trace from the member who drops the message, with the line from my application showing that he does indeed become coordinator a few milliseconds later:

      2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
      2012-04-13 10:05:21.393 [Incoming-2,pisces,CFS-B-pisces-cfs02] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]

      And here's trace from the other end, showing that message being broadcast in the new view.

      2012-04-13 10:05:21.359 [Incoming-1,pisces,CFS-A-pisces-cfs03] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
      2012-04-13 10:05:21.361 [ForkJoinPool-1-worker-0] INFO c.m.c.CommunicatorComponent$Communicator - Broadcasting ClusterMgmtMsg

      I can try to get lower level trace from JGroups if that would help.

      I'm using the same stack as in JGRP-1443.

            rhn-engineering-bban Bela Ban
            dimbleby David Hotham (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: