  1. JGroups
  2. JGRP-1362

NAKACK: second line of defense for requested retransmissions that are not found


Details

    • Enhancement
    • Resolution: Cannot Reproduce
    • Major
    • 3.6.8
    • None
    • None

    Description

      When the original sender B is asked by A to retransmit message M, but no longer has M in its retransmission table, it should tell A so. Otherwise A will keep sending retransmission requests to B until A or B leaves.

      This problem should have been fixed by JGRP-1251. If it turns out it wasn't, this JIRA (1) acts as a second line of defense that stops the endless retransmission requests and (2) gives us valuable diagnostic information to fix the underlying problem (should there still be one).

      Problem:

      • A's NakReceiverWindow (NRW) for B is at 50 (highest_delivered seqno)
      • B's own NRW, however, is at 200, and B has garbage collected all messages below 150
      • When B sends message 201, A will ask B for retransmission of [51-200]
      • B will retransmit messages [150-200], but it cannot send messages [51-149], as it doesn't have them anymore
      • A will add messages [150-200], but its NRW is still 50 (highest_delivered)
      • A will continue asking B for messages [51-149] (it does have [150-201])
      • This will go on forever, or until B or A leaves
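
      The scenario above can be replayed with a small toy model. This is only a sketch, not JGroups code: the classes Sender and Receiver and the method simulate are hypothetical names made up for illustration, and the receiver window is reduced to a highest_delivered counter plus a buffer of received seqnos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the scenario above; not JGroups code. All names are hypothetical. */
class RetransmissionLoopDemo {

    /** Sender B: keeps only the messages that have not been garbage collected. */
    static final class Sender {
        final TreeMap<Long, String> xmitTable = new TreeMap<>();

        Sender(long lowestKept, long highestSent) {
            for (long seqno = lowestKept; seqno <= highestSent; seqno++)
                xmitTable.put(seqno, "M" + seqno);
        }

        /** Returns the part of [from, to] still in the table; the rest is silently skipped. */
        List<Long> retransmit(long from, long to) {
            return new ArrayList<>(xmitTable.subMap(from, true, to, true).keySet());
        }
    }

    /** Receiver A: its window for B, reduced to highest_delivered plus buffered seqnos. */
    static final class Receiver {
        long highestDelivered;
        final TreeMap<Long, String> received = new TreeMap<>();

        Receiver(long highestDelivered) { this.highestDelivered = highestDelivered; }

        void add(long seqno) {
            received.put(seqno, "M" + seqno);
            // Deliver in order: advance only while the next seqno is present.
            while (received.containsKey(highestDelivered + 1))
                received.remove(++highestDelivered);
        }

        /** The gap A would request again as {first, last}, or null if nothing is missing. */
        long[] missingRange() {
            if (received.isEmpty()) return null;
            long first = highestDelivered + 1;
            long last = received.firstKey() - 1;
            return first <= last ? new long[]{first, last} : null;
        }
    }

    /** Runs the given number of request rounds and returns the gap that is still open. */
    static long[] simulate(int rounds) {
        Sender b = new Sender(150, 201);   // B garbage collected everything below 150
        Receiver a = new Receiver(50);     // A delivered up to 50
        a.add(201);                        // A receives 201 and detects the gap [51-200]
        for (int i = 0; i < rounds; i++) {
            long[] gap = a.missingRange();
            if (gap == null) break;
            for (long seqno : b.retransmit(gap[0], gap[1]))
                a.add(seqno);
        }
        return a.missingRange();
    }

    public static void main(String[] args) {
        long[] gap = simulate(5);
        System.out.println("A still missing [" + gap[0] + "-" + gap[1] + "]");
    }
}
```

      In this model the open gap stays at [51-149] no matter how many rounds are run, because B never tells A that those messages are gone.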

      Solution:

      • When the original sender B of message M receives a retransmission request for M (from A), and it doesn't have M in its retransmission table, it should send back a MSG_NOT_FOUND message to A including B's digest
      • When A receives the MSG_NOT_FOUND message, it does the following:
        • It logs its own NRW for B
        • It logs B's digest
        • It logs its digest history
          (This information is valuable for investigating the underlying issue)
      • Then A's NRW for B is adjusted:
        • The highest_delivered seqno is set to B.digest.highest_delivered
        • All messages in xmit_table below B.digest.highest_delivered are removed
        • All retransmission tasks in the retransmitter <= B.digest.highest_delivered are cancelled and removed
          (This will stop the retransmission)
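
      The steps above could look roughly like the following sketch. It is not the JGroups implementation: Digest, Window, handleXmitRequest and handleMsgNotFound are hypothetical names, the digest is reduced to a single highest_delivered value, the digest history is not modelled, and logging is plain System.out. The bounds ("below" for xmit_table, "<=" for retransmission tasks) follow the wording of the list literally.

```java
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the proposed MSG_NOT_FOUND handling; not the JGroups implementation. */
class MsgNotFoundDemo {

    /** Hypothetical stand-in for the part of B's digest that A needs. */
    static final class Digest {
        final long highestDelivered;
        Digest(long highestDelivered) { this.highestDelivered = highestDelivered; }
    }

    /** Sender side (B): returns a digest only when MSG_NOT_FOUND has to be sent. */
    static Optional<Digest> handleXmitRequest(TreeMap<Long, String> senderTable,
                                              long seqno, Digest own) {
        // Message still available: a normal retransmission would happen instead.
        if (senderTable.containsKey(seqno)) return Optional.empty();
        // Message is gone: reply with MSG_NOT_FOUND carrying B's digest.
        return Optional.of(own);
    }

    /** Receiver side (A): A's window for B. */
    static final class Window {
        long highestDelivered;
        final TreeMap<Long, String> xmitTable = new TreeMap<>();
        final TreeSet<Long> pendingRetransmissions = new TreeSet<>();

        void handleMsgNotFound(Digest senderDigest) {
            // Log diagnostic state first (digest history omitted in this sketch).
            System.out.println("own window: highest_delivered=" + highestDelivered
                    + ", buffered=" + xmitTable.size());
            System.out.println("sender digest: highest_delivered=" + senderDigest.highestDelivered);

            // Adjust the window to the sender's digest.
            highestDelivered = senderDigest.highestDelivered;
            // Remove buffered messages strictly below the sender's highest_delivered.
            xmitTable.headMap(senderDigest.highestDelivered, false).clear();
            // Cancel retransmission tasks up to and including the sender's highest_delivered.
            pendingRetransmissions.headSet(senderDigest.highestDelivered, true).clear();
        }
    }

    /** Replays the numbers from the problem description and returns A's adjusted window. */
    static Window demo() {
        TreeMap<Long, String> bTable = new TreeMap<>();
        for (long s = 150; s <= 201; s++) bTable.put(s, "M" + s);
        Digest bDigest = new Digest(200);

        Window a = new Window();
        a.highestDelivered = 50;
        for (long s = 150; s <= 201; s++) a.xmitTable.put(s, "M" + s);
        for (long s = 51; s <= 149; s++) a.pendingRetransmissions.add(s);

        // A asks for 51, which B no longer has, so B answers with its digest.
        handleXmitRequest(bTable, 51, bDigest).ifPresent(a::handleMsgNotFound);
        return a;
    }

    public static void main(String[] args) {
        Window a = demo();
        System.out.println("adjusted highest_delivered=" + a.highestDelivered
                + ", pending retransmissions=" + a.pendingRetransmissions.size());
    }
}
```

      With the numbers from the problem description, A's pending retransmission requests for [51-149] are all cancelled after a single MSG_NOT_FOUND reply, which is what ends the loop.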

      Again, this is a second line of defense, which should never be used. If the underlying problem does occur, however, we'll have valuable information in the logs to diagnose what went wrong.

          People

            rhn-engineering-bban Bela Ban