JGRP-1457: TimeScheduler2 loses tasks

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 3.0.10, 3.1
    • Affects Version/s: 3.0.9
    • None

    Description

      The symptom I sometimes see is that broadcast messages are not delivered to a member.

      I've tracked this down to NAKACK2 having gaps in its record of sequence numbers while its RetransmitTask is not running. I've confirmed that the task is not running by calling stack.getTransport().dumpTimerTasks() and seeing that it is not among the scheduled tasks.

      So far, so definite. I also have a theory about how this happens.

      Suppose thread 1 is in TimeScheduler2._run(), and has got as far as executing some tasks but has not yet reached the line tasks.keySet().removeAll(keys).

      Meanwhile, suppose thread 2 is in TimeScheduler2.schedule(), adding a task that has the same key as the just-executed task. It can reach the branch tasks.remove(key) ("// entry has completed; remove it"), go round the loop again, and successfully call tasks.putIfAbsent(key, task).

      Now thread 1 picks up again, calls removeAll(keys), and removes the task that has just been scheduled. Oops.
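The interleaving described above can be replayed deterministically on a single thread. The following is only a sketch of the suspected race, not the JGroups code: the Entry class, its completed flag, the key value, and the use of a ConcurrentSkipListMap keyed by execution time are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

final class LostTaskSketch {
    /** Hypothetical stand-in for a scheduler entry; not the JGroups class. */
    static final class Entry {
        final String name;
        volatile boolean completed;
        Entry(String name) { this.name = name; }
    }

    /** Replays the suspected interleaving step by step; returns true if the new task is lost. */
    static boolean replayInterleaving() {
        ConcurrentSkipListMap<Long, Entry> tasks = new ConcurrentSkipListMap<>();
        long key = 100L; // assumed: the execution time is the map key
        Entry old = new Entry("old");
        tasks.put(key, old);

        // Thread 1 (_run): collects the due keys and executes their entries...
        List<Long> keys = new ArrayList<>(tasks.headMap(key, true).keySet());
        old.completed = true;
        // ...but has not yet reached tasks.keySet().removeAll(keys).

        // Thread 2 (schedule): is blocked by the completed entry, removes it, and retries.
        Entry fresh = new Entry("retransmit");
        Entry existing = tasks.putIfAbsent(key, fresh);
        if (existing != null && existing.completed) {
            tasks.remove(key);             // "entry has completed; remove it"
            tasks.putIfAbsent(key, fresh); // second pass of the loop succeeds
        }

        // Thread 1 resumes and removes by key, taking the fresh entry with it.
        tasks.keySet().removeAll(keys);

        return !tasks.containsValue(fresh);
    }

    public static void main(String[] args) {
        System.out.println("new task lost: " + replayInterleaving());
    }
}
```

The removal in thread 1 is keyed only on the execution time, so it cannot tell the entry it executed from the one that replaced it.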

      I suggest that a likely fix is to delete the "else tasks.remove(key)" branch from schedule() altogether. (If we're in that branch, then we're blocked by a completed entry. That entry will be removed shortly by the _run() thread, and then we'll be able to progress.)
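A minimal sketch of how the suggested change would behave under the same interleaving, again replayed on one thread. The Entry class and the tryScheduleOnce helper are hypothetical names, and the handling of a live (not yet completed) entry is deliberately left out; this illustrates the idea and is not the patch that was applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

final class NoRemoveBranchSketch {
    /** Hypothetical stand-in for a scheduler entry; not the JGroups class. */
    static final class Entry {
        volatile boolean completed;
    }

    /**
     * One pass of a schedule() loop with the remove branch deleted.
     * Returns true once the task is in the map, false if the caller must retry.
     * Only the "blocked by a completed entry" case is modelled here.
     */
    static boolean tryScheduleOnce(ConcurrentSkipListMap<Long, Entry> tasks, long key, Entry task) {
        Entry existing = tasks.putIfAbsent(key, task);
        return existing == null; // no removal here: cleanup is left to the _run() side
    }

    /** Returns true if the fresh task survives the interleaving. */
    static boolean replayInterleaving() {
        ConcurrentSkipListMap<Long, Entry> tasks = new ConcurrentSkipListMap<>();
        long key = 100L; // assumed: the execution time is the map key
        Entry old = new Entry();
        tasks.put(key, old);

        // Thread 1 (_run): executes the due entry but has not yet removed its key.
        List<Long> keys = new ArrayList<>(tasks.headMap(key, true).keySet());
        old.completed = true;

        // Thread 2 (schedule): blocked by the completed entry, so this pass fails.
        Entry fresh = new Entry();
        boolean firstPass = tryScheduleOnce(tasks, key, fresh);

        // Thread 1 resumes and removes only the entry it executed.
        tasks.keySet().removeAll(keys);

        // Thread 2 goes round the loop again and now succeeds.
        boolean secondPass = tryScheduleOnce(tasks, key, fresh);

        return !firstPass && secondPass && tasks.get(key) == fresh;
    }

    public static void main(String[] args) {
        System.out.println("new task survives: " + replayInterleaving());
    }
}
```

One likely trade-off is that the scheduling thread retries until the executing thread has cleaned up, so it may spin briefly in that window.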

          People

            Assignee: rhn-engineering-bban Bela Ban
            Reporter: dimbleby David Hotham (Inactive)