JGroups / JGRP-423

SEQUENCER changes


Details

    • Type: Task
    • Resolution: Won't Do
    • Priority: Major
    • Fix Version/s: 2.5

    Description

      [email from Matthieu Bentot]

      Hi,

      Sorry for the late answer.

      Bela Ban wrote:

      > Hi Matthieu,
      >
      > Matthieu Bentot wrote:
      > > Hi all,
      > >
      > > First off, I wanted to thank everybody for the great job on
      > > JGroups, which has been very useful to us.
      > >
      > > Attached are a few changes we've made (all up to date as of 2.4):
      > >
      > > - TAG (new) is a very simple layer that adds a tag to messages going
      > > down the stack and prints them on the way up. I found it useful for
      > > debugging purposes.
      >
      > I'll add it if you add javadoc that explains what it does
      >
      > > - IDLE (new) simply broadcasts a message with a special tag if
      > > nothing has been sent in a configurable interval. We had an issue
      > > with NAKACK where a node would be waiting for an answer to a
      > > message while the recipient was waiting for the end of a message.
      > > This may have been resolved since then
      >
      > I don't see the point of this. If you discovered a bug, we should fix
      > this bug and not add a new protocol. Maybe you ran into the "Last
      > Message dropped in NAKACK" issue (JGroups/doc/design/varia2.txt)?
      > This is a matter of changing the timeout in STABLE.

      You're right, of course. It was really more an issue of nodes hanging for a few seconds every so often, and that was back in the 2.2 days.
      I'll try without it on a newer version and change the timeouts.
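      For context, the STABLE timeout mentioned above is set through the protocol's properties; in a 2.x-era property-string stack configuration that might look like the following sketch (the values are illustrative only, not a recommendation from this thread):

```
STABLE(desired_avg_gossip=20000;max_bytes=400000)
```

      Lowering desired_avg_gossip makes stability rounds (and thus garbage collection of delivered messages in NAKACK) run more often, which is the knob relevant to the "last message dropped" scenario.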

      > > - SEQUENCER (update) had a java synchronization issue that was
      > > causing a deadlock when using a threadless stack (like we do).
      >
      > Can you send me a patch against CVS head ? You changed the format, so
      > I couldn't really see what the changes were all about...
      >
      > > Finally, I added a way to bypass ordering, in the form of an empty
      > > internal header (SEQUENCER.UNORDERED_HEADER). By adding this header
      > > to a message, an application can tell the SEQUENCER layer that the
      > > message is not subject to ordering.
      >
      > I'd prefer you to use Message.OOB (msg.setFlag(Message.OOB),
      > msg.isFlagSet(Message.OOB)), that's the right mechanism to bypass
      > ordering on a per-message basis.
      >
      > > This is useful to us because (part of) our application is a
      > > distributed persistent transactional database; most of the traffic
      > > is protected by transactions, for which we can avoid the
      > > performance penalty using this mechanism (I was originally using
      > > CAUSAL for that reason, even though a total ordering was the
      > > correct choice).
      >
      > Yes, I agree this is useful, but the impl should use the OOB flag
      > rather than adding a header. Can you change this and re-submit ?

      Ah, yes, that's much better than the dodgy hack. I'm a bit unclear about whether the layer should clear the OOB flag before passing the message down; this version does.
      I'm not sure a diff will help because I had to change the structure a bit to fix the synchronization issue.
      The other change I made was to stop sending during a view change until the new coordinator signals that all members are in the view. This avoids losing messages on startup when nodes join too quickly, where some messages get discarded because the nodes can't decide whether they belong to a view they were in.
      I also took the opportunity to parameterize the collections. The attached diff is against cvs current.
      I tried it a few times and it seems to work fine.
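      To make the OOB mechanism concrete: it is a per-message flag bit, and a layer such as SEQUENCER can test it to decide whether to bypass ordering. The following is a plain-Java sketch with no JGroups dependency — the class, the flag value, and the route() helper are illustrative stand-ins for org.jgroups.Message's setFlag()/isFlagSet(), not the actual implementation:

```java
// Illustrative stand-in for org.jgroups.Message's flag mechanism: flags
// are a bitmask, and OOB is one bit. The real constant's value may differ.
public class OobSketch {
    public static final byte OOB = 1; // assumed flag bit, for illustration

    private byte flags;

    public void setFlag(byte flag)      { flags |= flag; }
    public boolean isFlagSet(byte flag) { return (flags & flag) == flag; }

    // What a SEQUENCER-style handler could do on the down path:
    // forward OOB messages untouched, sequence everything else.
    public static String route(OobSketch msg) {
        return msg.isFlagSet(OOB) ? "bypass-ordering" : "sequence";
    }

    public static void main(String[] args) {
        OobSketch normal = new OobSketch();
        OobSketch oob = new OobSketch();
        oob.setFlag(OOB);
        System.out.println(route(normal));
        System.out.println(route(oob));
    }
}
```

      An application would simply set the flag before calling send, and the ordering layer checks it on the way down.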

      > > There is another change that will be coming soon (as soon as I get
      > > it ported to 2.4, unless you want it against 2.3SP1), but it's a
      > > bit problematic. Simply put, it allows JGroups to use pools for
      > > user messages and message fragments.
      >
      > The experts say we don't need pools anymore, especially in JDK5 where
      > the VM maintains internal pools anyway. What's your motivation?
      > Performance? Did you measure a performance increase using pools for
      > messages?

      Yes, I know. However, AFAIK, the experts say that because, with modern GCs, allocation and deallocation on the heap (in the new generation) are fast, with the drawback of a slight pause every so often, which isn't noticeable anyway for webapps (and most desktop apps). Pools also tend to hog the tenured space even when they're not actually used.
      I think the situation is a bit different in this case because:

      • the pool is constantly in use;
      • we're allocating arrays, for which allocation from a pool is much faster, since it doesn't need to initialize them to 0s;
      • GC pauses are not negligible, because (if the app is high-throughput or low-latency) a pause on one node can cause the other nodes to stall too. So if you pause, say, 0.5s every 10s, and you have 4 nodes, the cluster can be idling a fifth of the time. I have no real numbers here, and of course good multithreading can help, but you get the idea.
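      A minimal sketch of the kind of buffer pool being described (hypothetical code, not the attached patch): recycled byte[]s are handed back with stale contents, which is exactly what saves the zero-initialization a fresh `new byte[n]` would pay for:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical fixed-size byte[] pool, not the actual patch. Recycled
// buffers are returned as-is (stale contents), avoiding the JVM's
// mandatory zero-fill on a fresh array allocation.
public class BufferPool {
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final int bufSize;

    public BufferPool(int bufSize, int preallocate) {
        this.bufSize = bufSize;
        for (int i = 0; i < preallocate; i++)
            free.offer(new byte[bufSize]);
    }

    /** Returns a pooled buffer, falling back to plain allocation when empty. */
    public byte[] acquire() {
        byte[] buf = free.poll();
        return buf != null ? buf : new byte[bufSize];
    }

    /** Returns a buffer to the pool; callers must not keep a reference. */
    public void release(byte[] buf) {
        if (buf != null && buf.length == bufSize)
            free.offer(buf);
    }
}
```

      Because the pool never zeroes buffers, it only works where the consumer overwrites (or tracks the valid length of) the data it reads — which is the case for message fragments.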

      Anyway, I ran the JGroups perf bench to compare: 4 nodes transmitting 2GB, on 4 Sun v40z machines, each with 4 Opterons, on a dedicated Gig-E LAN (the only traffic is JGroups), with a Sun 1.5.0_10-b03 JVM and 1GB of memory. I attached the non-pooled config.
      I get the following results (that's the average of the 3 medians of 5 runs). Keep in mind that this is 2.3SP1.

      size   no pool    pool
      1k     16.2MB/s   15.4MB/s
      10k    23.3MB/s   28.8MB/s
      100k   26.6MB/s   41.6MB/s
      1M     24.9MB/s   42.8MB/s
      It's worth noting that this config is quite ideal for the GC (few long time references to move out of the new generation, parallel GC on a separate proc).
      The 1k test uses no pooling (the messages are not fragmented).
      The 10k one only uses the fragment pool (explaining the lower gain). If the message pool threshold is lowered to include them, it's about 32MB/s.
      As I mentioned, the Message API makes it necessary to maintain two pools, one for fragments and one for messages. This is ungainly and inefficient. If Message didn't expose its internal array, the message pool could go away, pooling would be more efficient for small messages, and it would also save one extra copy.

      Oh, yeah, cvs seems to be missing the following annotations: org.jgroups.annotations.GuardedBy and org.jgroups.annotations.Immutable.

      Regards,

      Matthieu

      People

        Assignee: Bela Ban (rhn-engineering-bban)
        Reporter: Bela Ban (rhn-engineering-bban)