Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.10
Affects Version/s: 2.8, 2.9
Labels: None

Description

org.jgroups.blocks.LazyRemovalCache, used in org.jgroups.protocols.TP, removes marked cache items only when its size exceeds max_elements, which is set to 20 in TP.

I'm using JGroups (tried 2.8 and 2.9) with jboss-cache 3.2.1 over the TCP protocol. I tried to find out why replication time, around 50 ms initially, increases by about a second whenever a node leaves the cluster.

Here's what I found:

When a node leaves the cluster and the view changes:
1. TP calls logical_addr_cache.retainAll(members);
2. LazyRemovalCache.retainAll updates the map, setting the removable flag to true on those members that are not in the view.
3. LazyRemovalCache.checkMaxSizeExceeded NEVER removes them from the cache because its size is always less than max_elements, which is 20.
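
The three steps above can be modelled with a small stand-alone class. This is only a sketch of the behaviour as I understand it, not the JGroups source: the class name LazyCacheSketch, its fields, and the string keys are made up for illustration, and only the method names retainAll and checkMaxSizeExceeded are taken from the description.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a lazy-removal cache; not the JGroups implementation.
class LazyCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        boolean removable; // set when the key is no longer in the view
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final int maxElements;

    LazyCacheSketch(int maxElements) { this.maxElements = maxElements; }

    void add(K key, V value) {
        map.put(key, new Entry<>(value));
        checkMaxSizeExceeded();
    }

    // Marks every entry whose key is not in 'keep'; it does not remove anything itself.
    void retainAll(Collection<K> keep) {
        for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
            if (!keep.contains(e.getKey())) {
                e.getValue().removable = true;
            }
        }
        checkMaxSizeExceeded();
    }

    // Marked entries are purged only once the map has grown past maxElements.
    private void checkMaxSizeExceeded() {
        if (map.size() <= maxElements) {
            return; // a small cluster never gets past this line
        }
        map.values().removeIf(entry -> entry.removable);
    }

    int size() { return map.size(); }
}
```

With maxElements set to 20 and a three-node cluster, calling retainAll with the two surviving members leaves size() at 3: the departed node is marked but stays in the map.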

Then, whenever a multicast is sent:
1. BasicTCP.sendMulticast calls TP.sendToAllPhysicalAddresses
2. TP.sendToAllPhysicalAddresses iterates through all values in logical_addr_cache, calling sendUnicast for each
3. logical_addr_cache still contains all the nodes, including those that were killed, so TP tries to connect to each of them, which causes enormous delays
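
To make the cost of this fan-out concrete, here is a rough sketch. Everything in it is an assumption for illustration: FanoutSketch, CachedAddress, and both method names are hypothetical, and the delay estimate assumes each departed node costs one full connection timeout per multicast, which is what I appear to be seeing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the multicast fan-out described above; not the JGroups source.
class FanoutSketch {
    // A cached physical address plus the "removable" mark set when its member left the view.
    static final class CachedAddress {
        final String physicalAddress;
        final boolean removable;
        CachedAddress(String physicalAddress, boolean removable) {
            this.physicalAddress = physicalAddress;
            this.removable = removable;
        }
    }

    // Every cached value becomes a unicast target, whether it is marked or not.
    static List<String> unicastTargets(Map<String, CachedAddress> cache) {
        List<String> targets = new ArrayList<>();
        for (CachedAddress entry : cache.values()) {
            targets.add(entry.physicalAddress);
        }
        return targets;
    }

    // Assumption: each marked (departed) member costs one connect timeout per multicast.
    static long estimatedExtraDelayMillis(Map<String, CachedAddress> cache,
                                          long connectTimeoutMillis) {
        long stale = 0;
        for (CachedAddress entry : cache.values()) {
            if (entry.removable) {
                stale++;
            }
        }
        return stale * connectTimeoutMillis;
    }
}
```

Under these assumptions, a cache holding one live node and two marked nodes still yields three unicast targets, and with a 1000 ms connect timeout the estimate is 2000 ms of extra delay per multicast.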

As a result, replication time increases by one connection timeout for every node removed from the cluster.

Issue Links

is related to

JGRP-1147 TP: make logical_addr_cache's timeout configurable

Resolved

People

Assignee: Bela Ban

Reporter: Fedor Cherepanov (Inactive)

Votes: 0

Watchers: 0

Dates

Created: 2010/03/14 8:50 PM

Updated: 2010/03/31 7:22 AM

Resolved: 2010/03/31 7:22 AM