Details
- Type: Bug
- Resolution: Done
- Priority: Critical
- Affects Version/s: 6.0.2.Final, 7.1.1.Final
- Labels: None
Description
The join process was designed around the assumption that a node would start its caches sequentially, so ClusterTopologyManager.waitForView() would block at most once for each joining node. However, WildFly actually starts 2 * Runtime.availableProcessors() caches in parallel, which becomes a problem when the machine has many cores and multiple nodes are joining.
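For illustration, the parallel startup pattern looks roughly like the sketch below. The class, the cache names, and the pool usage are hypothetical stand-ins; only the 2 * Runtime.availableProcessors() sizing comes from the description above.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCacheStart {
    // The cache-startup parallelism WildFly uses, per the description.
    static int startupParallelism() {
        return 2 * Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(startupParallelism());
        // Cache names are made up for illustration; on a many-core machine
        // all of these joins hit the coordinator at roughly the same time.
        for (String name : List.of("users", "sessions", "orders", "catalog")) {
            pool.submit(() -> System.out.println("starting cache " + name));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

On a 32-core machine this yields 64 concurrent cache starts per node, so the "block at most once per joining node" assumption no longer holds.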
ClusterTopologyManager.handleClusterView() only updates the viewId after it has updated the cache topology of every cache AND after it has confirmed the availability of all the nodes with a POLICY_GET_STATUS RPC. This RPC can block, and it is very easy for the remote-executor thread pool on the coordinator to become overloaded with threads like this:
"remote-thread-172" daemon prio=10 tid=0x00007f0cc48c0000 nid=0x28ca4 in Object.wait() [0x00007f0c5f25b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.infinispan.topology.ClusterTopologyManagerImpl.waitForView(ClusterTopologyManagerImpl.java:357)
	- locked <0x00000000ff3bd900> (a java.lang.Object)
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:162)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:144)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:276)
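The exhaustion mechanism can be reproduced in miniature: when every worker in a fixed-size pool is parked in Object.wait() (as in waitForView() above), no thread is left to run the task that would advance the view and wake them up. The pool size, class, and method names below are illustrative stand-ins, not Infinispan's actual code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class RemoteExecutorExhaustion {
    // Returns whether an extra task managed to run while every pool worker
    // was blocked. With a fixed-size pool this returns false: the task that
    // would update the viewId just sits in the queue.
    static boolean extraTaskRanWhileSaturated(int poolSize) throws InterruptedException {
        ExecutorService remoteExecutor = Executors.newFixedThreadPool(poolSize);
        CountDownLatch viewUpdated = new CountDownLatch(1);
        AtomicBoolean viewTaskRan = new AtomicBoolean(false);

        // Every worker blocks, like handleJoin() parked in waitForView().
        for (int i = 0; i < poolSize; i++) {
            remoteExecutor.submit(() -> {
                try {
                    viewUpdated.await(); // analogous to waiting for the viewId to advance
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The task that would advance the view is queued behind the blockers.
        remoteExecutor.submit(() -> viewTaskRan.set(true));
        Thread.sleep(500);
        boolean ranWhileSaturated = viewTaskRan.get();

        // Releasing the workers from outside the pool drains the queue.
        viewUpdated.countDown();
        remoteExecutor.shutdown();
        remoteExecutor.awaitTermination(5, TimeUnit.SECONDS);
        return ranWhileSaturated;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("extra task ran while saturated: "
                + extraTaskRanWhileSaturated(4)); // prints "false"
    }
}
```

This is why updating the viewId only after the POLICY_GET_STATUS RPCs complete is dangerous: the work that unblocks the waiters is scheduled on the same saturated pool.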