Infinispan / ISPN-9908

Cache startup failure with server hinting and insufficient segments


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 9.4.9.Final, 10.0.0.Beta3
    • Affects Version/s: 9.4.6.Final
    • Component/s: Core
    • Labels: None
    • Steps to Reproduce:

      1. Set segments to 1 and add server hinting (machine, rack and site) to the transport:

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
      

      2. Start 3 nodes.
      3. The 3rd node fails to start with a replication timeout because state transfer times out.

      The log and clustered.xml are attached as log.zip.

    • Release Notes
    • Workaround Exists
    • Workaround:

      Increase the number of segments.
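      A minimal sketch of the workaround, assuming the same clustered.xml layout as the reproduction steps: server hinting on the transport is left unchanged and only the segments attribute of the cache is raised. The value 256 is purely illustrative and is not taken from this issue; the report only states that increasing the segment count avoids the failure (and that 20 segments was still too few for 6 nodes), so pick a value well above the expected cluster size.

```xml
<!-- Workaround sketch: same cache as in the reproduction, but with more segments.
     256 is an illustrative value, not one taken from this issue. -->
<distributed-cache name="default" segments="256" />
<!-- ... other subsystem configuration unchanged ... -->
<stack name="udp">
    <!-- Server hinting (machine/rack/site) is kept exactly as in the failing setup -->
    <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
</stack>
```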

    Description

      When a cache is configured with a small number of segments and server hinting is enabled, a node can fail to start with the following error [1].
      This can be reproduced with RHDG 7.2.3 and 7.3 ER2.

      [1]

      ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered.test: org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered.test: Failed to start service
      ...
      Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
      ...
      Caused by: org.infinispan.util.concurrent.TimeoutException: Replication timeout for svr01 (flags=0), site-id=site1, rack-id=rack1, machine-id=machine1)
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:916)
      ...
      

      Nodes seem unable to finish the initial state transfer, and startup fails, when the number of segments is too small relative to the number of nodes.
      With segments set to 20 (the 6.6.2 default), the 6th node fails to start with the same timeout.
      For example, the 3rd node of a 3-node cluster fails to start with the following configuration:

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
      

    People

      Assignee: Dan Berindei (Inactive)
      Reporter: Hiroki Daicho (Inactive)
      Votes: 0
      Watchers: 4
