Infinispan / ISPN-9908

Cache startup failure with server hinting and insufficient segments

    Details

    • Steps to Reproduce:
      1. Set segments to 1 on the distributed cache and add a machine setting (server hinting) to the transport.

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
      

      2. Start 3 nodes.
      3. The 3rd node fails to start with a replication timeout caused by the state-transfer timeout.

      The log and clustered.xml are attached as log.zip.

    • Affects:
      Release Notes
    • Workaround:
      Workaround Exists
    • Workaround Description:
      Increase the number of segments (the segments attribute of the cache).
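
      A minimal sketch of the workaround, reusing the cache definition from the reproduction steps. The value 256 is an assumption chosen for illustration only; the report does not state which segment count is sufficient for a given number of nodes.

      ```xml
      <!-- Illustrative value only: 256 is assumed, not taken from this report.
           Choose a segment count well above the number of nodes in the cluster. -->
      <distributed-cache name="default" segments="256" />
      ```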


      Description

      When a cache is configured with a small number of segments and server hinting is enabled, a node fails to start with the error below [1].
      The problem is reproducible with RHDG 7.2.3 and 7.3 ER2.

      [1]

      ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered.test: org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered.test: Failed to start service
      ...
      Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
      ...
      Caused by: org.infinispan.util.concurrent.TimeoutException: Replication timeout for svr01 (flags=0), site-id=site1, rack-id=rack1, machine-id=machine1)
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:916)
      ...
      

      Nodes seem unable to finish the initial state transfer, and startup fails, when the number of segments is too small for the number of nodes.
      With segments set to 20 (the 6.6.2 default), the 6th node fails to start with the above timeout.
      For example, in a 3-node cluster the 3rd node fails to start with the following setting:

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
      

    People

    • Assignee: Dan Berindei (dan.berindei)
    • Reporter: Hiroki Daicho (hiroki.daicho)