Infinispan / ISPN-7800

Cluster always in Degraded Mode


    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Duplicate Issue
    • Affects Version/s: 8.2.6.Final, 9.0.0.Final
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Scenario:

      • 3 nodes, server mode with Partition handling enabled
      • 2 nodes are killed and brought back online
      • the restarted nodes never merge back, and the cluster remains in degraded mode.

      I suspect the FORK channel/protocol is the culprit: the heartbeat command is never handled on the joining node, yet the coordinator receives a CacheNotFoundResponse quickly (i.e. without a timeout). The request is received and "delivered" on the joiner but never reaches Infinispan.

      When starting node 1 (logs from the coordinator):

      Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
      Received new cluster view: 5, isCoordinator = true, old status = COORDINATOR
      //heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
      Responses: value=CacheNotFoundResponse, received=true, suspected=false
      Node node01-47572 left while updating cache members
      //the view is not handled
      

      When starting node 2 (logs from the coordinator):

      Received new cluster view: 6, isCoordinator = true, old status = COORDINATOR
      Updating cluster members for all the caches. New list is [node03-48579, node01-47572, node02-32959]
      //heartbeat sent, ClusterTopologyManagerImpl.confirmMembersAvailable();
      Responses: Responses{
        node01-47572: value=SuccessfulResponse{responseValue=true} , received=true, suspected=false
        node02-32959: value=CacheNotFoundResponse, received=true, suspected=false}
      Node node02-32959 left while updating cache members
      //the view is not handled
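To make the suspected failure mode concrete, here is a hypothetical, simplified Java model of what the coordinator logs show: a member that answers the heartbeat with a CacheNotFoundResponse is reported as having left, so the new view is never applied. The enum `ResponseKind` and the method `membersTreatedAsLeft` are invented for illustration; this is not Infinispan's `ClusterTopologyManagerImpl` code, and treating every non-successful reply as a departure is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the coordinator behaviour visible in the
 * logs. NOT Infinispan code: ResponseKind and membersTreatedAsLeft are
 * invented names used only to illustrate the observed outcome.
 */
public class HeartbeatModel {

    // Simplified stand-ins for the two response types that appear in the logs.
    enum ResponseKind { SUCCESSFUL_TRUE, CACHE_NOT_FOUND }

    // Returns the members the coordinator would report as having left.
    // Assumption: any reply other than a successful "true" counts as a
    // departure, including a CacheNotFoundResponse that arrives quickly.
    static Set<String> membersTreatedAsLeft(Map<String, ResponseKind> responses) {
        Set<String> left = new LinkedHashSet<>();
        for (Map.Entry<String, ResponseKind> e : responses.entrySet()) {
            if (e.getValue() != ResponseKind.SUCCESSFUL_TRUE) {
                left.add(e.getKey());
            }
        }
        return left;
    }

    public static void main(String[] args) {
        // Heartbeat responses as logged when node 2 was started (view 6).
        Map<String, ResponseKind> responses = new LinkedHashMap<>();
        responses.put("node01-47572", ResponseKind.SUCCESSFUL_TRUE);
        responses.put("node02-32959", ResponseKind.CACHE_NOT_FOUND);

        for (String node : membersTreatedAsLeft(responses)) {
            System.out.println("Node " + node + " left while updating cache members");
        }
    }
}
```

Running `main` prints `Node node02-32959 left while updating cache members`, mirroring the second coordinator log above.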
      

      It is always reproducible. The cache configuration is:

      <replicated-cache name="default" mode="SYNC" batching="true">
        <partition-handling enabled="true"/>
        <locking isolation="REPEATABLE_READ"/>
        <state-transfer enabled="false"/>
      </replicated-cache>
      


                People

                • Assignee: Unassigned
                  Reporter: Pedro Ruivo (pruivo)
                • Votes: 0
                  Watchers: 3
