OpenShift Bugs / OCPBUGS-27187

Keepalived not bouncing the API VIP when the master goes down - haproxy not passing the traffic to another node


    • Important
    • No
    • False

      Description of problem:

      Keepalived on the OCP cluster on Nutanix is not bouncing the API VIP when one of the masters is not responsive.
      
      We saw a situation in which the master holding the VIP was unresponsive (SSH was not working either) and the traffic was down. The haproxy from the nutanix-infra namespace did not pass the traffic to another master, and the VIP stayed on the unresponsive master.
      
      When we rebooted the host, the system started, but the kubelet was waiting for its CSR to be approved.
      API access was working, but the console (after authentication) was not available (blank page).
      
      Two issues arise from this:
      
      - If a host becomes unresponsive, the VIP should be bounced - it wasn't.
      - After the host was rebooted, while the kubelet was still NotReady, the VIP should have been bounced - it wasn't.
      
      Only after the CSR was approved and the kubelet registered as Ready did console access start working and the cluster return to normal.
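
The expectation in the two points above can be illustrated with a small model of a priority-based VIP election. This is only a sketch of the behavior the report expects, not keepalived's actual election code: the `Master` class, `base_priority`, `check_weight`, and `api_healthy` are hypothetical names and values chosen for the example. In real VRRP a node that stops sending advertisements is also expected to lose the address; the model folds that case into `api_healthy=False`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Master:
    name: str
    base_priority: int       # hypothetical static priority, equal on all masters
    api_healthy: bool        # outcome of a hypothetical local API health check
    check_weight: int = 50   # hypothetical bonus applied while the check passes

    def effective_priority(self) -> int:
        # A failing check removes the bonus, so healthy peers outrank this node.
        return self.base_priority + (self.check_weight if self.api_healthy else 0)


def expected_vip_holder(masters: List[Master]) -> str:
    """Return the name of the node a priority-based election would pick."""
    return max(masters, key=lambda m: m.effective_priority()).name


# master-0 holds the VIP and becomes unresponsive; the others stay healthy.
masters = [
    Master("master-0", base_priority=40, api_healthy=False),
    Master("master-1", base_priority=40, api_healthy=True),
    Master("master-2", base_priority=40, api_healthy=True),
]
print(expected_vip_holder(masters))
```

Under these assumptions the election no longer selects master-0, which is the behavior the report says was missing in both the unresponsive-host case and the NotReady-kubelet case.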
      
          

      Version-Release number of selected component (if applicable):

          OCP 4.13.18 on Nutanix
          

      How reproducible:

          I expect it can be reproduced in either of these ways:
          - Overload the master node that holds the VIP (so that kube-apiserver responses time out) and observe the behavior.
          - Remove the kubelet client certificates and restart the kubelet without approving the CSRs, then observe whether access to the API through the VIP still works.
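
To make the "observe" part of either scenario concrete, a small probe loop along the following lines could record when the API stops and resumes answering through the VIP. This is a minimal sketch, not a supported tool: the URL is a placeholder, it assumes the kube-apiserver `/readyz` endpoint is reachable without credentials in the cluster under test, and certificate verification is disabled purely for diagnostics. The helper names `probe_readyz` and `watch` are made up for this example.

```python
import http.client
import ssl
import time
import urllib.request
from typing import Callable, List, Optional, Tuple

# Placeholder endpoint: substitute the cluster's real API VIP hostname.
API_URL = "https://api.example.invalid:6443/readyz"


def probe_readyz(url: str = API_URL, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    # Verification is switched off for this diagnostic only (assumption:
    # the cluster CA is not installed on the machine running the probe).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers URLError/HTTPError, timeouts, TLS errors, and broken replies.
        return False


def watch(probe: Callable[[], bool], attempts: int,
          interval: float = 1.0) -> List[Tuple[int, bool]]:
    """Probe repeatedly; record only the attempts where the state changed."""
    transitions: List[Tuple[int, bool]] = []
    last: Optional[bool] = None
    for i in range(attempts):
        ok = probe()
        if ok != last:
            transitions.append((i, ok))
            last = ok
        if i < attempts - 1:
            time.sleep(interval)
    return transitions
```

Running `watch(probe_readyz, attempts=60)` during either reproduction attempt returns a list of `(attempt, reachable)` transitions. If the VIP is bounced as expected, a drop to `False` should be followed by a return to `True`; a result that stays at `False` would match the behavior described in this report.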
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      
          

      Expected results:

      
          

      Additional info:

          - will provide data with another comment
          

            bnemec@redhat.com Benjamin Nemec
            rhn-support-vwalek Vladislav Walek
            Zhaohua Sun Zhaohua Sun
            Yanhua Li