Keepalived not bouncing the API VIP when the master goes down - haproxy not passing the traffic to another node


      Description of problem:

      Keepalived on an OCP cluster on Nutanix does not move the API VIP when one of the masters becomes unresponsive.
      We hit a situation where the master holding the VIP was unresponsive (SSH was not working either) and traffic through the VIP was down. The haproxy in the nutanix-infra namespace did not pass the traffic to another master, and the VIP stayed on the unresponsive master.
      After we rebooted the host, the system came up, but the kubelet was waiting for its CSR to be approved.
      The API access was working, but the console (after authentication) was not available (blank page).
      Two issues arise from this:
      - If a host becomes unresponsive, the VIP should move to another node. It did not.
      - After the host was rebooted, while the kubelet was still NotReady, the VIP should have moved. It did not.
      Only after the CSR was approved and the kubelet registered as Ready did console access start working and the cluster recover.

      Version-Release number of selected component (if applicable):

          OCP 4.13.18 on Nutanix

      How reproducible:

          Not reproduced on demand yet, but either of the following should trigger it:
          - Overload the master node that holds the VIP (so that kube-apiserver responses time out) and observe the behavior.
          - Remove the kubelet client certs and restart the kubelet without approving the CSRs, then observe whether access to the API through the VIP still works.
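
To make the "observe" part of the steps above measurable, a small probe can poll the API through the VIP and record how long it stays unreachable. The following is only a sketch, not part of the original report: the function names `probe`, `watch` and `longest_outage` are made up for illustration, and it assumes that the kube-apiserver `/readyz` endpoint is reachable through the VIP without authentication. TLS verification is disabled on the assumption that only reachability matters here.

```python
import ssl
import time
import urllib.error
import urllib.request


def probe(url, timeout=3.0, opener=urllib.request.urlopen):
    """Send one GET to url and return (ok, detail)."""
    # Assumption: this is a reachability check only, so certificate
    # verification is turned off. Do not reuse this for anything else.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with opener(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused/reset and timeouts.
        return False, f"{type(exc).__name__}: {exc}"


def watch(url, interval=1.0, count=60):
    """Probe url `count` times and return a list of (timestamp, ok)."""
    samples = []
    for _ in range(count):
        ok, detail = probe(url)
        samples.append((time.time(), ok))
        print(f"{time.strftime('%H:%M:%S')} {'OK' if ok else 'FAIL'} {detail}")
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Return the longest span (seconds) of consecutive failed probes."""
    longest = 0.0
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # outage begins
        elif ok and start is not None:
            longest = max(longest, ts - start)  # outage ended
            start = None
    if start is not None and samples:
        # Still failing at the end of the run.
        longest = max(longest, samples[-1][0] - start)
    return longest
```

A run during either reproduction step could look like `longest_outage(watch("https://<API VIP>:6443/readyz", count=600))`, with `<API VIP>` replaced by the cluster's API VIP or API hostname. If the VIP fails over as expected, the reported outage should be short; an outage that lasts until the node recovers would match the behavior described above.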

      Additional info:

          - Will provide data in a follow-up comment.

