OpenShift Bugs / OCPBUGS-33796

SNO DU deployment ends up in degraded mcp status after install

    • Important
    • No
    • MCO Sprint 254
    • 1
    • Rejected
    • False

      Description of problem:

      SNO installs (with Assisted Installer) using the DU profile intermittently end up with the master MachineConfigPool (mcp) in a degraded state after installation. Reproduced with both 4.16.0-rc.0 and 4.16.0-rc.1.
      
      $ oc get mcp master                                                                                                            
      NAME     CONFIG   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE                                                                                                        
      master            False     True       True       1              0                   0                     1                      15h                                                                                        
      
      
      mcp master status:
      
        Conditions:
          Last Transition Time:  2024-05-14T21:21:30Z
          Message:
          Reason:
          Status:                False
          Type:                  Updated
          Last Transition Time:  2024-05-14T21:21:30Z
          Message:               All nodes are updating to MachineConfig rendered-master-3ebeee8538946014a3f107ea0603d260
          Reason:
          Status:                True
          Type:                  Updating
          Last Transition Time:  2024-05-14T21:21:30Z
          Message:               Node e32-h22-r750 is reporting: "missing MachineConfig rendered-master-49651500230308839606552505f7f484\nmachineconfig.machineconfiguration.openshift.io \"rendered-master-49651500230308839606552505f7f484\" not found"
          Reason:                1 nodes are reporting degraded status on sync
          Status:                True
          Type:                  NodeDegraded
          Last Transition Time:  2024-05-14T21:21:30Z
          Message:
          Reason:
          Status:                True
          Type:                  Degraded
          Last Transition Time:  2024-05-14T21:21:35Z
          Message:
          Reason:
          Status:                False
          Type:                  RenderDegraded
        Configuration:
        Degraded Machine Count:     1
        Machine Count:              1
        Observed Generation:        2
        Ready Machine Count:        0
        Unavailable Machine Count:  1
        Updated Machine Count:      0
      Events:                       <none>
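
The degraded conditions in the status above can also be picked out programmatically. The following is a minimal sketch, assuming the pool has been exported with something like `oc get mcp master -o json` and parsed into a dict. The function name `degraded_conditions` and the trimmed `sample` are illustrative only and are not part of any OpenShift tooling.

```python
def degraded_conditions(mcp):
    """Return (type, reason, message) for every *Degraded condition set to "True"."""
    conditions = mcp.get("status", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("type", "").endswith("Degraded") and c.get("status") == "True"
    ]


# Hypothetical input: the conditions quoted in this report, trimmed to the
# fields the function reads. A real export contains more fields.
sample = {
    "status": {
        "conditions": [
            {"type": "Updated", "status": "False", "reason": "", "message": ""},
            {"type": "Updating", "status": "True", "reason": "",
             "message": "All nodes are updating to MachineConfig "
                        "rendered-master-3ebeee8538946014a3f107ea0603d260"},
            {"type": "NodeDegraded", "status": "True",
             "reason": "1 nodes are reporting degraded status on sync",
             "message": "Node e32-h22-r750 is reporting: \"missing MachineConfig "
                        "rendered-master-49651500230308839606552505f7f484\""},
            {"type": "Degraded", "status": "True", "reason": "", "message": ""},
            {"type": "RenderDegraded", "status": "False", "reason": "", "message": ""},
        ]
    }
}

for ctype, reason, message in degraded_conditions(sample):
    print(ctype, "|", reason, "|", message)
```

Filtering on the `Degraded` suffix keeps NodeDegraded, Degraded and RenderDegraded in one pass, so the condition that carries the actual message (NodeDegraded here) is not lost behind the empty top-level Degraded entry.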
      
      
      
      machine-config-daemon pod logs:
      
      [2024-05-14T21:38:13Z INFO  nmstatectl] Nmstate version: 2.2.27
      [2024-05-14T21:38:13Z INFO  nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
      I0514 21:38:13.197788   50688 daemon.go:1624] In bootstrap mode
      E0514 21:38:13.197828   50688 writer.go:226] Marking Degraded due to: missing MachineConfig rendered-master-49651500230308839606552505f7f484
      machineconfig.machineconfiguration.openshift.io "rendered-master-49651500230308839606552505f7f484" not found
      I0514 21:38:42.173501   50688 certificate_writer.go:340] Certificate was synced from controllerconfig resourceVersion 12044
      I0514 21:38:45.205661   50688 daemon.go:1898] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs1344634730 --cleanup
      
      
      machine-config-daemon pod events:
      Events:
        Type     Reason      Age                  From               Message
        ----     ------      ----                 ----               -------
        Normal   Scheduled   28m                  default-scheduler  Successfully assigned openshift-machine-config-operator/machine-config-daemon-29r45 to e32-h22-r750
        Normal   Pulled      28m                  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
        Normal   Created     28m                  kubelet            Created container machine-config-daemon
        Normal   Started     28m                  kubelet            Started container machine-config-daemon
        Normal   Pulled      28m                  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91bb4f8991ea4b597c9404cec89a984cc3ad3f76a6099d868bc3388dbbd36346" already present on machine
        Normal   Created     28m                  kubelet            Created container kube-rbac-proxy
        Normal   Started     28m                  kubelet            Started container kube-rbac-proxy
        Normal   Created     26m                  kubelet            Created container machine-config-daemon
        Normal   Pulled      26m                  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
        Normal   Started     26m                  kubelet            Started container machine-config-daemon
        Normal   Killing     19m (x2 over 22m)    kubelet            Container machine-config-daemon failed liveness probe, will be restarted
        Normal   Created     19m (x2 over 22m)    kubelet            Created container machine-config-daemon
        Normal   Started     19m (x2 over 22m)    kubelet            Started container machine-config-daemon
        Warning  Unhealthy   16m (x9 over 23m)    kubelet            Liveness probe failed: Get "http://127.0.0.1:8798/health": dial tcp 127.0.0.1:8798: connect: connection refused
        Normal   Pulled      13m (x4 over 22m)    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
        Warning  ProbeError  8m5s (x17 over 23m)  kubelet            Liveness probe error: Get "http://127.0.0.1:8798/health": dial tcp 127.0.0.1:8798: connect: connection refused
      body:
        Warning  BackOff  3m28s (x7 over 4m35s)  kubelet  Back-off restarting failed container machine-config-daemon in pod machine-config-daemon-29r45_openshift-machine-config-operator(9953f60a-c482-4ec5-9f3c-d6ac5a874791)
      
      
      oc describe node shows the following annotation:
                          machineconfiguration.openshift.io/reason:
                            missing MachineConfig rendered-master-49651500230308839606552505f7f484
                            machineconfig.machineconfiguration.openshift.io "rendered-master-49651500230308839606552505f7f484" not found
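
The rendered config that the node reports as missing is not the same one that the pool's Updating condition names as the target. A minimal sketch of that comparison follows, using only the strings quoted in this report. The helper `rendered_names` and the regular expression are assumptions made for illustration, based on the name format visible in these messages.

```python
import re

# Assumed name format, taken from the messages in this report:
# "rendered-<pool>-<32 hex characters>".
RENDERED_RE = re.compile(r"rendered-[a-z0-9-]+-[0-9a-f]{32}")


def rendered_names(text):
    """Return the distinct rendered-* names in text, in order of first appearance."""
    seen = []
    for name in RENDERED_RE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


# Node annotation machineconfiguration.openshift.io/reason, as quoted above.
reason = (
    "missing MachineConfig rendered-master-49651500230308839606552505f7f484\n"
    "machineconfig.machineconfiguration.openshift.io "
    "\"rendered-master-49651500230308839606552505f7f484\" not found"
)
# Message of the pool's Updating condition, as quoted above.
updating = ("All nodes are updating to MachineConfig "
            "rendered-master-3ebeee8538946014a3f107ea0603d260")

missing = rendered_names(reason)
target = rendered_names(updating)
print("missing on node:", missing)
print("pool target:", target)
print("same config:", missing == target)
```

This only compares names found in the text. It does not check which MachineConfig objects actually exist on the cluster, which would need a separate listing such as `oc get mc`.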
      

      Version-Release number of selected component (if applicable):

          OCP 4.16.0-rc.0, 4.16.0-rc.1

      How reproducible:

          Intermittent

      Steps to Reproduce:

          1. Install SNO with DU profile
          2. Check mcp status after install


      Actual results:

          Master mcp is degraded post install

      Expected results:

           Master mcp should not be degraded post install

      Additional info:

          

            team-mco Team MCO
            nchhabra@redhat.com Noreen Chhabra
            Sergio Regidor de la Rosa