OpenShift Bugs / OCPBUGS-33093

SNO 4.15 upgrade becomes wedged potentially due to manually installed kernel rpms


    • Bug
    • Resolution: Can't Do
    • Normal
    • None
    • 4.15.z
    • RHCOS
    • Important
    • No
    • 1
    • 253 - Core Packages
    • 1
    • False

      Description of problem:

          I tried to update my SNO (single-node OpenShift) cluster from 4.15.5 to 4.15.11. However, the master Machine Config Pool became degraded, which prevented the upgrade from completing.

      Version-Release number of selected component (if applicable):

          4.15.5 -> 4.15.11

      How reproducible:

          Unknown. This is a customer lab environment, so I have not tried to reproduce it. If I were to reinstall, I would likely install 4.15.11 directly.

      Steps to Reproduce:

          1. Install a custom patched kernel on the node (for example: sudo rpm-ostree override replace kernel{,-core,-modules,-modules-extra}-5.14.0-284.59.1.rstat_blkio.el9_2.x86_64.rpm)
          2. Start the cluster upgrade: oc adm upgrade --to=4.15.11
          3. Wait for the upgrade to report an error.
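
The brace expression in step 1 is bash shorthand for four RPM files that replace the base kernel packages. As a minimal sketch (the helper name `kernel_rpm_names` is hypothetical and only illustrates the expansion), the same list can be produced explicitly:

```shell
# List the four kernel RPM file names that the brace expression in step 1
# expands to. kernel_rpm_names is an illustrative helper, not an existing tool.
kernel_rpm_names() {
    version="$1"
    # An empty suffix yields the plain "kernel" package.
    for suffix in "" -core -modules -modules-extra; do
        printf 'kernel%s-%s.rpm\n' "$suffix" "$version"
    done
}
```

Calling `kernel_rpm_names 5.14.0-284.59.1.rstat_blkio.el9_2.x86_64` prints the `kernel`, `kernel-core`, `kernel-modules`, and `kernel-modules-extra` file names passed to `rpm-ostree override replace`.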
          

      Actual results:

      $ oc adm upgrade
      Failing=True:
        Reason: ClusterOperatorDegraded
        Message: Cluster operator machine-config is degraded
      info: An upgrade is in progress. Unable to apply 4.15.11: wait has exceeded 40 minutes for these operators: machine-config
      Upgradeable=False
        Reason: DegradedPool
        Message: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
      Upstream is unset, so the cluster will use an appropriate default.
      Channel: candidate-4.15 (available channels: candidate-4.15, candidate-4.16)
      No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available
      
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.15.5    True        True          11h     Unable to apply 4.15.11: wait has exceeded 40 minutes for these operators: machine-config
      
      $ oc get clusteroperator
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.15.11   True        False         False      11h
      cloud-controller-manager                   4.15.11   True        False         False      32d
      config-operator                            4.15.11   True        False         False      32d
      dns                                        4.15.11   True        False         False      11h
      etcd                                       4.15.11   True        False         False      32d
      image-registry                             4.15.11   True        False         False      11h
      ingress                                    4.15.11   True        False         False      32d
      kube-apiserver                             4.15.11   True        False         False      32d
      kube-controller-manager                    4.15.11   True        False         False      32d
      kube-scheduler                             4.15.11   True        False         False      32d
      kube-storage-version-migrator              4.15.11   True        False         False      6d19h
      machine-approver                           4.15.11   True        False         False      32d
      machine-config                             4.15.5    True        True          True       32d     Unable to apply 4.15.11: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 0)]]
      marketplace                                4.15.11   True        False         False      32d
      monitoring                                 4.15.11   True        False         False      11h
      network                                    4.15.11   True        False         False      32d
      node-tuning                                4.15.11   True        False         False      11h
      openshift-apiserver                        4.15.11   True        False         False      11h
      openshift-controller-manager               4.15.11   True        False         False      11h
      operator-lifecycle-manager                 4.15.11   True        False         False      32d
      operator-lifecycle-manager-catalog         4.15.11   True        False         False      32d
      operator-lifecycle-manager-packageserver   4.15.11   True        False         False      11h
      service-ca                                 4.15.11   True        False         False      32d
      storage                                    4.15.11   True        False         False      32d
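
In the table above, only machine-config is still at 4.15.5 and degraded. A rough sketch of how that could be picked out automatically follows; the helper name `unhealthy_operators` is hypothetical, and it assumes the default column layout shown above (VERSION in column 2, DEGRADED in column 5) with every operator reporting a version:

```shell
# Print cluster operators that are not at the target version or that report
# DEGRADED=True. Reads `oc get clusteroperator` table output on stdin.
# unhealthy_operators is an illustrative helper, not part of oc.
unhealthy_operators() {
    target="$1"
    # NR > 1 skips the header row; $2 is VERSION, $5 is DEGRADED.
    awk -v target="$target" \
        'NR > 1 && NF >= 5 && ($2 != target || $5 == "True") { print $1 }'
}
```

One possible use is `oc get clusteroperator | unhealthy_operators 4.15.11`, which for the output above would list only machine-config.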
      
      

      Expected results:

          The upgrade completes successfully.

      Additional info:

      $ oc get po -nopenshift-machine-config-operator
      NAME                                         READY   STATUS                 RESTARTS   AGE
      kube-rbac-proxy-crio-pstacn1-sut             0/1     CreateContainerError   0          9h
      machine-config-controller-5b49c86f4b-p7lmd   2/2     Running                0          11h
      machine-config-daemon-f79j6                  2/2     Running                0          11h
      machine-config-operator-dbb55f546-6qpcx      2/2     Running                0          11h
      machine-config-server-jcmp9                  1/1     Running                0          11h
      
      $ oc logs machine-config-controller-5b49c86f4b-p7lmd
      E0429 14:34:03.804410       1 render_controller.go:439] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again Conflict 0xc0029ccf60 409}})
      E0429 14:34:03.806224       1 render_controller.go:461] Error updating MachineConfigPool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
      I0429 14:34:03.806232       1 render_controller.go:378] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
      E0429 14:34:03.847677       1 render_controller.go:439] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "master": the object has been modified; please apply your changes to the latest version and try again Conflict 0xc00086a600 409}})
      E0429 14:34:03.849612       1 render_controller.go:461] Error updating MachineConfigPool master: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "master": the object has been modified; please apply your changes to the latest version and try again
      I0429 14:34:03.849621       1 render_controller.go:378] Error syncing machineconfigpool master: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "master": the object has been modified; please apply your changes to the latest version and try again
      I0429 14:34:08.767117       1 status.go:207] Pool worker: All nodes are updated with MachineConfig rendered-worker-70802875d6e4d855ddae1368c075d6bb
      I0429 14:34:08.786413       1 status.go:224] Degraded Machine: pstacn1-sut and Degraded Reason: unexpected on-disk state validating against rendered-master-203f42c9596377c3f3ae47f1927c75a3: error running rpm-ostree kargs: exit status 1
      Job for rpm-ostreed.service failed because the control process exited with error code.
      See "systemctl status rpm-ostreed.service" and "journalctl -xeu rpm-ostreed.service" for details.
      error: Loading sysroot: exit status: 1
      
      $ sudo rpm-ostree status
      Job for rpm-ostreed.service failed because the control process exited with error code.
      See "systemctl status rpm-ostreed.service" and "journalctl -xeu rpm-ostreed.service" for details.
      × rpm-ostreed.service - rpm-ostree System Management Daemon
           Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
          Drop-In: /run/systemd/system/rpm-ostreed.service.d
                   └─bug2111817.conf
                   /etc/systemd/system/rpm-ostreed.service.d
                   └─mco-controlplane-nice.conf
           Active: failed (Result: exit-code) since Mon 2024-04-29 14:42:13 UTC; 5ms ago
             Docs: man:rpm-ostree(1)
          Process: 1595530 ExecStart=rpm-ostree start-daemon (code=exited, status=217/USER)
         Main PID: 1595530 (code=exited, status=217/USER)
               CPU: 0

      Apr 29 14:42:13 pstacn1-sut systemd[1]: Starting rpm-ostree System Management Daemon...
      Apr 29 14:42:13 pstacn1-sut systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=217/USER
      Apr 29 14:42:13 pstacn1-sut systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
      Apr 29 14:42:13 pstacn1-sut systemd[1]: Failed to start rpm-ostree System Management Daemon.
      error: Loading sysroot: exit status: 1
      
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-203f42c9596377c3f3ae47f1927c75a3   False     True       True       1              0                   0                     1                      32d
      worker   rendered-worker-70802875d6e4d855ddae1368c075d6bb   True      False      False      0              0                   0                     0                      32d
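
The `oc get mcp` output above shows the master pool as the only degraded pool, with its single machine counted as degraded. A small sketch for extracting that from the table is shown here; the helper name `degraded_pools` is hypothetical, and it assumes the default column order shown above (DEGRADED in column 5, DEGRADEDMACHINECOUNT in column 9):

```shell
# Print each degraded MachineConfigPool with its degraded machine count.
# Reads `oc get mcp` table output on stdin.
# degraded_pools is an illustrative helper, not part of oc.
degraded_pools() {
    # NR > 1 skips the header row; $5 is DEGRADED, $9 is DEGRADEDMACHINECOUNT.
    awk 'NR > 1 && $5 == "True" { print $1, $9 }'
}
```

Piping the output above through it, for example `oc get mcp | degraded_pools`, would print `master 1`.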
      
      $ oc get mc
      NAME                                                GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
      00-master                                           6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      00-worker                                           6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      01-master-container-runtime                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      01-master-cpu-partitioning                                                                     3.2.0             32d
      01-master-kubelet                                   6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      01-worker-container-runtime                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      01-worker-cpu-partitioning                                                                     3.2.0             32d
      01-worker-kubelet                                   6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      02-master-workload-partitioning                                                                3.2.0             32d
      15-master-hosts.yaml                                                                           3.2.0             32d
      30-master-dnsmasq.yaml                                                                         3.2.0             32d
      50-nto-master                                                                                                    26d
      50-performance-n1-master                                                                       3.2.0             26d
      97-master-generated-kubelet                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             11h
      97-worker-generated-kubelet                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             11h
      98-master-generated-kubelet                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      98-worker-generated-kubelet                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      99-master-etc-udev-rulesd-renic                                                                3.2.0             20d
      99-master-generated-kubelet                         6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             26d
      99-master-generated-registries                      6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      99-master-oneshot-script-service                                                               3.2.0             26d
      99-master-ssh                                                                                  3.2.0             32d
      99-worker-generated-registries                      6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             32d
      99-worker-ssh                                                                                  3.2.0             32d
      container-mount-namespace-and-kubelet-conf-master                                              3.2.0             32d
      rendered-master-203f42c9596377c3f3ae47f1927c75a3    8437f354d88926efbf447472c640f27cc3764741   3.4.0             11h
      rendered-master-286647e15c358a693c22faee520476c5    8437f354d88926efbf447472c640f27cc3764741   3.4.0             20d
      rendered-master-56d89121965318828618a96b80dc948e    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-master-5f7b95676f8e6789152ae7b533a12f40    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-master-68f4353024a25c7106de5bace3e7791f    6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             11h
      rendered-master-826411b944246a00333200aaa4a04924    8437f354d88926efbf447472c640f27cc3764741   3.4.0             6d21h
      rendered-master-8d03c8220f3a19b505d4119606d31945    8437f354d88926efbf447472c640f27cc3764741   3.4.0             6d22h
      rendered-master-92677ee03ce3852593a6567a4755f412    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-master-a3dec86f75a0bbdbdbbfc3b7169af7b0    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-master-af809de8e1c649425a147296bd9fcee1    8437f354d88926efbf447472c640f27cc3764741   3.4.0             6d17h
      rendered-master-b0b73e8b86c7434a1e29d063e1ddca5c    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-master-c7018a68ddd834546f657a54af7526f8    8437f354d88926efbf447472c640f27cc3764741   3.4.0             6d19h
      rendered-master-d0e79cf4e402385678b23f12397fc47f    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-master-d3ec6bec2b6493bd844f8c36c55bcca6    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-master-d914d1c82db6323a363e784cbe8f9bbe    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-master-ec50afe000ad68a263ebc6a5624facf5    8437f354d88926efbf447472c640f27cc3764741   3.4.0             6d21h
      rendered-master-f75e5fe130ea5715ff2583ef2903f144    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-worker-0d265722591f6031ea09a19f9122222f    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-worker-3f927333966bd29f4033cd08c984cb04    8437f354d88926efbf447472c640f27cc3764741   3.4.0             26d
      rendered-worker-4dcba50d0390a39a45178ac4216e191c    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-worker-6f9f51bcc0e7d8e856c2479992d36992    8437f354d88926efbf447472c640f27cc3764741   3.4.0             32d
      rendered-worker-70802875d6e4d855ddae1368c075d6bb    6e28938baecfe677ff6b69f46e2c889c1a7a0bb5   3.4.0             11h

            rhn-support-jmarrero Joseph Marrero Corchado
            sejug2 Sebastian Jug
            Michael Nguyen Michael Nguyen