  1. OPCT - OpenShift Provider Compatibility Tool
  2. OPCT-7

[bug][backend] Sonobuoy's aggregator stops working after cluster upgrades


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Plugins

      DESCRIPTION:

      The sonobuoy aggregator pod crashes during cluster upgrades (feature SPLAT-651).

      The aggregator pod receives requests from the workers to update the status. To do so it has to annotate the sonobuoy pod, but after some time during the cluster upgrade that patch is refused (error message below). It seems the token used to access the kube-api expires while the upgrade is in progress.

      Note: the certification pods (sonobuoy) are excluded from the upgrade lifecycle by a paused MachineConfigPool (MCP).
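
      One way to test the token-expiry hypothesis above is to read the `exp` claim of the token the aggregator uses. The sketch below assumes the token is a JWT (as bound service account tokens are) and that it can be read from the pod, for example from the standard mount /var/run/secrets/kubernetes.io/serviceaccount/token. The function names are hypothetical and nothing here is taken from the OPCT or sonobuoy code; the signature is not verified, only the payload is decoded.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry time (UTC) from a JWT's `exp` claim, or None if absent."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    # base64url payloads are unpadded; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    exp = claims.get("exp")
    if exp is None:
        return None  # tokens without `exp` carry no expiry in the payload
    return datetime.fromtimestamp(exp, tz=timezone.utc)


def is_expired(token, now=None):
    """True if the token carries an `exp` claim that is not after `now`."""
    expiry = token_expiry(token)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry


def make_unsigned_token(claims):
    """Demo helper only: build a JWT-shaped string with a dummy signature."""
    def b64(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([b64({"alg": "none"}), b64(claims), "sig"])
```

      Comparing the returned expiry with the timestamp of the first "couldn't annotate sonobuoy pod" log entry would show whether the two line up. If they do not, the SCC validation failures in the error message are the more likely cause.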

      Steps to reproduce:

      • Apply the fixes for the SCC that blocks the upgrade (SPLAT-874)
      • Run the OPCT
      • Start a y-stream upgrade (z-stream updates do not break the sonobuoy token)
      • Check the sonobuoy aggregator logs; the error below should appear:
      time="2022-11-16T19:51:32Z" level=info 
      msg="couldn't annotate sonobuoy pod" 
      error="couldn't patch pod annotation: pods \"sonobuoy\" is forbidden: 
      unable to validate against any security context constraint: [
      provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group,
      spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000650000, 1000659999] 
      provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, 
      provider machine-api-termination-handler: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, 
      spec.volumes[0]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, 
      spec.volumes[1]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, 
      spec.volumes[2]: Invalid value: \"emptyDir\": emptyDir volumes are not allowed to be used, 
      spec.volumes[3]: Invalid value: \"projected\": projected volumes are not allowed to be used, 
      provider hostnetwork-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, 
      provider hostnetwork: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, 
      provider hostaccess: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group
      ]"  
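
      The error above lists every SCC provider that rejected the patched pod, which is hard to read on one line. A minimal sketch for grouping the rejection reasons by provider follows. It assumes the message keeps the shape shown in the log (a bracketed list in which each entry starts with "provider <name>:"); the function name and the abbreviated sample are illustrative only.

```python
import re

# Abbreviated copy of the logged error, reduced to two providers.
SAMPLE = (
    'couldn\'t patch pod annotation: pods "sonobuoy" is forbidden: '
    "unable to validate against any security context constraint: ["
    "provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: "
    "[]int64{2000}: 2000 is not an allowed group, "
    "spec.containers[0].securityContext.runAsUser: Invalid value: 1000: "
    "must be in the ranges: [1000650000, 1000659999] "
    "provider hostaccess: .spec.securityContext.fsGroup: Invalid value: "
    "[]int64{2000}: 2000 is not an allowed group]"
)


def group_scc_rejections(message):
    """Map each SCC provider name to the list of reasons it gave."""
    # Keep only the bracketed list after "security context constraint:".
    start = message.index("[", message.index("security context constraint"))
    body = message[start + 1 : message.rindex("]")]
    grouped = {}
    # Every provider entry begins with the word "provider".
    for chunk in re.split(r"\bprovider\s+", body):
        chunk = " ".join(chunk.split()).rstrip(", ")
        if not chunk:
            continue
        name, _, reasons = chunk.partition(": ")
        # Reasons are comma-separated and each starts with a spec field path;
        # the lookahead avoids splitting inside "[1000650000, 1000659999]".
        grouped[name] = re.split(r",\s+(?=\.?spec\.)", reasons)
    return grouped


if __name__ == "__main__":
    for provider, reasons in group_scc_rejections(SAMPLE).items():
        print(provider)
        for reason in reasons:
            print("  -", reason)
```

      Grouped this way, the log shows that every listed provider rejects fsGroup 2000, which may help narrow down which SCC the pod was admitted under before the upgrade.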

      Then the CLI got stuck and did not update the plugin states:

      [...]
      Wed, 16 Nov 2022 16:51:24 -03> Global Status: running
      JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE                                           
      05-openshift-cluster-upgrade       | running    |            | 0/0 (0 failures)          | status=Working towards 4.11.4: 106 of 803 done (13% complete)
      10-openshift-kube-conformance      | running    |            | 0/345 (0 failures)        | status=waiting-for=05-openshift-cluster-upgrade=(0/0/0)=[66/100]
      20-openshift-conformance-validated | running    |            | 0/3251 (0 failures)       | status=blocked-by=10-openshift-kube-conformance=(0/-345/0)=[0/100]
      99-openshift-artifacts-collector   | running    |            | 0/0 (0 failures)          | status=blocked-by=20-openshift-conformance-validated=(0/-3251/0)=[0/100] 
      [...]
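
      To confirm that the plugin states have stopped updating, two consecutive snapshots of the status table above can be compared. The sketch below assumes the pipe-separated layout shown in the CLI output; the function names are hypothetical and are not part of the OPCT CLI.

```python
# Two rows copied from the status output in this issue.
SNAPSHOT = """\
Wed, 16 Nov 2022 16:51:24 -03> Global Status: running
JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE
05-openshift-cluster-upgrade       | running    |            | 0/0 (0 failures)          | status=Working towards 4.11.4: 106 of 803 done (13% complete)
10-openshift-kube-conformance      | running    |            | 0/345 (0 failures)        | status=waiting-for=05-openshift-cluster-upgrade=(0/0/0)=[66/100]
"""


def parse_status_table(text):
    """Turn the pipe-separated status table into a list of row dicts."""
    header, rows = None, []
    for line in text.splitlines():
        if "|" not in line:
            continue  # timestamp line, "[...]" markers, blank lines
        if header is None:
            header = [cell.strip().lower() for cell in line.split("|")]
            continue
        # Limit the split so a "|" inside MESSAGE would stay in the last cell.
        cells = [cell.strip() for cell in line.split("|", len(header) - 1)]
        rows.append(dict(zip(header, cells)))
    return rows


def unchanged_jobs(previous, current):
    """Names of jobs whose whole row is identical in both snapshots."""
    before = {row["job_name"]: row for row in previous}
    return [
        row["job_name"]
        for row in current
        if before.get(row["job_name"]) == row
    ]
```

      If every job is returned by `unchanged_jobs` across several polling intervals while the cluster upgrade itself is still progressing, the aggregator is most likely no longer recording the status updates.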

       

      Required:

      • A PR is created that fixes the errors in OPCT

      Nice to have:

      ...

      ACCEPTANCE CRITERIA:

      • The results of a run using the upgrade feature should be accepted
      • Any PR should be merged
      • Any external issues should be addressed

      ENGINEERING DETAILS:

       

            rhn-support-mrbraga Marco Braga