  1. OpenShift Bugs
  2. OCPBUGS-19929

Agent-based hosted cluster in disconnected env is degraded as deployments have unavailable replicas


    • Critical
    • No
    • Hypershift Sprint 243
    • 1
    • Rejected
    • False

      Description of problem:

      Installed an agent-based hosted cluster in a disconnected environment. The HostedCluster's conditions show that it is Available and Progressing, but it is also Degraded:
      
          Last Transition Time:  2023-09-28T20:11:35Z
          Message:               [certified-operators-catalog deployment has 1 unavailable replicas, community-operators-catalog deployment has 1 unavailable replicas, redhat-marketplace-catalog deployment has 1 unavailable replicas, redhat-operators-catalog deployment has 1 unavailable replicas]
          Observed Generation:   2
          Reason:                UnavailableReplicas
          Status:                True
          Type:                  Degraded
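
To list the affected deployments from a Degraded message like the one above, a minimal shell sketch follows. The helper name `degraded_deployments` is hypothetical, and the message is pasted in as a literal rather than read from the cluster.

```shell
# Degraded message copied from the HostedCluster condition in this report.
msg='[certified-operators-catalog deployment has 1 unavailable replicas, community-operators-catalog deployment has 1 unavailable replicas, redhat-marketplace-catalog deployment has 1 unavailable replicas, redhat-operators-catalog deployment has 1 unavailable replicas]'

# Hypothetical helper: print one deployment name per line for every
# "<name> deployment has N unavailable" fragment found in the message.
degraded_deployments() {
  printf '%s\n' "$1" \
    | grep -oE '[a-z0-9-]+ deployment has [0-9]+ unavailable' \
    | awk '{print $1}'
}

degraded_deployments "$msg"
```

On a live hub cluster the message could probably be fetched with `oc get hostedcluster hosted-0 -n clusters -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'`. The HostedCluster name `hosted-0` and namespace `clusters` are assumptions inferred from the `clusters-hosted-0` control plane namespace, not values stated in this report.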
      
      
      
      The pods of these deployments are in ImagePullBackOff, for example:
      
      $ oc describe po -n clusters-hosted-0 certified-operators-catalog-df7997697-5sv67
      Name:                 certified-operators-catalog-df7997697-5sv67
      Namespace:            clusters-hosted-0
      Priority:             100000000
      Priority Class Name:  hypershift-control-plane
      Service Account:      default
      Node:                 master-0-2/192.168.123.70
      Start Time:           Thu, 28 Sep 2023 16:13:55 -0400
      Labels:               app=certified-operators-catalog
                            hypershift.openshift.io/control-plane-component=certified-operators-catalog
                            hypershift.openshift.io/hosted-control-plane=clusters-hosted-0
                            olm.catalogSource=certified-operators
                            pod-template-hash=df7997697
      Annotations:          alpha.image.policy.openshift.io/resolve-names: *
                            hypershift.openshift.io/release-image:
                              registry.ocp-edge-cluster-0.qe.lab.redhat.com:5000/ocp/release@sha256:9cdd3d0a1bbe04aecbe19e9f0416114835d317a3e96926884fc49ce899e46306
                            k8s.ovn.org/pod-networks:
                              {"default":{"ip_addresses":["10.130.0.168/23"],"mac_address":"0a:58:0a:82:00:a8","gateway_ips":["10.130.0.1"],"routes":[{"dest":"10.128.0....
                            k8s.v1.cni.cncf.io/network-status:
                              [{
                                  "name": "ovn-kubernetes",
                                  "interface": "eth0",
                                  "ips": [
                                      "10.130.0.168"
                                  ],
                                  "mac": "0a:58:0a:82:00:a8",
                                  "default": true,
                                  "dns": {}
                              }]
                            openshift.io/scc: restricted-v2
                            seccomp.security.alpha.kubernetes.io/pod: runtime/default
      Status:               Pending
      SeccompProfile:       RuntimeDefault
      IP:                   10.130.0.168
      IPs:
        IP:           10.130.0.168
      Controlled By:  ReplicaSet/certified-operators-catalog-df7997697
      Containers:
        registry:
          Container ID:   
          Image:          from:imagestream
          Image ID:       
          Port:           50051/TCP
          Host Port:      0/TCP
          State:          Waiting
            Reason:       ImagePullBackOff
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        10m
            memory:     160Mi
          Liveness:     exec [grpc_health_probe -addr=:50051] delay=10s timeout=1s period=10s #success=1 #failure=3
          Readiness:    exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3
          Startup:      exec [grpc_health_probe -addr=:50051] delay=0s timeout=1s period=10s #success=1 #failure=15
          Environment:  <none>
          Mounts:       <none>
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:            <none>
      QoS Class:          Burstable
      Node-Selectors:     <none>
      Tolerations:        hypershift.openshift.io/cluster=clusters-hosted-0:NoSchedule
                          hypershift.openshift.io/control-plane=true:NoSchedule
                          node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                          node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                          node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason   Age                     From     Message
        ----     ------   ----                    ----     -------
        Warning  Failed   55m (x47 over 11h)      kubelet  Failed to pull image "from:imagestream": rpc error: code = DeadlineExceeded desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 52.1.184.176:443: i/o timeout
        Warning  Failed   34m (x34 over 11h)      kubelet  Failed to pull image "from:imagestream": rpc error: code = DeadlineExceeded desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 18.215.138.58:443: i/o timeout
        Warning  Failed   19m (x12 over 3h25m)    kubelet  Failed to pull image "from:imagestream": rpc error: code = DeadlineExceeded desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 34.194.164.123:443: i/o timeout
        Normal   BackOff  5m27s (x2234 over 11h)  kubelet  Back-off pulling image "from:imagestream"
        Normal   Pulling  32s (x105 over 12h)     kubelet  Pulling image "from:imagestream"
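
The events above show the kubelet trying to pull the literal reference `from:imagestream` and timing out against `registry-1.docker.io`, which is unreachable in a disconnected environment. A minimal sketch for summarising which image and registry the failed pulls targeted is shown below. The helper name `pull_targets` is hypothetical, and the two sample lines are copied from the events in this report.

```shell
# Two of the kubelet event messages from this report.
events='Failed to pull image "from:imagestream": rpc error: code = DeadlineExceeded desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 52.1.184.176:443: i/o timeout
Failed to pull image "from:imagestream": rpc error: code = DeadlineExceeded desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 18.215.138.58:443: i/o timeout'

# Hypothetical helper: read event text on stdin and print the unique
# "<image> <registry>" pairs found in "Failed to pull image" lines.
pull_targets() {
  sed -nE 's/.*Failed to pull image "([^"]+)".*pinging container registry ([^:]+):.*/\1 \2/p' \
    | sort -u
}

printf '%s\n' "$events" | pull_targets
```

The same helper should also accept the full output of the command used in this report, for example `oc describe po -n clusters-hosted-0 certified-operators-catalog-df7997697-5sv67 | pull_targets`.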
       

      Version-Release number of selected component (if applicable):

      4.14.0-rc.2

      How reproducible:

      100%

      Steps to Reproduce:

      1. Install MCE 2.4 and the HyperShift operator on a 4.14 IPv4 disconnected hub cluster
      2. Install a 4.14.0-rc.2 agent-based hosted cluster
      3. Check the HostedCluster conditions and the catalog pods in the hosted control plane namespace
      

      Actual results:

      The HostedCluster is degraded because the certified-operators-catalog, community-operators-catalog, redhat-marketplace-catalog, and redhat-operators-catalog pods are in ImagePullBackOff.

      Expected results:

      The pods are ready and the hosted cluster is not degraded.

      Additional info:

       

            jparrill@redhat.com Juan Manuel Parrilla Madrid
            epassaro@redhat.com Elsa Passaro
            Liangquan Li Liangquan Li
            Lubov Shilin, Shelly Miron