OpenShift Bugs / OCPBUGS-4962

openshift-install agent wait-for install-complete errors out before the cluster installation completes


    • Moderate
    • Agent Sprint 229, Agent Sprint 230
    • 2
    • Rejected
    • False

      1/9: verified in 4.12, it's in 4.12 GA - adding the GA label back to retain the history
      1/3: Moving this to 4.12 POSTGA for Telco - impact is limited to automation. Needs a release note for 4.12 GA.
      12/20: dependent on the 4.13 version OCPBUGS-3706, which is back to ASSIGNED
      12/15: Green per latest comment, PR is ready to merge, waiting on CI
      12/7: Red, as the previous fix was deemed insufficient and bug status has moved to NEW.
      12/5: Green, as the fix is posted and is waiting on a successful CI run.
      11/30: lowered Telco rank/bucket to 3, keeping it on the Telco-Grade OCP 4.12 list due to the potential impact to automation
      11/28: added to the 4.12 gating list
      Rel Note for Telco: Not Required, it's in 4.12 GA

      This is a clone of issue OCPBUGS-3706. The following is the description of the original issue:

      Description of problem:

      While running ./openshift-install agent wait-for install-complete --dir billi --log-level debug on a real bare-metal, dual-stack compact cluster installation, the command errors out with a "connection refused" error while gathering ClusterOperator status, even though the installation is still progressing:
      
      DEBUG Uploaded logs for host openshift-master-1 cluster d8b0979d-3d69-4e65-874a-d1f7da79e19e 
      DEBUG Host: openshift-master-1, reached installation stage Rebooting 
      DEBUG Host: openshift-master-1, reached installation stage Configuring 
      DEBUG Host: openshift-master-2, reached installation stage Configuring 
      DEBUG Host: openshift-master-2, reached installation stage Joined 
      DEBUG Host: openshift-master-1, reached installation stage Joined 
      DEBUG Host: openshift-master-0, reached installation stage Waiting for bootkube 
      DEBUG Host openshift-master-1: updated status from installing-in-progress to installed (Done) 
      DEBUG Host: openshift-master-1, reached installation stage Done 
      DEBUG Host openshift-master-2: updated status from installing-in-progress to installed (Done) 
      DEBUG Host: openshift-master-2, reached installation stage Done 
      DEBUG Host: openshift-master-0, reached installation stage Waiting for controller: waiting for controller pod ready event 
      ERROR Attempted to gather ClusterOperator status after wait failure: Listing ClusterOperator objects: Get "https://api.kni-qe-0.lab.eng.rdu2.redhat.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp [2620:52:0:11c::10]:6443: connect: connection refused 
      ERROR Cluster initialization failed because one or more operators are not functioning properly. 
      ERROR 				The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
      ERROR 				https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 
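To see at a glance which hosts have not finished yet, the debug output can be reduced to the last reported stage per host. The following is only a sketch: the function name last_stage_per_host is hypothetical, and it assumes the log lines keep the "Host: NAME, reached installation stage STAGE" shape shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install): print the last
# installation stage reported for each host in a wait-for debug log.
# Reads the files given as arguments, or stdin when none are given.
last_stage_per_host() {
  awk '
    /Host: .*, reached installation stage / {
      line = $0
      sub(/^.*Host: /, "", line)          # drop everything up to the host name
      host = line
      sub(/,.*/, "", host)                # host name ends at the first comma
      stage = line
      sub(/^[^,]*, reached installation stage /, "", stage)
      sub(/[ \t]+$/, "", stage)           # trim trailing whitespace
      last[host] = stage                  # later lines overwrite earlier ones
    }
    END { for (h in last) print h ": " last[h] }
  ' "$@" | sort
}
```

Fed the log excerpt above, this should report openshift-master-1 and openshift-master-2 at Done, while openshift-master-0 is still at "Waiting for controller: waiting for controller pod ready event", which matches the observation that the installation had not finished when the error was raised.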

      Version-Release number of selected component (if applicable):

      4.12.0-rc.0

      How reproducible:

      100%

      Steps to Reproduce:

      1. ./openshift-install agent create image --dir billi --log-level debug 
      2. Mount the resulting ISO image and reboot the nodes via iLO
      3. ./openshift-install agent wait-for install-complete --dir billi --log-level debug 

      Actual results:

       ERROR Attempted to gather ClusterOperator status after wait failure: Listing ClusterOperator objects: Get "https://api.kni-qe-0.lab.eng.rdu2.redhat.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp [2620:52:0:11c::10]:6443: connect: connection refused 
      
      The cluster installation is not complete at this point; it needs more time to finish.

      Expected results:

      The command waits until the cluster installation completes.

      Additional info:

      The cluster installation eventually completes successfully if it is left to continue after the error.
      
      Attaching install-config.yaml and agent-config.yaml
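Since the installation reportedly finishes on its own after the error, automation that is affected by this bug could wrap the wait in a retry loop until a fixed build is available. This is a workaround sketch, not the fix: the function name retry_until_success is hypothetical, the attempt count and delay are arbitrary, and it assumes that the command exits non-zero on this error and that re-running it is safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry_until_success() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0                            # command succeeded
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example invocation (attempt count and delay are illustrative assumptions):
# retry_until_success 5 60 ./openshift-install agent wait-for install-complete --dir billi --log-level debug
```

A bounded number of attempts is used so that a genuinely failed installation still surfaces as an error instead of looping forever.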

            zabitter Zane Bitter
            openshift-crt-jira-prow OpenShift Prow Bot
            zhenying niu zhenying niu
            Red Hat Employee