OpenShift Container Platform (OCP) Strategy / OCPSTRAT-836

Improve OpenShift upgrade progress feedback for cluster administrators

      Feature Overview (aka. Goal Summary)  

      As a customer of self-managed OpenShift, or an SRE managing a fleet of OpenShift clusters, I should be able to determine the progress and state of an OCP upgrade and be alerted only if the cluster is unable to progress.

      Goals (aka. expected user outcomes)

      Cluster administrators should be able to monitor the progress of each component during an upgrade through metrics.

      Alerts should be triggered only if a component cannot complete its upgrade within a reasonable amount of time.

      Requirements (aka. Acceptance Criteria):

      • Determine a set amount of time that must pass before an upgrade triggers a failure
      • Operators should not alert during upgrades unless a cluster administrator needs to take action
      • Cluster administrators should be able to determine when each component has completed its upgrade
      • All components should emit metrics about their upgrade state
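      The first two requirements amount to a time-based rule: a component that is still updating is not an alert condition until it has been Progressing for longer than an agreed limit. This feature does not fix that limit, so the 3600-second value in the example is purely illustrative. The following is a minimal sketch, assuming a POSIX shell with awk and a hypothetical input of one tab-separated line per component (name, Progressing status, and seconds since that condition last changed, which could be derived from each ClusterOperator's Progressing lastTransitionTime). The function name `stalled_components` is hypothetical.

```shell
#!/bin/sh
# Hypothetical policy sketch: print components whose Progressing condition has
# been True for longer than the given limit (in seconds, passed as $1).
# Input (stdin): NAME<TAB>PROGRESSING<TAB>SECONDS_SINCE_TRANSITION
stalled_components() {
  awk -v limit="$1" '
    BEGIN { FS = "\t" }
    # Components still within the limit count as normal update progress;
    # only those past it are printed as alert candidates.
    $2 == "True" && ($3 + 0) > (limit + 0) { print $1 }
  '
}

# Example (3600 is an illustrative threshold, not a value from this feature):
#   printf 'etcd\tTrue\t300\nmachine-config\tTrue\t5400\n' | stalled_components 3600
```

      Keeping the limit as a parameter leaves room for a per-component value, since some components are expected to take longer than others to finish updating.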

      Notes: https://docs.google.com/document/d/1W90q9lqUinQgUbAOSCXhLHnGHgEARjzl6BBZxxR1Qzo/edit 

      Use Cases (Optional):

      Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

      Questions to Answer (Optional):

      Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

      Out of Scope

      High-level list of items that are out of scope.  Initial completion during Refinement status.

       

      Background

      In OpenShift, you can determine whether an update has failed by inspecting the status of the cluster version and of the update process. A quick first check is:

      oc get clusterversion | grep -i failing
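      The grep above only matches text in the summary output. The following is a minimal sketch of a more targeted check, assuming a POSIX shell with awk. It reads the ClusterVersion conditions as `Type=Status` lines, which the `oc get clusterversion version -o jsonpath=...` query in the usage comment can produce (the cluster-scoped ClusterVersion object is named `version`), and returns success only when the Failing condition is True. The function name `update_is_failing` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a cluster update is failing.
# Input (stdin): one ClusterVersion condition per line, formatted as Type=Status,
# for example "Failing=True".
update_is_failing() {
  awk -F= '
    # Echo the three conditions discussed in this section.
    $1 == "Available" || $1 == "Progressing" || $1 == "Failing" { print }
    # Remember whether the Failing condition is currently True.
    $1 == "Failing" && $2 == "True" { failing = 1 }
    # Exit 0 ("yes, failing") only if Failing=True was seen; exit 1 otherwise.
    END { exit (failing ? 0 : 1) }
  '
}

# Usage against a live cluster (requires oc and a logged-in session):
#   oc get clusterversion version \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | update_is_failing && echo "update is failing"
```

      Returning the answer as an exit status lets the check be used directly in `if` statements or monitoring scripts.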
      1. Check Cluster Version Status
        oc get clusterversion

        Run this command to see the current version of your cluster and the status of any ongoing update. It prints a summary of the cluster version and update status.

      2. Detailed Status of the Update
        oc describe clusterversion

        This gives a detailed output, including the conditions that indicate the health and status of the update process. Look for conditions such as Progressing, Available, and Failing. A Failing condition with status True indicates that the update has run into a problem.

      3. Check Cluster Operator Status
        oc get clusteroperator

        This shows the status of each operator in the cluster. Operators that report Degraded, or that remain Progressing for an extended period, might indicate issues that could affect the update.
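      The cluster operator check above is easier to act on when the output is narrowed to operators that need attention. The following is a minimal sketch, assuming a POSIX shell with awk and input produced by the jsonpath query in the usage comment (one tab-separated line per operator: name, Available, Progressing, Degraded). It flags operators that are not Available or are Degraded. Progressing on its own is expected while an update is rolling out, so the sketch leaves that case to a time-based check. The function name `operators_needing_attention` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list ClusterOperators that need attention.
# Input (stdin): NAME<TAB>AVAILABLE<TAB>PROGRESSING<TAB>DEGRADED, one operator per line.
operators_needing_attention() {
  awk '
    # Split on tabs so that an empty (missing) condition keeps its column.
    BEGIN { FS = "\t" }
    # Report operators that are not Available=True or are Degraded=True.
    $2 != "True" || $4 == "True" {
      print $1 ": Available=" $2 " Progressing=" $3 " Degraded=" $4
    }
  '
}

# Usage against a live cluster (requires oc and a logged-in session):
#   oc get clusteroperator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\t"}{.status.conditions[?(@.type=="Progressing")].status}{"\t"}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' \
#     | operators_needing_attention
```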

       

      Customer Considerations

      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

       

      Documentation Considerations

      Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

       

      Interoperability Considerations

      Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

            Subin MM
            James Harrington
            Lalatendu Mohanty, Scott Dodson
            Xiaoli Tian
            Stephanie Stout
            Scott Dodson
            Subin MM
            Joseph Caiani
            Eric Rich