Uploaded image for project: 'OpenShift Installer'
  1. OpenShift Installer
  2. CORS-1906

Gather bootstrap should capture serial console logs

XMLWordPrintable

    • Gather bootstrap should capture serial console logs
    • False
    • False
    • Done
    • Impediment
    • 0% To Do, 0% In Progress, 100% Done

      OCP/Telco Definition of Done
      Epic Template descriptions and documentation.

      <--- Cut-n-Paste the entire contents of this description into your new Epic --->

      Epic Goal

      • Capture serial console logs from bootstrap and master hosts whenever a cluster fails to bootstrap

      Why is this important?

      • This is a gap in our bootstrap log bundle which would help illustrate why masters fail to join the cluster or the bootstrap host fails to boot and start requisite services, if for instance S3 permissions prevent retrieval of bootstrap.ign.
      • Any future improvement to RHCOS pre-boot diagnostics would likely come in the form of serial console output, though no specific improvements have been identified as of yet.

      Scenarios

      1. Cluster fails to bootstrap
        1. Assuming platform support, automatically attempt to capture serial console logs from relevant machines, placed into standard bootstrap log bundle. If Installer credentials for some reason are not sufficient for console retrieval fail gracefully and emit an error message.
      1. Cluster bootstraps but bootstrap host was preserved and someone runs `openshift-install gather bootstrap`
        1. Assuming platform support, capture serial console logs from relevant machines, placed into standard bootstrap log bundle. If Installer credentials for some reason are not sufficient for console retrieval fail gracefully and emit an error message.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - This is a rather minor incremental change to the gathered bootstrap log bundle, I'm not sure additional TE is warranted.
      • Documented permissions requirements are amended if gathering this debug information requires permissions not previously documented
      • Once fully tested please backport to 4.N-2

      Dependencies (internal and external)

      1. Platform SDKs must support console retrieval, aws-sdk-go does, see docs reference below

      Previous Work (Optional):

      1. I believe some of the CI jobs have console gathering methods, once we implement this across supported platforms we should probably reduce that duplication.

      Open questions::

      1. Given the primary goal here is for managed services, we need for them to ensure permissions both in STS and non-STS installation modes as well as SSH access to the bootstrap host from Hive which is currently known not to work.

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            jhixson_redhat John Hixson
            rhn-support-sdodson Scott Dodson
            Yunfei Jiang Yunfei Jiang
            Votes:
            0 Vote for this issue
            Watchers:
            15 Start watching this issue

              Created:
              Updated:
              Resolved: