
Type: Outcome
Resolution: Won't Do
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: API & Datastore, Edge, etcd, Install, SNO
Labels:
- SNO
- disaster-recovery

Size: L
RICE Score: 0

Feature Overview:

The ability to reconfigure core OpenShift constructs after a cluster has been deployed. This should address the following scenarios:

- Pre-installation of OCP as an appliance during a pre-staging process, with the ability to deploy those clusters to a different location and reconfigure them there
- Pre-installation of OCP clusters on a cloud provider, keeping them on standby until a request for a new cluster is received. At that point, an existing standby cluster is reconfigured to serve the request without a new installation
- Disaster recovery scenarios where a cluster is restored from backups at a DR location different from the primary location, so the cluster needs to be reconfigured to operate on the new network

This capability should apply to:

- SNO (background: https://issues.redhat.com/browse/TELCOSTRAT-38)
  - With workers
- Multi-node OCP (future) with an HA control plane

Goals:

- Architectural design for consistent OCP reconfiguration
- All OCP core cluster operators should support reconfiguration for the use cases described in this card

Requirements:

SNO relocation requires the ability to modify several configurations on the node and in the cluster:

- Support changing the hostname
- Support setting the DNS server
- Support changing the cluster name
- Support changing the cluster domain
- Support changing the cluster ID
- Support OCP relocation to a different network (change the host IP)
- Support changing OCP DNS
- Certificate rotation (should align with https://issues.redhat.com/browse/OCPSTRAT-714)
- Must maintain an auditable history of reconfigurations
- ~~Extend the initial kubelet and node cert validity to 30 days (maybe longer)~~
- Factory SNO:
  - Minimize deployment time: deployment time at the far-edge site should be on the order of minutes, ideally less than 20 minutes.
  - Validation before shipment: the solution should allow partners and customers to validate each installed product before shipping it to the far edge, where errors are costly.
  - Simplify SNO deployment at the far edge: non-technical operators should be able to reconfigure SNO at deployment time.
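To make the requirements above more concrete, here is a minimal Python sketch of how a relocation request could be validated and then recorded in an auditable history. Everything in it is hypothetical. `SiteConfig`, `ReconfigAudit`, the field names, and the validation rules (lowercase RFC 1123 names, a UUID cluster ID, literal IP addresses for the host and DNS servers) are illustrative assumptions. They do not correspond to any existing OpenShift API or tool. Certificate rotation and the actual on-node reconfiguration are out of scope.

```python
import ipaddress
import re
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Assumption: names follow lowercase RFC 1123 label rules
# (1-63 chars of a-z, 0-9 and '-', no leading or trailing hyphen).
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def _is_dns_name(name: str) -> bool:
    """Return True if every dot-separated label of `name` is a valid label."""
    return 0 < len(name) <= 253 and all(_LABEL.match(p) for p in name.split("."))


def _is_ip(value: str) -> bool:
    """Return True if `value` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical set of site-specific attributes changed at relocation."""
    hostname: str
    cluster_name: str
    base_domain: str
    cluster_id: str
    host_ip: str
    dns_servers: tuple[str, ...]

    def validate(self) -> list[str]:
        """Return a list of human-readable errors (empty if valid)."""
        errors = []
        if not _is_dns_name(self.hostname):
            errors.append(f"invalid hostname: {self.hostname!r}")
        if not _LABEL.match(self.cluster_name):
            errors.append(f"invalid cluster name: {self.cluster_name!r}")
        if not _is_dns_name(self.base_domain):
            errors.append(f"invalid base domain: {self.base_domain!r}")
        try:
            uuid.UUID(self.cluster_id)
        except ValueError:
            errors.append(f"invalid cluster ID: {self.cluster_id!r}")
        if not _is_ip(self.host_ip):
            errors.append(f"invalid host IP: {self.host_ip!r}")
        if not self.dns_servers:
            errors.append("at least one DNS server is required")
        errors += [f"invalid DNS server: {s!r}" for s in self.dns_servers if not _is_ip(s)]
        return errors


@dataclass
class ReconfigAudit:
    """Hypothetical append-only history of applied reconfigurations."""
    entries: list[dict] = field(default_factory=list)

    def apply(self, current: SiteConfig, new: SiteConfig, actor: str) -> SiteConfig:
        """Validate `new`, record which fields changed, and return it."""
        errors = new.validate()
        if errors:
            # Rejected requests are not recorded as applied changes.
            raise ValueError("; ".join(errors))
        old, upd = asdict(current), asdict(new)
        changed = {k: (old[k], upd[k]) for k in old if old[k] != upd[k]}
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "changed": changed,
        })
        return new
```

Each audit entry records only the fields that changed, plus who applied the change and when. That keeps entries small while still showing which attributes moved between the factory configuration and the site configuration.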

Use Cases:

- Recovery of OCP clusters in a disaster recovery scenario where the cluster is restored from backups to a DR location where it cannot operate using the same identity as at the main location
- Ability to pre-provision clusters on cloud or virtualization environments and keep them as "standby clusters" until they are needed immediately, at which point they are reconfigured as a day-2 operation, completely eliminating the need to install the platform and other workloads
- Ability to create OCP appliances that are reconfigured as a day-2 operation when they arrive at their destination
- Ability to relocate clusters across domains or naming schemes
- Ability for telecommunications providers and large-scale industrial deployments to pre-install OCP at the factory or at staging facilities, including their full specialized software stack on top of OCP, then ship those pre-installed clusters (SNO, compact, multi-node) to their final locations and bring them up with site-specific information through a day-2 reconfiguration

OpenShift (SNO) reconfiguration
This capability is critical for fast deployment at the edge and for validating a complete solution before shipping to the edge.

Upon deployment at the edge site, the SNO should allow reconfiguring specific cluster attributes for OCP to function correctly at the edge site.

The provisioning and reconfiguration flow on TME:

Telecommunications providers have existing Service Depots where they currently prepare SW/HW before shipping servers to Far Edge sites. Pre-installing SNO onto servers in these facilities enables them to validate and update the servers in these pre-installed pools as needed.

Telecommunications Service Provider Technicians will be rolling out single-node OpenShift with a vDU configuration to new Far Edge sites. They will work from a service depot where they pre-install a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician takes one of these generic OCP servers to a Far Edge site, enters the site-specific information, waits for confirmation that the vDU is in service and online, and then moves on to deploy another server at a different Far Edge site.

Multinode reconfiguration (disaster recovery)

An organization that is required by regulation or policy to maintain a DR process needs the ability to restore the OCP cluster at different locations (DR sites) that do not share the primary location's network attributes (e.g. domain, IP scheme). These organizations need a way to reconfigure the cluster to run at their DR sites.

Questions to Answer (Optional):

Q: How can these changes be enabled at the far edge? See doc.

Q: For each type of change (e.g. changing the cluster name or changing the IP of a control plane node), what is the blast radius for OCP core components? Which components are affected, and how?

Additional Considerations

SNO Considerations:

- Limited CPU and RAM resources: customers expect the solution to have the same footprint as single-node OpenShift, so the relocation capability shouldn't require any additional resources.
- IPSec support at cluster boot: some far-edge deployments occur on an insecure network, so access to the host's BMC is not allowed. Additionally, an IPSec tunnel must be established before any traffic leaves the cluster once it is at the far-edge site. It is not possible to enable IPSec on the BMC NIC, so even after OpenShift has booted, the BMC is still not accessible.
- Static network support: other far-edge deployments occur in environments without DHCP. In these deployments, the networking and site-specific configuration can be provided via the host's BMC.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Issue Links

is related to:
- OCPSTRAT-714 Comprehensive overhaul of handling OCP internal cert & keys (In Progress)

relates to:
- RFE-1791 Post-Install OCP Network Config Reconfiguration (Under Review)
- OCPSTRAT-712 SNO Upgrades, Backup, Restores and Rollbacks (New)
