OCPBUGS-5358: [PAO] Possible improper tuning of PAO leading to CPU contention


      Description of problem:

      - UNKNOWN EVENT around 13:24 on December 30th
      - System Load begins to increase a little more than normal (from <10 to 30-60)
      - OVS has a hiccup, likely due to the event that caused CPU contention, around 13:27 on December 30th
      - This increases load further (load now above 300)
      - NFS (for PVCs) flakes several times in between
      - System is rebooted around 18:48 on December 30th

      Version-Release number of selected component (if applicable):

      4.7.37

      How reproducible:

      Not very reproducible, even though many clusters have this configuration

      Steps to Reproduce:

      According to the PerformanceProfile, 8 cores should be reserved (in both the kubelet configuration and the kernel arguments):
      
      ```
         $ oc get performanceprofile perf-profile-2m-worker -o yaml             
         apiVersion: performance.openshift.io/v2
         kind: PerformanceProfile
         metadata:
           annotations:
             kubeletconfig.experimental: |
               {"systemReserved": {"memory": "16Gi", "cpu": "8", "ephemeral-storage": "1Gi"}}
         spec:
           cpu:
             isolated: 2-19,22-39,42-59,62-79
             reserved: 0,1,20,21,40,41,60,61
      ```
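
      As a sanity check, here is a minimal Python sketch (my own helper, not from the case data; it assumes the node has 80 logical CPUs, which the taskset and mask output below support) that expands those two cpuset list strings and confirms they form the expected partition of 8 reserved and 72 isolated cores:

      ```
      def parse_cpuset(spec: str) -> set[int]:
          """Expand a cpuset list string like '2-19,22-39' into a set of CPU ids."""
          cpus = set()
          for part in spec.split(","):
              if "-" in part:
                  lo, hi = part.split("-")
                  cpus.update(range(int(lo), int(hi) + 1))
              else:
                  cpus.add(int(part))
          return cpus

      isolated = parse_cpuset("2-19,22-39,42-59,62-79")
      reserved = parse_cpuset("0,1,20,21,40,41,60,61")

      assert isolated.isdisjoint(reserved)              # no overlap between the two sets
      assert isolated | reserved == set(range(80))      # together they cover all 80 logical CPUs
      print(len(reserved), "reserved /", len(isolated), "isolated")   # 8 reserved / 72 isolated
      ```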
      
      but the rolled-out MachineConfig shows all 80 cores as non-isolated:
      ```
         $ oc get mc $(oc get mcp workerperf | awk '/workerperf/ {print $2}') -o yaml | yq -e '.spec.kernelArguments'
           "tuned.non_isolcpus=0000ffff,ffffffff,ffffffff",   $ python hexmask_converter.py 0000ffffffffffffffffffff
         0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
      ```
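
      hexmask_converter.py isn't attached here, but the conversion it appears to perform is easy to reproduce. The sketch below is my approximation, not the actual script: mask_to_cpus decodes a tuned.non_isolcpus-style hex mask into a CPU list, and cpus_to_mask goes the other way, which lines up with the corrected value proposed next:

      ```
      def mask_to_cpus(mask: str) -> list[int]:
          """Decode a sysfs-style hex CPU mask (commas optional) into a sorted CPU list."""
          value = int(mask.replace(",", ""), 16)
          return [cpu for cpu in range(value.bit_length()) if value >> cpu & 1]

      def cpus_to_mask(cpus: list[int]) -> str:
          """Encode a CPU list into the equivalent hex mask (without comma grouping)."""
          value = 0
          for cpu in cpus:
              value |= 1 << cpu
          return format(value, "x")

      # The rolled-out kernel argument marks every one of the 80 cores as non-isolated.
      print(mask_to_cpus("0000ffff,ffffffff,ffffffff"))    # [0, 1, 2, ..., 79]

      # Encoding the PerformanceProfile's reserved set gives the mask I would expect instead.
      print(cpus_to_mask([0, 1, 20, 21, 40, 41, 60, 61]))  # 3000030000300003
      ```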
      
      I think it should be this instead:
      ```
         "tuned.non_isolcpus=30000300,00300003"   $ python hexmask_converter.py 3000030000300003
         0,1,20,21,40,41,60,61
      ```

      In the previous case (03211783), system processes such as 'ovs-vswitchd' were restricted to 4 specific cores (0,20,40,60) prior to adjustments; no processes from containers/pods were to use them. Later in that case, this was adjusted to 8 cores: 0,1,20,21,40,41,60,61.

      However, based on the above findings, I had Joe Teagno [RH Consultant] log in and check which cores 'ovs-vswitchd' is allowed to use, and it turns out it can use any of the 80 cores -- I would expect the affinity list to contain the cores '0,1,20,21,40,41,60,61', but instead it contains '0-79':
      ```
         [root@worker-01 ~]# hostname --fqdn
         worker-01.kub1.elsstx13.vzwops.com
         
         [root@worker-01 ~]# taskset -c -p $(ps aux | awk '/[o]vs-vswitch/ {print $2}')
         pid 2552's current affinity list: 0-79
      ```
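
      For completeness, the same check can be done programmatically. This is a hedged sketch (the pid_of helper is hypothetical and it assumes it runs with sufficient privileges directly on the node); os.sched_getaffinity reads the same affinity information that taskset reports:

      ```
      import os

      def pid_of(name: str) -> int:
          """Return the first PID whose /proc/<pid>/comm matches `name` (simplified lookup)."""
          for entry in os.listdir("/proc"):
              if entry.isdigit():
                  try:
                      with open(f"/proc/{entry}/comm") as f:
                          if f.read().strip() == name:
                              return int(entry)
                  except OSError:
                      continue
          raise RuntimeError(f"no process named {name}")

      expected = {0, 1, 20, 21, 40, 41, 60, 61}              # reserved set from the PerformanceProfile
      actual = os.sched_getaffinity(pid_of("ovs-vswitchd"))  # same data 'taskset -c -p' shows

      print("ovs-vswitchd affinity:", sorted(actual))
      print("OK" if actual == expected else "NOT restricted to the reserved cores")
      ```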
      
      So if a process from a pod/container tried to use the same core(s) that ovs-vswitchd was using and ovs-vswitchd became blocked, ovs-vswitchd could dynamically throttle itself. That in turn delays pod traffic, drives the load higher, and leads to system instability. It typically manifests in this logging output:
      ```
         $ less 0010-sosreport-worker-01-2022-12-30-tgkedrp.tar.xz/sosreport-worker-01-2022-12-30-tgkedrp/sos_commands/logs/journalctl_--no-pager
         Dec 30 13:24:27 worker-01 ovs-vswitchd[2570]: ovs|04469|ovs_rcu(urcu5)|WARN|blocked 1000 ms waiting for handler205 to quiesce
         Dec 30 13:24:28 worker-01 ovs-vswitchd[2570]: ovs|04470|ovs_rcu(urcu5)|WARN|blocked 2000 ms waiting for handler205 to quiesce
         Dec 30 13:24:30 worker-01 ovs-vswitchd[2570]: ovs|04471|ovs_rcu(urcu5)|WARN|blocked 4000 ms waiting for handler205 to quiesce
         Dec 30 13:24:34 worker-01 ovs-vswitchd[2570]: ovs|04472|ovs_rcu(urcu5)|WARN|blocked 8000 ms waiting for handler205 to quiesce
         Dec 30 13:24:42 worker-01 ovs-vswitchd[2570]: ovs|04473|ovs_rcu(urcu5)|WARN|blocked 16000 ms waiting for handler205 to quiesce
         Dec 30 13:24:58 worker-01 ovs-vswitchd[2570]: ovs|04474|ovs_rcu(urcu5)|WARN|blocked 32001 ms waiting for handler205 to quiesce
         Dec 30 13:25:30 worker-01 ovs-vswitchd[2570]: ovs|04475|ovs_rcu(urcu5)|WARN|blocked 64000 ms waiting for handler205 to quiesce
         Dec 30 13:26:34 worker-01 ovs-vswitchd[2570]: ovs|04476|ovs_rcu(urcu5)|WARN|blocked 128000 ms waiting for handler205 to quiesce
         Dec 30 13:27:29 worker-01 ovs-vswitchd[2570]: ovs|2755338|ovs_rcu|WARN|blocked 1000 ms waiting for handler205 to quiesce
         Dec 30 13:27:30 worker-01 ovs-vswitchd[2570]: ovs|2755339|ovs_rcu|WARN|blocked 2001 ms waiting for handler205 to quiesce
         Dec 30 13:27:32 worker-01 ovs-vswitchd[2570]: ovs|2755340|ovs_rcu|WARN|blocked 4001 ms waiting for handler205 to quiesce
         Dec 30 13:27:36 worker-01 ovs-vswitchd[2570]: ovs|2755341|ovs_rcu|WARN|blocked 8000 ms waiting for handler205 to quiesce
         Dec 30 13:27:44 worker-01 ovs-vswitchd[2570]: ovs|2755342|ovs_rcu|WARN|blocked 16000 ms waiting for handler205 to quiesce
         
         $ journalctl --file 0010-sosreport-worker-01-2022-12-30-tgkedrp.tar.xz/sosreport-worker-01-2022-12-30-tgkedrp/var/log/journal/ebdaaa6a15e74934ab46801b03c54f1c/system@0005f10d9d553026-4d6eb778c3078e7f.journal~ | less
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00663|ofproto_dpif_upcall(handler164)|INFO|received packet on unassociated datapath port 4294967295
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00112|ofproto_dpif_upcall(handler167)|INFO|received packet on unassociated datapath port 4294967295
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|01625|ofproto_dpif_upcall(handler163)|INFO|received packet on unassociated datapath port 4294967295
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00036|ofproto_dpif_upcall(handler169)|INFO|received packet on unassociated datapath port 4294967295
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|01626|ofproto_dpif_upcall(handler163)|INFO|received packet on unassociated datapath port 4294967295
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00269|ofproto_dpif_upcall(revalidator221)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:3c42e3d0-4deb-4aec-9a81-20ac2203f615
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00270|ofproto_dpif_upcall(revalidator221)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:58da8da4-69b5-4fcb-a510-edec0e4cc660
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00044|ofproto_dpif_upcall(revalidator223)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:0d07ff63-cb49-4017-aa87-a2ca23bce173
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00050|ofproto_dpif_upcall(revalidator226)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:c59e776c-42cf-42ea-973d-81703da83dd0
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00271|ofproto_dpif_upcall(revalidator221)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:40f3b297-351c-41ce-a031-88e70b214b0f
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00051|ofproto_dpif_upcall(revalidator226)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:0686536b-a7be-4e7b-9a25-56c4d91ebed8
         Dec 30 13:27:28 worker-01 ovs-vswitchd[2570]: ovs|00272|ofproto_dpif_upcall(revalidator221)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:bc0affb0-a985-4a7d-86b1-7ee08d032bce
      ```
      
      However, after speaking with Nabeel Cocker [RH Consultant], he mentioned that the OVS / OVN-Kubernetes team opened up all cores to 'ovs-vswitchd' as a deliberate workaround, to try to prevent the issues seen in case 03211783.
      
      I think we will need to go internally to members of the PAO team, as well as some of the consultants who worked closely on getting this set up, to determine whether this is the path forward or whether another pivot/play is needed here to mitigate this issue and to prevent OVS processes from contending with pod/container processes by way of the tuning in the Performance Add-On Operator.

      Actual results:

      OVS fights over the same core(s) with some pod/container process. We see 'stalld' go into action at this time as well.

      Expected results:

      The node never has an issue with OVS contention, because PAO tuning is applied correctly.

      Additional info:

       
