OCPBUGS-33211
(OCP 4.10) - OVNkube master SBDB memory expansion + subsequent loss of leader election


    • RCA only; workaround is to rebuild the DBs. SE for 4.10 until 06/14

      Description of problem:
       * On or about March 22nd 15:00, ovnkube-master pod resource consumption in the sbdb container increased unexpectedly and very rapidly, from around 14GB to around 25GB of memory. The node lost the SBDB leader process, and the cluster remained unstable until the nodes were restarted.
       * Seeking assistance with understanding the origin of the memory expansion and how to avoid it (a sketch of inspection commands follows below).
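      For triage, a minimal sketch of commands for inspecting SBDB memory and Raft status from the ovnkube-master pods; the pod label, container name, and socket/DB paths are assumptions based on a typical OCP 4.10 OVN-Kubernetes deployment and should be adjusted to the cluster:
      {code:bash}
      # Locate the ovnkube-master pods (label is an assumption; adjust if it differs)
      oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o wide

      # Internal memory report from the SBDB ovsdb-server
      # (ctl socket path assumed to be /var/run/ovn/ovnsb_db.ctl in the sbdb container)
      oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c sbdb -- \
        ovn-appctl -t /var/run/ovn/ovnsb_db.ctl memory/show

      # Raft membership/leadership for the southbound database
      oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c sbdb -- \
        ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

      # On-disk SBDB size (path assumed; a very large file can indicate missed compaction)
      oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c sbdb -- \
        du -h /etc/ovn/ovnsb_db.db
      {code}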

       

      Version-Release number of selected component (if applicable):
       - OCP 4.10.55

      How reproducible:
       - One time

      Steps to Reproduce:

      Unknown - the issue occurred with little/no change on the cluster; memory ballooned on the SBDB container and leader election was lost, forcing a DB rebuild to restore and stabilize the cluster (a sketch of the rebuild is included below).
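      For context, a rough sketch of the rebuild-style workaround used to restore stability; the file paths and container names here are assumptions, and the supported Red Hat procedure should be followed in practice:
      {code:bash}
      # Remove the OVN NB/SB database files on an affected master so they are rebuilt
      # (paths assumed; in OCP they are typically host-mounted into the pod at /etc/ovn)
      oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c northd -- \
        rm -f /etc/ovn/ovnnb_db.db /etc/ovn/ovnsb_db.db

      # Restart the pod so the databases are recreated and re-synced from the cluster
      oc -n openshift-ovn-kubernetes delete pod <ovnkube-master-pod>
      {code}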

      Actual results:
       - cluster instability/unexpected downtime

      Expected results:
       - Cluster stability; memory expansion should be attributable to some identifiable process or event. Looking for assistance in identifying the cause.

      Additional info:

      Top MEM-using processes:
      {code:java}
      USER PID %CPU %MEM VSZ-MiB RSS-MiB TTY STAT START TIME COMMAND
      root 1946340 4.7 44.2 28687 28502 ? - Feb12 3003:06 ovsdb-server -vconsole:info -vfile:off
      root 1820836 29.3 9.5 7808 6148 ? - Mar24 1417:21 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/c>
      root 1948344 15.2 3.9 3598 2535 ? - Feb12 9549:34 kube-controller-manager --openshift-config=/etc/kubernetes/static-pod-re>
      root 5479 27.9 3.4 13061 2221 ? - 2023 42769:02 etcd --logger=zap --log-level=info
      contain+ 1940935 4.4 1.3 5039 876 ? - Feb12 2789:29 /bin/olm --namespace openshift-operator-lifecycle-manager
      nfsnobo+ 1941677 1.6 1.1 4381 751 ? - Feb12 1037:45 /usr/bin/cluster-network-operator start --listen=0.0.0.0:9104
      root 1188367 61.8 1.0 3902 690 ? - Mar25 1816:12 openshift-apiserver start --config=/var/run/configmaps/config/config.yam>
      root 3090 24.0 1.0 858 674 ? - 2023 36834:46 ovn-northd --no-chdir -vconsole:info
      1000510+ 1940876 0.9 1.0 4492 646 ? - Feb12 603:30 service-ca-operator controller -v=2
      contain+ 23622 2.2 0.9 11923 587 ? - 2023 3450:56 /bin/opm registry serve
      {code}
      High container CPU/MEM processes (from crictl stats | grep GB):
      {code:java}
      CONTAINER CPU % MEM DISK INODES NAME
      0fb0a93a97298 0.30 70.98MB 1.45GB 72 ## download-server
      1fb02e4408c99 19.59 2.655GB 249.9kB 35 ## kube-controller-manager
      4b2d057964cb2 32.40 2.279GB 8.192kB 18 ## etcd
      522bbc47bf0dd 1.27 1.027GB 28.67kB 32 ## network-operator
      77f72218121db 49.11 29.94GB 12.29kB 32 ## sbdb <--- expanded unexpectedly
      c7743184a5bc5 0.80 1.144GB 8.192kB 26 ## olm-operator
      da52e3918bed2 0.79 1.092GB 28.67kB 33 ## service-ca-controller
      e3dd0c1f5af2e 27.61 6.387GB 254kB 35 ## kube-apiserver
      {code}

       * template details in first update below
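      For reference, the figures above can be gathered on the affected node roughly as follows (a sketch; the ps output above appears post-processed into MiB, and the crictl output was filtered per the heading):
      {code:bash}
      # Host-level: top memory consumers (ps reports VSZ/RSS in KiB, not MiB)
      ps aux --sort=-%mem | head -n 12

      # Container-level: containers consuming GB-scale memory
      sudo crictl stats | grep GB

      # Map the high-memory container back to its name, e.g. the sbdb container
      sudo crictl ps --name sbdb
      {code}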

       

            bbennett@redhat.com Ben Bennett
            rhn-support-wrussell Will Russell
            Anurag Saxena Anurag Saxena