OpenShift Request For Enhancement / RFE-2154

Add 'numactl' RPM to CoreOS for better troubleshooting


Details


    Description

      1. Proposed title of this feature request

      Add 'numactl' RPM to CoreOS image for better troubleshooting

      2. What is the nature and description of the request?

      One of our customers runs at least 30 clusters that I'm aware of, all bare-metal (air-gapped) multi-processor systems with 512 GB of RAM and hugepages enabled for 25% of system RAM. A lot of the performance tuning, as well as the RCAs after an outage, requires us to look at all the small details in order to come up with recommendations.

      3. Why does the customer need this? (List the business requirements here)

      Having the 'numactl' RPM added to the CoreOS image would help us troubleshoot and get to a resolution more quickly.

      4. List any affected packages or components.

      It looks like, as of RHEL 8.4, the required dependencies are already met in CoreOS on OCP 4.6:

       

      [root@hyp2 ~]# cat /etc/redhat-release 
      Red Hat Enterprise Linux release 8.4 (Ootpa)
      [root@hyp2 ~]# dnf whatprovides numactl
      numactl-2.0.12-11.el8.x86_64 : Library for tuning for Non Uniform Memory Access machines
      Repo : rhel-8-for-x86_64-baseos-rpms
      Matched from:
      Provide : numactl = 2.0.12-11.el8
      [root@hyp2 ~]# dnf deplist numactl-2.0.12-11.el8.x86_64
      package: numactl-2.0.12-11.el8.x86_64
       dependency: /sbin/ldconfig
       provider: glibc-2.28-151.el8.i686
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6()(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.14)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.17)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.2.5)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.3)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.3.4)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libc.so.6(GLIBC_2.4)(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libm.so.6()(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: libnuma.so.1()(64bit)
       provider: numactl-libs-2.0.12-11.el8.x86_64
       dependency: libnuma.so.1(libnuma_1.1)(64bit)
       provider: numactl-libs-2.0.12-11.el8.x86_64
       dependency: libnuma.so.1(libnuma_1.2)(64bit)
       provider: numactl-libs-2.0.12-11.el8.x86_64
       dependency: libnuma.so.1(libnuma_1.3)(64bit)
       provider: numactl-libs-2.0.12-11.el8.x86_64
       dependency: libnuma.so.1(libnuma_1.4)(64bit)
       provider: numactl-libs-2.0.12-11.el8.x86_64
       dependency: librt.so.1()(64bit)
       provider: glibc-2.28-151.el8.x86_64
       dependency: rtld(GNU_HASH)
       provider: glibc-2.28-151.el8.i686
       provider: glibc-2.28-151.el8.x86_64
      [root@hyp2 ~]# dnf download numactl
      numactl-2.0.12-11.el8.x86_64.rpm
      
      [root@hyp2 ~]# scp numactl-2.0.12-11.el8.x86_64.rpm core@192.168.0.33:
      numactl-2.0.12-11.el8.x86_64.rpm 100% 76KB 19.1MB/s 00:00
      
      [root@master-2 ~]# cat /etc/redhat-release 
      Red Hat Enterprise Linux CoreOS release 4.6
      
      [root@master-2 ~]# mount -o remount,rw /usr/
      
      [root@master-2 ~]# rpm -i /home/core/numactl-2.0.12-11.el8.x86_64.rpm 
      
      [root@master-2 ~]# numactl -H
      available: 2 nodes (0-1)
      node 0 cpus: 0 1
      node 0 size: 7971 MB
      node 0 free: 293 MB
      node 1 cpus: 2 3
      node 1 size: 8062 MB
      node 1 free: 105 MB
      node distances:
      node 0 1 
       0: 10 20 
       1: 20 10
      

       

      =======================

      I have a write-up in GitLab that explains how we are working around this at the moment:

      ### Why do we need this?

      One thing I see overlooked a lot on multi-processor systems is the amount of free RAM per NUMA node. A process can be spawned on a particular NUMA node, and it's not uncommon for that process (MySQL, for example) to balloon in memory usage on that single NUMA node over time. If that NUMA node starts running out of free RAM, the process may be OOM-killed...

      ...When that happens, running 'free -m' in the lead-up to the event might show a lot of available RAM, but that is deceiving if most of the free RAM belongs to a NUMA node other than the one where the OOM-killed process lived. While we may not know for sure which node that process lived on, we can gather tell-tale signs of memory exhaustion approaching on a given node.
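      For a quick per-node spot-check without numactl, the kernel exposes the same information directly in sysfs. This is a minimal sketch, assuming a kernel with /sys/devices/system/node populated (the guard skips the loop cleanly on systems where it is not):

```shell
for node in /sys/devices/system/node/node[0-9]*; do
  # Skip if the glob did not match (non-NUMA or masked sysfs).
  [ -e "$node/meminfo" ] || continue
  # meminfo lines look like: "Node 0 MemFree:  293012 kB" -> $4 is the value.
  awk -v n="$(basename "$node")" '/MemFree/ {print n ": MemFree " $4 " kB"}' "$node/meminfo"
done
```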

      ### Here's how to calculate that (skip to step 7 if you don't want the explanation):

      1. The default memory page size is 4096 bytes (4 kB):
       

      [root@master-2 ~]# getconf PAGESIZE 
      4096

       

      2. Total number of logical CPUs on my system:
       

      [root@master-2 ~]# grep processor /proc/cpuinfo 
      processor : 0 
      processor : 1 
      processor : 2 
      processor : 3
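      The count can be taken directly instead of eyeballing the list. Note that /proc/cpuinfo lists logical processors (hyperthreads included), not physical cores, and `nproc` (coreutils) honors CPU affinity, so the two can differ under taskset or cgroup limits:

```shell
# Count "processor" lines (logical CPUs) in /proc/cpuinfo.
grep -c '^processor' /proc/cpuinfo
# coreutils shortcut for the CPUs available to this process.
nproc
```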

       

      3. Total number of NUMA nodes on my system

      [root@master-2 ~]# ls /sys/devices/system/node/ | grep node 
      node0 
      node1
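      The node count can also be taken directly; the kernel's "possible" file gives the node ID range (e.g. "0-1" on a two-node box), assuming a sysfs-exposing kernel:

```shell
# Count the node directories rather than reading them by eye.
ls -d /sys/devices/system/node/node[0-9]* | wc -l
# Node ID range known to the kernel, e.g. "0-1".
cat /sys/devices/system/node/possible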

       

      4. With that in mind, I compared the free RAM reported by 'free' with the per-node 'nr_free_pages' counters on my lab system with fake NUMA:

      [root@master-2 ~]# echo 'Free RAM (kB)': `free | awk '/Mem/ {print $4}'`; echo 'Free Pages (NUMA Node 0):' `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat`; echo 'Free Pages (NUMA Node 1):' `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat`
       
      Free RAM (kB): 9819444 
      Free Pages (NUMA Node 0): 1271037 
      Free Pages (NUMA Node 1): 1183757
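      The same cross-check can be done in one awk pass over every node's vmstat, rather than one command per node. A sketch, assuming 4 kB pages (per step 1); the two reads happen a moment apart, so expect small drift:

```shell
# Sum nr_free_pages across all NUMA nodes in a single pass.
awk '/^nr_free_pages/ {s += $2} END {print "Free pages (all NUMA nodes):", s}' \
    /sys/devices/system/node/node[0-9]*/vmstat
# Express free kB from `free` in 4 kB pages for comparison.
free | awk '/^Mem/ {print "Free pages (free command):  ", int($4 / 4)}'
```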

       

      5. To get something more readable, I converted to MB and combined the total free RAM across the two NUMA nodes to compare:

      [root@master-2 ~]# echo 'Free RAM (MB):' `free -m | awk '/Mem/ {print $4}'`; echo 'Free RAM (NUMA combined):' $(expr $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat` + `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat`) \* 4) / 1024) 
      
      Free RAM (MB): 8446 
      Free RAM (NUMA combined): 8445

       

      6. Using the output in step 4 and plugging in the numbers, the math checks out (the two commands ran a moment apart, so the counts differ slightly as memory was allocated and freed in between):

      ( 9819444 / 4 ) ≈ ( 1271037 + 1183757 ) 
      or 
      2454861 free pages as seen by the 'free' command ≈ 2454794 free pages summed across the NUMA nodes

       

      7. Now that we know the number of CPUs and NUMA nodes and the total free RAM, a little math gives the free RAM per NUMA node:

      [root@master-2 ~]# echo 'Free RAM (MB):' `free -m | awk '/Mem/ {print $4}'`; echo 'Free RAM (MB) in NUMA 0:' $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat` \* 4) / 1024); echo 'Free RAM (MB) in NUMA 1:' $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat` \* 4) / 1024) 
      
      Free RAM (MB): 9433 
      Free RAM (MB) in NUMA 0: 4672 
      Free RAM (MB) in NUMA 1: 4766
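      Step 7 can be generalized so it works on any node count: derive the page size from getconf instead of hardcoding 4 kB, and loop over every node instead of spelling out node0/node1 by hand. A sketch, assuming sysfs exposes the node directories:

```shell
# Page size in kB (4 on most x86_64 systems, per step 1).
ps_kb=$(( $(getconf PAGESIZE) / 1024 ))
for v in /sys/devices/system/node/node[0-9]*/vmstat; do
  # Skip cleanly if the glob did not match.
  [ -e "$v" ] || continue
  node=$(basename "$(dirname "$v")")
  # nr_free_pages * page-size-kB / 1024 -> free MB on this node.
  awk -v node="$node" -v ps="$ps_kb" \
      '/^nr_free_pages/ {printf "Free RAM (MB) in %s: %d\n", node, $2 * ps / 1024}' "$v"
done
```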

       
       

          People

            rhn-support-mrussell Mark Russell
            rhn-support-acardena Albert Cardenas
            Votes: 8
            Watchers: 5