RHEL-8737

[RHEL 9] 'whatis' command causes subsequent 'set scope' commands to fail

    • Bug
    • Resolution: Unresolved
    • Major
    • rhel-9.0.0
    • crash
    • Major
    • sst_kernel_debug
    • ssg_core_kernel

      Description of problem:

      Under certain conditions, the 'whatis' command causes subsequent 'set scope' commands to fail.

      Version-Release number of selected component (if applicable):

      My reproduction is on a RHEL 8.6 system, but it is running upstream crash. The issue only appears after 8.0.0, so I am setting the bug to RHEL 9.

      This issue is reproducible on the latest upstream crash.

      How reproducible:

      Every time on particular cores.

      Steps to Reproduce:

      [user@host ~]$ retrace-server-interact 195860157 crash
      If you want to execute the command manually, you can run
      $ /cores/crashext/gitlab-runner/bin/crash -i /cores/retrace/tasks/195860157/crashrc /cores/retrace/tasks/195860157/crash/vmcore.vmsn /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-754.35.1.el6.x86_64/vmlinux

      crash 8.0.1++
      Copyright (C) 2002-2022 Red Hat, Inc.
      Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
      Copyright (C) 1999-2006 Hewlett-Packard Co
      Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
      Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
      Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
      Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-754.35.1.el6.x86_64/vmlinux [TAINTED]
      DUMPFILE: /cores/retrace/tasks/195860157/crash/vmcore.vmsn
      CPUS: 4
      DATE: Tue Jun 14 09:55:58 EDT 2022
      UPTIME: 00:22:11
      LOAD AVERAGE: 5756.29, 5541.10, 3825.59
      TASKS: 19403
      RELEASE: 2.6.32-754.35.1.el6.x86_64
      VERSION: #1 SMP Wed Sep 16 06:48:01 EDT 2020
      MACHINE: x86_64 (2494 Mhz)
      MEMORY: 16 GB
      PANIC: ""
      PID: 0
      COMMAND: "swapper"
      TASK: ffffffff81a97020 (1 of 4) [THREAD_INFO: ffffffff81a00000]
      CPU: 0
      STATE: TASK_RUNNING (ACTIVE)
      WARNING: panic task not found

      mod: cannot find or load object file for vmci module
      mod: cannot find or load object file for vsock module
      mod: cannot find or load object file for seos module
      crash> cd /cores/retrace/tasks/195860157/results
      Working directory /cores/retrace/tasks/195860157/results.

      crash> set scope dm_table_create
      scope: ffffffffa0006cf0 (dm_table_create)

      crash> dm_table
      struct dm_table {
      uint64_t features;
      struct mapped_device *md;
      unsigned int type;
      unsigned int depth;
      unsigned int counts[16];
      sector_t *index[16];
      unsigned int num_targets;
      unsigned int num_allocated;
      sector_t *highs;
      struct dm_target *targets;
      struct target_type *immutable_target_type;
      unsigned int integrity_supported : 1;
      unsigned int singleton : 1;
      fmode_t mode;
      struct list_head devices;
      void (*event_fn)(void *);
      void *event_context;
      struct dm_md_mempools *mempools;
      struct list_head target_callbacks;
      }
      SIZE: 312

      crash> whatis _name_buckets
      struct list_head _name_buckets[64];

      crash> dm_table
      struct dm_table {
      int undefined__;
      }
      SIZE: 312

      crash> set scope dm_table_create
      scope: ffffffffa0006cf0 (dm_table_create)

      crash> dm_table
      struct dm_table {
      int undefined__;
      }
      SIZE: 312

      Actual results:

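      After the whatis command is used, crash no longer shows the members of the dm_table struct. It prints only the placeholder member "int undefined__;", and a further set scope does not restore the full definition.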
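
The condition can also be checked mechanically from the text crash prints. The following is a minimal sketch, assuming the output has the shape shown in the transcript above. The helper names `struct_members` and `is_undefined_placeholder` are invented for illustration and are not part of crash or pykdump.

```python
import re


def struct_members(output):
    """Return the member declarations found between '{' and '}' in struct output."""
    match = re.search(r"\{(.*)\}", output, re.DOTALL)
    if match is None:
        return []
    # Each member declaration ends with ';', so split on it and drop empty pieces.
    return [part.strip() for part in match.group(1).split(";") if part.strip()]


def is_undefined_placeholder(output):
    """True when the struct body holds only the 'int undefined__' placeholder member."""
    return struct_members(output) == ["int undefined__"]
```

Passing the text of a failing `dm_table` command to `is_undefined_placeholder` would flag the state reported here, while the full 19-member definition would not.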

      Expected results:

      Either set scope should work, or the original dm_table issue should be fixed so that set scope is not needed, or both.

      Additional info:

      It is unclear exactly why, but, apparently due to the presence of a dummy struct in the dm layer, some cores come up with the contents of dm_table shown incorrectly.

      crash> dm_table
      struct dm_table {
      int undefined__;
      }

      In the past, we have been able to work around this by setting the scope. After the scope is set, dm_table is shown correctly.

      crash> set scope dm_table_create
      scope: ffffffffa0006cf0 (dm_table_create)

      crash> dm_table
      struct dm_table {
      uint64_t features;
      struct mapped_device *md;
      unsigned int type;
      unsigned int depth;
      unsigned int counts[16];
      sector_t *index[16];
      unsigned int num_targets;
      unsigned int num_allocated;
      sector_t *highs;
      struct dm_target *targets;
      struct target_type *immutable_target_type;
      unsigned int integrity_supported : 1;
      unsigned int singleton : 1;
      fmode_t mode;
      struct list_head devices;
      void (*event_fn)(void *);
      void *event_context;
      struct dm_md_mempools *mempools;
      struct list_head target_callbacks;
      }
      SIZE: 312

      Knowing this, our dm layer pykdump scripts set the scope up front to ensure dm_table is shown correctly, since the rest of the script depends on it. We found that if readSymbol() is called after setting the scope, dm_table is no longer shown correctly, and even another set scope does not fix it. readSymbol() calls crash.gdb_whatis.
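
The workaround and the way it fails can be sketched as below. This is an illustration only, not the actual script: `run` is a hypothetical callable that takes a crash command string and returns its output as text (inside pykdump, a function such as `exec_crash_command` might be passed, which is an assumption about the environment), and `dm_table_after_scope` and `ScopeWorkaroundFailed` are names invented here.

```python
PLACEHOLDER = "int undefined__;"


class ScopeWorkaroundFailed(RuntimeError):
    """Raised when dm_table is still the placeholder after 'set scope'."""


def dm_table_after_scope(run, scope_symbol="dm_table_create"):
    """Set the scope, then return crash's 'dm_table' output.

    run: hypothetical callable mapping a crash command string to its text output.
    """
    # Workaround from the report: point the scope at a dm function first.
    run("set scope " + scope_symbol)
    output = run("dm_table")
    if PLACEHOLDER in output:
        # This is the state the report describes after whatis / readSymbol().
        raise ScopeWorkaroundFailed(
            "dm_table still unresolved after set scope " + scope_symbol
        )
    return output
```

Failing loudly at this point would at least keep the rest of a script from working with an empty struct definition.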

      Two curious notes:

      1) I've only been able to reproduce the issue with the sequence below:

      set scope dm_table_create
      dm_table
      whatis _name_buckets
      dm_table

      The below sequence does not reproduce. The extra dm_table at the start somehow makes a difference.

      dm_table
      set scope dm_table_create
      dm_table
      whatis _name_buckets
      dm_table

      2) I've only been able to reproduce by calling whatis on the _name_buckets symbol, though I've only tried a couple of others.
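
To replay the two sequences from note 1 without typing them, they could be written to input files and passed to crash with the `-i` option that appears in the command line above. This is a small sketch. The names `REPRODUCES`, `DOES_NOT_REPRODUCE` and `write_input_file` are invented for illustration.

```python
import os
import tempfile

# Command sequences copied from note 1 of the report.
REPRODUCES = [
    "set scope dm_table_create",
    "dm_table",
    "whatis _name_buckets",
    "dm_table",
]
# Same sequence with one extra 'dm_table' at the start.
DOES_NOT_REPRODUCE = ["dm_table"] + REPRODUCES


def write_input_file(commands, directory):
    """Write one crash command per line and return the path of the new file."""
    fd, path = tempfile.mkstemp(suffix=".crashrc", dir=directory, text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(commands) + "\n")
    return path
```

Each resulting file holds exactly the commands listed, so the only difference between the two runs is the leading `dm_table`.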

      Additionally, on the system I was working with, I bisected upstream crash. The commit below is where the issue started.

      2f967fb5ebd737ce5eadba462df35935122e8865 is the first bad commit
      commit 2f967fb5ebd737ce5eadba462df35935122e8865
      Author: Alexey Makhalov <amakhalov@vmware.com>
      Date: Fri Mar 19 21:07:33 2021 -0700

      crash_taget: fetch_registers support

      Provides API for crash_target to fetch registers of given
      CPU. It will allow gdb to perform such commands as "bt",
      "frame", "info locals".

      Highlevel API is crash_get_cpu_reg (). It calls machine
      (architecture) specific function: machdep->get_cpu_reg().
      Input arguments such as register number and register size
      come from gdb arch information. So, get_cpu_regs()
      implementations in crash must understand it.

      Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>

      crash_target.c | 33 ++++++++++++++++++++++++++++++++-
      defs.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
      gdb_interface.c | 19 ++++++++++++++-----
      vmware_vmss.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
      x86_64.c | 16 ++++++++++++++++
      5 files changed, 163 insertions, 7 deletions
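
A bisect like this can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below only shows the classification step, assuming the transcript of the reproducing sequence has already been captured as text. The function name `bisect_exit_code` is invented here, and this is not necessarily how the bisect in this report was done.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(transcript):
    """Map a captured crash transcript to a `git bisect run` exit code."""
    # Only the final dm_table output matters: it comes after the whatis call.
    start = transcript.rfind("struct dm_table {")
    if start == -1:
        # crash did not get far enough to print the struct (build or load failure).
        return SKIP
    if "int undefined__;" in transcript[start:]:
        return BAD
    return GOOD
```

A wrapper script would build crash at the current commit, feed it the reproducing command sequence, and exit with the value this function returns.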

            Lianbo Jiang (lijiang@redhat.com)
            John Pittman (rhn-support-jpittman)
            Lianbo Jiang
            Jie Li