
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: rhel-9.0.0
Component/s: crash
Labels:
- MigratedToJIRA

Severity:
Major

Pool Team:

sst_kernel_debug
Sub-System Group:

ssg_core_kernel

Blocked:
False
Blocked Reason:

None

Release Note Type:
If docs needed, set a value

Experience:
Architecture:

x86_64
Bugzilla Bug:
RHBZ: 2124679

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Under certain conditions, the 'whatis' command causes subsequent 'set scope' commands to fail.

Version-Release number of selected component (if applicable):

My reproduction system runs RHEL 8.6, but with upstream crash. The issue only appears after crash 8.0.0, so I am setting the bug against RHEL 9.

This issue is reproducible on the latest upstream crash.

How reproducible:

Every time, on particular vmcores.

Steps to Reproduce:

[user@host ~]$ retrace-server-interact 195860157 crash
If you want to execute the command manually, you can run
$ /cores/crashext/gitlab-runner/bin/crash -i /cores/retrace/tasks/195860157/crashrc /cores/retrace/tasks/195860157/crash/vmcore.vmsn /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-754.35.1.el6.x86_64/vmlinux

crash 8.0.1++
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.

KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-754.35.1.el6.x86_64/vmlinux [TAINTED]
DUMPFILE: /cores/retrace/tasks/195860157/crash/vmcore.vmsn
CPUS: 4
DATE: Tue Jun 14 09:55:58 EDT 2022
UPTIME: 00:22:11
LOAD AVERAGE: 5756.29, 5541.10, 3825.59
TASKS: 19403
RELEASE: 2.6.32-754.35.1.el6.x86_64
VERSION: #1 SMP Wed Sep 16 06:48:01 EDT 2020
MACHINE: x86_64 (2494 Mhz)
MEMORY: 16 GB
PANIC: ""
PID: 0
COMMAND: "swapper"
TASK: ffffffff81a97020 (1 of 4) [THREAD_INFO: ffffffff81a00000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
WARNING: panic task not found

mod: cannot find or load object file for vmci module
mod: cannot find or load object file for vsock module
mod: cannot find or load object file for seos module
crash> cd /cores/retrace/tasks/195860157/results
Working directory /cores/retrace/tasks/195860157/results.

crash> set scope dm_table_create
scope: ffffffffa0006cf0 (dm_table_create)

crash> dm_table
struct dm_table {
    uint64_t features;
    struct mapped_device *md;
    unsigned int type;
    unsigned int depth;
    unsigned int counts[16];
    sector_t *index[16];
    unsigned int num_targets;
    unsigned int num_allocated;
    sector_t *highs;
    struct dm_target *targets;
    struct target_type *immutable_target_type;
    unsigned int integrity_supported : 1;
    unsigned int singleton : 1;
    fmode_t mode;
    struct list_head devices;
    void (*event_fn)(void *);
    void *event_context;
    struct dm_md_mempools *mempools;
    struct list_head target_callbacks;
}
SIZE: 312

crash> whatis _name_buckets
struct list_head _name_buckets[64];

crash> dm_table
struct dm_table {
    int undefined__;
}
SIZE: 312

crash> set scope dm_table_create
scope: ffffffffa0006cf0 (dm_table_create)

crash> dm_table
struct dm_table {
    int undefined__;
}
SIZE: 312

Actual results:

After the whatis command is run, crash no longer shows the members of struct dm_table, and running 'set scope dm_table_create' again does not restore them.

Expected results:

'set scope' should work, or the original dm_table issue should be fixed so that 'set scope' is not needed, or both.

Additional info:

It is unclear exactly why, but because of the presence of a dummy struct in the dm layer, on some vmcores crash does not show the correct contents of dm_table:

crash> dm_table
struct dm_table {
    int undefined__;
}

In the past, we have worked around this by setting the scope. After setting the scope, dm_table displays correctly:

crash> set scope dm_table_create
scope: ffffffffa0006cf0 (dm_table_create)

crash> dm_table
struct dm_table {
    uint64_t features;
    struct mapped_device *md;
    unsigned int type;
    unsigned int depth;
    unsigned int counts[16];
    sector_t *index[16];
    unsigned int num_targets;
    unsigned int num_allocated;
    sector_t *highs;
    struct dm_target *targets;
    struct target_type *immutable_target_type;
    unsigned int integrity_supported : 1;
    unsigned int singleton : 1;
    fmode_t mode;
    struct list_head devices;
    void (*event_fn)(void *);
    void *event_context;
    struct dm_md_mempools *mempools;
    struct list_head target_callbacks;
}
SIZE: 312

Knowing this, our dm-layer pykdump scripts set the scope first so that dm_table displays correctly, because the rest of each script depends on it. We found that if readSymbol() is called after setting the scope, dm_table no longer displays correctly, and even another 'set scope' does not fix it. readSymbol() calls crash.gdb_whatis.
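A script that depends on the workaround can at least detect when it has stopped working. The following is a minimal Python sketch, assuming only that crash output is available as a string: `run_command` is a hypothetical callable standing in for whatever executes one crash command (for example, a pykdump helper), and the check simply looks for the `int undefined__;` placeholder member seen in the failing output above. It does not fix anything; it only makes the failure visible instead of letting the rest of the script run against the placeholder struct.

```python
from typing import Callable

# Placeholder member that appears in the bad "dm_table" output in this report.
PLACEHOLDER_MEMBER = "undefined__"


def dm_table_is_complete(struct_output: str) -> bool:
    """Return True if the dm_table output lists real members, not the placeholder."""
    return "struct dm_table {" in struct_output and PLACEHOLDER_MEMBER not in struct_output


def ensure_dm_table(run_command: Callable[[str], str],
                    scope_symbol: str = "dm_table_create") -> bool:
    """Apply the 'set scope' workaround, then verify that dm_table is usable.

    run_command is hypothetical: any callable that runs one crash command
    and returns its output as a string.
    """
    run_command(f"set scope {scope_symbol}")
    return dm_table_is_complete(run_command("dm_table"))
```

Because, per this report, a later whatis (including the one issued through readSymbol()) can break dm_table again and 'set scope' does not recover it, a script would need to re-check fresh `dm_table` output after such calls rather than only once at startup.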

Two curious notes:

1) I've only been able to reproduce the issue with the following sequence:

set scope dm_table_create
dm_table
whatis _name_buckets
dm_table

The following sequence does not reproduce the issue. The extra dm_table at the start somehow makes a difference.

dm_table
set scope dm_table_create
dm_table
whatis _name_buckets
dm_table

2) I've only been able to reproduce the issue by calling whatis on the _name_buckets symbol, although I've only tried a couple of other symbols.
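The reproducing sequence in note 1) can be scripted so that it is easy to re-run on other vmcores or crash builds. The following is a minimal Python sketch, not an existing tool: the crash binary, vmlinux, and vmcore paths are supplied as arguments; it assumes crash runs the commands from the `-i` input file and exits at the trailing `quit` (stdin is closed and a timeout is set as fallbacks); and the helper names are hypothetical. It judges a run only by whether the last dm_table output contains the `int undefined__;` placeholder. Its exit codes follow the `git bisect run` convention (0 good, 1 bad, 125 skip), which would let it drive a bisect like the one below, assuming crash is rebuilt at each step.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Reproducing command sequence from note 1).
REPRO_COMMANDS = [
    "set scope dm_table_create",
    "dm_table",
    "whatis _name_buckets",
    "dm_table",
]

# Placeholder member that appears in the bad dm_table output in this report.
PLACEHOLDER_MEMBER = "undefined__"


def dm_table_outputs(transcript: str) -> list[str]:
    """Collect each 'struct dm_table { ... }' block printed in a crash transcript."""
    blocks = []
    current = None
    for line in transcript.splitlines():
        if line.strip().startswith("struct dm_table {"):
            current = [line]
        elif current is not None:
            current.append(line)
            if line.strip() == "}":  # top-level closing brace ends the block
                blocks.append("\n".join(current))
                current = None
    return blocks


def classify(transcript: str) -> int:
    """Map a transcript to a `git bisect run` exit code: 0 good, 1 bad, 125 skip."""
    blocks = dm_table_outputs(transcript)
    if len(blocks) < 2:
        return 125  # fewer than two dm_table outputs: cannot judge this build
    if PLACEHOLDER_MEMBER in blocks[0]:
        return 125  # the set scope workaround itself failed, a different symptom
    return 1 if PLACEHOLDER_MEMBER in blocks[-1] else 0


def main(crash: str, vmlinux: str, vmcore: str) -> int:
    # Write the commands, plus a trailing quit, to a temporary input file for -i.
    with tempfile.NamedTemporaryFile("w", suffix=".cmds", delete=False) as f:
        f.write("\n".join(REPRO_COMMANDS + ["quit"]) + "\n")
        input_file = f.name
    try:
        result = subprocess.run(
            [crash, "-s", "-i", input_file, vmlinux, vmcore],
            stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=600,
        )
    except subprocess.TimeoutExpired:
        return 125
    finally:
        Path(input_file).unlink()
    return classify(result.stdout)


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: repro.py CRASH VMLINUX VMCORE", file=sys.stderr)
        sys.exit(128)  # exit codes above 127 abort git bisect run
    sys.exit(main(*sys.argv[1:]))
```

If the assumptions above hold, `python3 repro.py /path/to/crash vmlinux vmcore.vmsn; echo $?` should print 1 on an affected build and 0 on an unaffected one.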

Additionally, on the system I was working with, I bisected upstream crash; the first bad commit is shown below.

2f967fb5ebd737ce5eadba462df35935122e8865 is the first bad commit
commit 2f967fb5ebd737ce5eadba462df35935122e8865
Author: Alexey Makhalov <amakhalov@vmware.com>
Date: Fri Mar 19 21:07:33 2021 -0700

crash_taget: fetch_registers support

Provides API for crash_target to fetch registers of given
CPU. It will allow gdb to perform such commands as "bt",
"frame", "info locals".

Highlevel API is crash_get_cpu_reg (). It calls machine
(architecture) specific function: machdep->get_cpu_reg().
Input arguments such as register number and register size
come from gdb arch information. So, get_cpu_regs()
implementations in crash must understand it.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>

crash_target.c | 33 ++++++++++++++++++++++++++++++++-
defs.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
gdb_interface.c | 19 ++++++++++++++-----
vmware_vmss.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
x86_64.c | 16 ++++++++++++++++
5 files changed, 163 insertions, 7 deletions

External Trackers:

Red Hat Issue Tracker RHELPLAN-133359
