add kernel debug doc

Robert McMahon 2026-04-08 11:07:06 -07:00
parent ef016695c6
commit e916743757
1 changed files with 281 additions and 0 deletions

@ -0,0 +1,281 @@
# Umber Kernel Debug Setup
This document covers enabling kernel core dumps (kdump) and kernel GDB (KGDB/kgdboe)
on the UAX-24 concentrator (ASUS WRX90E / Threadripper PRO 7955WX, Fedora 42).
---
## Prerequisites
### Required Kernel Config Options
Verify these are set in `linux-6.18.16/.config` before building:
#### Core Dump (kdump)
```
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_UNWINDER_ORC=y
```
#### Kernel GDB / kgdboe
```
CONFIG_KGDB=y
CONFIG_KGDB_KDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_NETPOLL=y
```
Check with:
```bash
grep -E "CONFIG_KEXEC|CONFIG_CRASH_DUMP|CONFIG_PROC_VMCORE|CONFIG_DEBUG_INFO|CONFIG_UNWINDER_ORC|CONFIG_KGDB|CONFIG_KDB|CONFIG_NETPOLL|CONFIG_MAGIC_SYSRQ" \
    linux-6.18.16/.config
```
If any are missing, enable them before building:
```bash
cd linux-6.18.16
scripts/config --enable CONFIG_KGDB_KDB
scripts/config --enable CONFIG_NETPOLL
make LOCALVERSION="" olddefconfig
```
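The `grep` above lists whatever matches, which makes a missing option easy to overlook. A minimal sketch of a stricter check follows; `check_kconfig` is a hypothetical helper name (not a kernel script), and it treats only an exact `CONFIG_FOO=y` line as enabled.

```bash
# check_kconfig CONFIG_FILE OPTION...
# Hypothetical helper: prints "MISSING: <option>" for every option that is
# not set to =y in CONFIG_FILE, and returns non-zero if any are missing.
check_kconfig() {
    local config=$1
    shift
    local opt missing=0
    for opt in "$@"; do
        # -x matches the whole line, so "# CONFIG_FOO is not set" and
        # "CONFIG_FOO=m" do not count as enabled.
        if ! grep -qx "${opt}=y" "$config"; then
            echo "MISSING: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (option names taken from the lists above):
#   check_kconfig linux-6.18.16/.config CONFIG_KEXEC CONFIG_CRASH_DUMP \
#       CONFIG_PROC_VMCORE CONFIG_KGDB CONFIG_KGDB_KDB CONFIG_NETPOLL
```

The non-zero return value lets a build script stop before a long compile when an option is missing.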
---
## Building the Umber Kernel
```bash
cd ~/Code/umber_kernel/linux-6.18.16
# Ensure correct version string (no doubling)
grep "^EXTRAVERSION" Makefile
# Should show: EXTRAVERSION = -umber.v9
# Clear CONFIG_LOCALVERSION to avoid doubling
scripts/config --set-str CONFIG_LOCALVERSION ""
# Verify release string
make LOCALVERSION="" kernelrelease
# Should show: 6.18.16-umber.v9
# Build
make -j$(nproc) LOCALVERSION="" bzImage modules 2>&1 | tee build.log
# Install
sudo make LOCALVERSION="" modules_install
sudo make LOCALVERSION="" install
```
### Common Pitfall: Doubled Version String
If `make kernelrelease` shows `6.18.16-umber.v9-umber.v9`, check:
1. `CONFIG_LOCALVERSION` in `.config`: must be the empty string `""`
2. `include/config/auto.conf`: `CONFIG_LOCALVERSION=` must have no value
3. `include/generated/autoconf.h`: `#define CONFIG_LOCALVERSION ""` must define an empty string
Fix stale cached values:
```bash
sed -i 's/CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION=/' include/config/auto.conf
sed -i 's/#define CONFIG_LOCALVERSION.*/#define CONFIG_LOCALVERSION ""/' include/generated/autoconf.h
echo "6.18.16-umber.v9" > include/config/kernel.release
```
Also clean up any misnamed module directories:
```bash
sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9
sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9+
```
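The doubled suffix can also be detected automatically before installing. The sketch below assumes the suffix is passed in explicitly; `has_doubled_suffix` is a hypothetical helper name, not part of the kernel build system.

```bash
# has_doubled_suffix RELEASE SUFFIX
# Hypothetical helper: succeeds when RELEASE ends with SUFFIX twice in a row,
# optionally followed by the "+" that the build appends in some cases.
has_doubled_suffix() {
    [ -n "$2" ] || return 1          # an empty suffix can never be "doubled"
    case $1 in
        *"$2$2" | *"$2$2+") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, run from the kernel source tree:
#   rel=$(make -s LOCALVERSION="" kernelrelease)
#   has_doubled_suffix "$rel" "-umber.v9" && echo "doubled version string: $rel"
```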
---
## Core Dumps (kdump)
### 1. Install kdump tools
```bash
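# On recent Fedora releases, kdumpctl and the kdump service are packaged in kdump-utils
sudo dnf install kexec-tools kdump-utils makedumpfile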
```
### 2. Reserve crash kernel memory
With 250GB RAM, 512M is appropriate:
```bash
sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 --args="crashkernel=512M"
# Verify
sudo grubby --info=/boot/vmlinuz-6.18.16-umber.v9 | grep args
```
### 3. Configure kdump
Minimal effective configuration in `/etc/kdump.conf`:
```
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31
```
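The `-d 31` flag sets the dump level to exclude every page type makedumpfile can filter
(zero pages, both kinds of cache pages, user process data, and free pages), which
significantly reduces dump size. The `-l` flag compresses the dump with LZO.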
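The value passed to `-d` is a bitmask, so any dump level can be decoded into the page types it excludes. The sketch below uses the bit assignments from the makedumpfile manual page; `decode_dump_level` is a hypothetical helper name used only for illustration.

```bash
# decode_dump_level LEVEL
# Hypothetical helper: prints one line per page type that
# "makedumpfile -d LEVEL" excludes from the dump.
decode_dump_level() {
    local level=$1 bit name
    while read -r bit name; do
        # A page type is excluded when its bit is set in LEVEL.
        if [ $((level & bit)) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
1 zero pages
2 non-private cache pages
4 private cache pages
8 user process data pages
16 free pages
EOF
}

# Usage:
#   decode_dump_level 31    # all five types: 1 + 2 + 4 + 8 + 16 = 31
#   decode_dump_level 17    # zero pages and free pages only
```

A lower level such as 17 keeps user and cache pages in the dump, at the cost of a much larger vmcore.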
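### 4. Enable kdump
The service cannot start until the `crashkernel=` reservation takes effect, so enable
it now and let it start on the next boot:
```bash
sudo systemctl enable kdump
```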
### 5. Set grub default and reboot
```bash
sudo grubby --set-default /boot/vmlinuz-6.18.16-umber.v9
sudo grubby --default-kernel
sudo reboot
```
### 6. Verify after reboot
```bash
uname -r
# 6.18.16-umber.v9
cat /proc/cmdline | grep crashkernel
# should contain crashkernel=512M
kdumpctl status
# should report that kdump is operational
```
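For scripted verification, the reservation can be extracted from the command line instead of read by eye. This is a minimal sketch; `crashkernel_value` is a hypothetical helper name, and it reports the last `crashkernel=` argument when several are present.

```bash
# crashkernel_value CMDLINE
# Hypothetical helper: prints the value of the last crashkernel= argument
# in CMDLINE and returns non-zero when there is none.
# The body runs in a subshell so that "set -f" does not leak to the caller.
crashkernel_value() (
    set -f                      # no globbing while CMDLINE is split into words
    value=
    found=1
    for arg in $1; do
        case $arg in
            crashkernel=*) value=${arg#crashkernel=}; found=0 ;;
        esac
    done
    if [ "$found" -eq 0 ]; then
        echo "$value"
    fi
    return "$found"
)

# Usage:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "no crashkernel reservation"
```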
### 7. Analyzing a vmcore
After a crash, dumps land in `/var/crash/<timestamp>/vmcore`.
```bash
sudo dnf install crash
crash /var/crash/<timestamp>/vmcore /path/to/vmlinux-6.18.16-umber.v9
```
Useful `crash` commands:
```
bt # backtrace of crashing context
foreach bt # backtrace of every thread
ps # process state at time of crash
log # kernel message buffer
```
The `vmlinux` with debug symbols is in the kernel build tree at
`~/Code/umber_kernel/linux-6.18.16/vmlinux`.
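When several dumps have accumulated, picking the newest one by hand is error-prone. The sketch below assumes the layout described above (one subdirectory per dump, each containing a `vmcore` file); `latest_vmcore` is a hypothetical helper name.

```bash
# latest_vmcore [CRASH_DIR]
# Hypothetical helper: prints the most recently modified CRASH_DIR/*/vmcore
# and returns non-zero when no dump exists. CRASH_DIR defaults to /var/crash.
latest_vmcore() {
    local dir=${1:-/var/crash} f newest=
    for f in "$dir"/*/vmcore; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && echo "$newest"
}

# Usage:
#   crash "$(latest_vmcore)" ~/Code/umber_kernel/linux-6.18.16/vmlinux
```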
### Triggering a Test Crash
To verify kdump works before relying on it:
```bash
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger
```
This forces an immediate kernel panic and should produce a vmcore in `/var/crash`.
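Panicking a machine whose crash kernel is not loaded just reboots it without a dump. The kernel exposes the load state in `/sys/kernel/kexec_crash_loaded` (`1` when a crash kernel is loaded), so a guard can be sketched as follows; `crash_kernel_loaded` is a hypothetical helper name.

```bash
# crash_kernel_loaded [STATE_FILE]
# Hypothetical helper: succeeds only when the kernel reports that a crash
# kernel is loaded. STATE_FILE defaults to /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
    local state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Usage: only trigger the test panic when a dump can actually be captured.
#   if crash_kernel_loaded; then
#       echo c | sudo tee /proc/sysrq-trigger
#   else
#       echo "crash kernel not loaded; refusing to panic" >&2
#   fi
```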
---
## Kernel GDB / kgdboe
kgdboe (KGDB over Ethernet) is the preferred backend for the UAX-24 since the
system may remain ping-reachable during a hard freeze.
### On the Target (UAX-24 concentrator)
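Load the kgdboe module at runtime, specifying the host machine that will
connect as the GDB client. kgdboe is not part of the mainline kernel, so the
module must first be built out of tree against the Umber kernel: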
```bash
# Replace <target-ip> and <ethX> with the target's address and interface name,
# and <host-ip> with the IP of the GDB client
sudo modprobe kgdboe kgdboe="@<target-ip>/<ethX>,@<host-ip>/"
# Example:
sudo modprobe kgdboe kgdboe="@192.168.1.10/enp4s0,@192.168.1.100/"
```
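The parameter string is easy to mistype, so it can be assembled from its three parts. This sketch only reproduces the `@<target-ip>/<ethX>,@<host-ip>/` format shown above; `kgdboe_param` is a hypothetical helper name.

```bash
# kgdboe_param TARGET_IP IFACE HOST_IP
# Hypothetical helper: prints the kgdboe= module parameter in the
# "@<target-ip>/<iface>,@<host-ip>/" form used in this document.
kgdboe_param() {
    if [ "$#" -ne 3 ]; then
        echo "usage: kgdboe_param <target-ip> <iface> <host-ip>" >&2
        return 2
    fi
    printf '@%s/%s,@%s/\n' "$1" "$2" "$3"
}

# Usage:
#   sudo modprobe kgdboe kgdboe="$(kgdboe_param 192.168.1.10 enp4s0 192.168.1.100)"
```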
To break into the kernel debugger immediately:
```bash
echo g | sudo tee /proc/sysrq-trigger
```
Or trigger via SysRq key: `Alt+SysRq+G`
### On the Host (GDB client machine)
Install GDB. No cross toolchain is needed when the host is also an x86_64 machine:
```bash
sudo dnf install gdb
```
Connect to the target:
```bash
gdb ~/Code/umber_kernel/linux-6.18.16/vmlinux
(gdb) target remote udp:<target-ip>:6443
```
### Useful GDB Commands for Kernel Debugging
```
(gdb) bt # backtrace
(gdb) info threads # list all threads (CPUs)
(gdb) thread <n> # switch to CPU context n
(gdb) lx-ps # list processes (requires linux gdb scripts)
(gdb) lx-dmesg # dump kernel log buffer
(gdb) monitor reboot # reboot target
(gdb) monitor go # continue execution
```
### Linux GDB Helper Scripts
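The kernel ships GDB helper scripts that make kernel debugging much more powerful.
They require `CONFIG_GDB_SCRIPTS=y`, and GDB must be allowed to auto-load
`vmlinux-gdb.py` from the build tree:
```bash
# Add the build tree to GDB's auto-load safe path in ~/.gdbinit
echo "add-auto-load-safe-path $HOME/Code/umber_kernel/linux-6.18.16" >> ~/.gdbinit
```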
Then in GDB:
```
(gdb) lx-symbols
```
`lx-symbols` loads symbols for the modules loaded on the target. The other helpers
(`lx-ps`, `lx-dmesg`, `lx-list`, etc.) are available once GDB has auto-loaded the scripts.
---
## Relationship Between kdump and KGDB
| Tool | Use Case |
|------|----------|
| kdump + crash | Post-mortem analysis of hard freezes and panics |
| KGDB/kgdboe | Live debugging when kernel is still partially responsive |
| NMI watchdog | Forces a panic (and thus a dump) on hard lockup |
For hard freezes where the system is unresponsive, kdump is the primary tool.
Enable the NMI watchdog to ensure a freeze triggers a dump automatically:
```bash
sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 \
    --args="nmi_watchdog=panic,1 softlockup_panic=1"
```
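After rebooting with these arguments, the effective settings can be read back from the `kernel.nmi_watchdog`, `kernel.hardlockup_panic`, and `kernel.softlockup_panic` sysctls. A minimal sketch, with `lockup_panic_status` as a hypothetical helper name:

```bash
# lockup_panic_status [SYSCTL_DIR]
# Hypothetical helper: reports whether the NMI watchdog and the two
# panic-on-lockup sysctls are set to 1, and returns non-zero if any is not.
# SYSCTL_DIR defaults to /proc/sys/kernel.
lockup_panic_status() {
    local dir=${1:-/proc/sys/kernel} name bad=0
    for name in nmi_watchdog hardlockup_panic softlockup_panic; do
        if [ "$(cat "$dir/$name" 2>/dev/null)" = "1" ]; then
            echo "$name: on"
        else
            echo "$name: OFF"
            bad=1
        fi
    done
    return "$bad"
}

# Usage:
#   lockup_panic_status || echo "a lockup may not produce a vmcore"
```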
---
## Quick Reference
| Task | Command |
|------|---------|
| Check kdump status | `kdumpctl status` |
| Check crashkernel reservation | `cat /proc/cmdline \| grep crashkernel` |
| Trigger test crash | `echo c \| sudo tee /proc/sysrq-trigger` |
| Load kgdboe | `sudo modprobe kgdboe kgdboe="@<target>/<iface>,@<host>/"` |
| Break into KDB | `echo g \| sudo tee /proc/sysrq-trigger` |
| Analyze vmcore | `crash vmcore vmlinux` |