From e916743757529f7bcedd0ad1c78971af7df578a8 Mon Sep 17 00:00:00 2001
From: Robert McMahon
Date: Wed, 8 Apr 2026 11:07:06 -0700
Subject: [PATCH] add kernel debug doc

---
 Documentation/umber/kernel-debug-setup.md | 281 ++++++++++++++++++++++
 1 file changed, 281 insertions(+)
 create mode 100644 Documentation/umber/kernel-debug-setup.md

diff --git a/Documentation/umber/kernel-debug-setup.md b/Documentation/umber/kernel-debug-setup.md
new file mode 100644
index 000000000000..52998e68ec97
--- /dev/null
+++ b/Documentation/umber/kernel-debug-setup.md
@@ -0,0 +1,281 @@
+# Umber Kernel Debug Setup
+
+This document covers enabling kernel core dumps (kdump) and kernel GDB (KGDB/kgdboe)
+on the UAX-24 concentrator (ASUS WRX90E / Threadripper PRO 7955WX, Fedora 42).
+
+---
+
+## Prerequisites
+
+### Required Kernel Config Options
+
+Verify these are set in `linux-6.18.16/.config` before building:
+
+#### Core Dump (kdump)
+```
+CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
+CONFIG_PROC_VMCORE=y
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_BTF=y
+CONFIG_UNWINDER_ORC=y
+```
+
+#### Kernel GDB / kgdboe
+```
+CONFIG_KGDB=y
+CONFIG_KGDB_KDB=y
+CONFIG_KGDB_SERIAL_CONSOLE=y
+CONFIG_KGDB_LOW_LEVEL_TRAP=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_NETPOLL=y
+```
+
+Check with:
+```bash
+grep -E "CONFIG_KEXEC|CONFIG_CRASH_DUMP|CONFIG_PROC_VMCORE|CONFIG_DEBUG_INFO|CONFIG_UNWINDER_ORC|CONFIG_KGDB|CONFIG_NETPOLL|CONFIG_MAGIC_SYSRQ" \
+    linux-6.18.16/.config
+```
+
+If any are missing, enable them before building:
+```bash
+cd linux-6.18.16
+scripts/config --enable CONFIG_KGDB_KDB
+scripts/config --enable CONFIG_NETPOLL
+make LOCALVERSION="" olddefconfig
+```
+
+---
+
+## Building the Umber Kernel
+
+```bash
+cd ~/Code/umber_kernel/linux-6.18.16
+
+# Ensure correct version string (no doubling)
+grep "^EXTRAVERSION" Makefile
+# Should show: EXTRAVERSION = -umber.v9
+
+# Clear CONFIG_LOCALVERSION to avoid doubling
+scripts/config --set-str CONFIG_LOCALVERSION ""
+
+# Verify release string
+make LOCALVERSION="" kernelrelease
+# Should show: 6.18.16-umber.v9
+
+# Build
+make -j$(nproc) LOCALVERSION="" bzImage modules 2>&1 | tee build.log
+
+# Install
+sudo make LOCALVERSION="" modules_install
+sudo make LOCALVERSION="" install
+```
+
+### Common Pitfall: Doubled Version String
+
+If `make kernelrelease` shows `6.18.16-umber.v9-umber.v9`, check:
+
+1. `CONFIG_LOCALVERSION` in `.config`: must be the empty string `""`
+2. `include/config/auto.conf`: the value of `CONFIG_LOCALVERSION=` must be empty
+3. `include/generated/autoconf.h`: must contain `#define CONFIG_LOCALVERSION ""`
+
+Fix stale cached values:
+```bash
+sed -i 's/CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION=/' include/config/auto.conf
+sed -i 's/#define CONFIG_LOCALVERSION.*/#define CONFIG_LOCALVERSION ""/' include/generated/autoconf.h
+echo "6.18.16-umber.v9" > include/config/kernel.release
+```
+
+Also clean up any misnamed module directories:
+```bash
+sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9
+sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9+
+```
+
+---
+
+## Core Dumps (kdump)
+
+### 1. Install kdump tools
+
+```bash
+sudo dnf install kexec-tools
+```
+
+### 2. Reserve crash kernel memory
+
+With 250 GB of RAM, 512M is appropriate:
+```bash
+sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 --args="crashkernel=512M"
+
+# Verify
+sudo grubby --info=/boot/vmlinuz-6.18.16-umber.v9 | grep args
+```
+
+### 3. Configure kdump
+
+Minimal effective configuration for `/etc/kdump.conf`:
+```
+path /var/crash
+core_collector makedumpfile -l --message-level 7 -d 31
+```
+
+The `-d 31` flag excludes zero, cache, user-data, and free pages, significantly reducing dump size.
+The `-l` flag enables LZO compression.
+
+### 4. Enable kdump
+
+```bash
+sudo systemctl enable kdump
+```
+
+### 5. Set grub default and reboot
+
+```bash
+sudo grubby --set-default /boot/vmlinuz-6.18.16-umber.v9
+sudo grubby --default-kernel
+sudo reboot
+```
+
+### 6. Verify after reboot
+
+```bash
+uname -r
+# 6.18.16-umber.v9
+
+cat /proc/cmdline | grep crashkernel
+# should contain crashkernel=512M
+
+kdumpctl status
+# should show armed/ready
+```
+
+### 7. Analyzing a vmcore
+
+After a crash, dumps land in `/var/crash//vmcore`.
+
+```bash
+sudo dnf install crash
+
+crash /var/crash//vmcore /path/to/vmlinux-6.18.16-umber.v9
+```
+
+Useful `crash` commands:
+```
+bt              # backtrace of crashing context
+foreach bt      # backtrace of every thread
+ps              # process state at time of crash
+log             # kernel message buffer
+```
+
+The `vmlinux` with debug symbols is in the kernel build tree at
+`~/Code/umber_kernel/linux-6.18.16/vmlinux`.
+
+### Triggering a Test Crash
+
+To verify kdump works before relying on it:
+```bash
+echo 1 | sudo tee /proc/sys/kernel/sysrq
+echo c | sudo tee /proc/sysrq-trigger
+```
+
+This forces an immediate kernel panic and should produce a vmcore in `/var/crash`.
+
+---
+
+## Kernel GDB / kgdboe
+
+kgdboe (KGDB over Ethernet) is the preferred backend for the UAX-24 since the
+system may remain ping-reachable during a hard freeze.
+
+### On the Target (UAX-24 concentrator)
+
+Load the kgdboe module at runtime, specifying the host machine that will
+connect as the GDB client:
+
+```bash
+# Replace ethX with the actual interface name and with the GDB client IP
+sudo modprobe kgdboe kgdboe="@/,@/"
+
+# Example:
+sudo modprobe kgdboe kgdboe="@192.168.1.10/enp4s0,@192.168.1.100/"
+```
+
+To break into the kernel debugger immediately:
+```bash
+echo g | sudo tee /proc/sysrq-trigger
+```
+
+Or trigger via SysRq key: `Alt+SysRq+G`
+
+### On the Host (GDB client machine)
+
+Install GDB (no cross-toolchain is needed when the host is also x86_64; the native GDB works):
+```bash
+sudo dnf install gdb
+```
+
+Connect to the target:
+```bash
+gdb ~/Code/umber_kernel/linux-6.18.16/vmlinux
+
+(gdb) target remote udp::6443
+```
+
+### Useful GDB Commands for Kernel Debugging
+
+```
+(gdb) bt                  # backtrace
+(gdb) info threads        # list all threads (CPUs)
+(gdb) thread              # switch to CPU context n
+(gdb) lx-ps               # list processes (requires linux gdb scripts)
+(gdb) lx-dmesg            # dump kernel log buffer
+(gdb) monitor reboot      # reboot target
+(gdb) monitor go          # continue execution
+```
+
+### Linux GDB Helper Scripts
+
+The kernel ships GDB helper scripts that make kernel debugging much more powerful:
+```bash
+# Add to ~/.gdbinit
+echo "add-auto-load-safe-path ~/Code/umber_kernel/linux-6.18.16/scripts/gdb/" >> ~/.gdbinit
+```
+
+Then in GDB:
+```
+(gdb) lx-symbols
+```
+
+This loads module symbols and enables `lx-ps`, `lx-dmesg`, `lx-list`, etc.
+
+---
+
+## Relationship Between kdump and KGDB
+
+| Tool | Use Case |
+|------|----------|
+| kdump + crash | Post-mortem analysis of hard freezes and panics |
+| KGDB/kgdboe | Live debugging when kernel is still partially responsive |
+| NMI watchdog | Forces a panic (and thus a dump) on hard lockup |
+
+For hard freezes where the system is unresponsive, kdump is the primary tool.
+Enable the NMI watchdog to ensure a freeze triggers a dump automatically:
+
+```bash
+sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 \
+    --args="nmi_watchdog=1 hardlockup_panic=1 softlockup_panic=1"
+```
+
+---
+
+## Quick Reference
+
+| Task | Command |
+|------|---------|
+| Check kdump status | `kdumpctl status` |
+| Check crashkernel reservation | `cat /proc/cmdline \| grep crashkernel` |
+| Trigger test crash | `echo c \| sudo tee /proc/sysrq-trigger` |
+| Load kgdboe | `sudo modprobe kgdboe kgdboe="@/,@/"` |
+| Break into KDB | `echo g \| sudo tee /proc/sysrq-trigger` |
+| Analyze vmcore | `crash vmcore vmlinux` |