add kernel debug doc
# Umber Kernel Debug Setup

This document covers enabling kernel core dumps (kdump) and kernel GDB (KGDB/kgdboe)
on the UAX-24 concentrator (ASUS WRX90E / Threadripper PRO 7955WX, Fedora 42).

---

## Prerequisites

### Required Kernel Config Options

Verify these are set in `linux-6.18.16/.config` before building:

#### Core Dump (kdump)
```
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_UNWINDER_ORC=y
```

#### Kernel GDB / kgdboe
```
CONFIG_KGDB=y
CONFIG_KGDB_KDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_NETPOLL=y
```

Check with:
```bash
grep -E "CONFIG_KEXEC|CONFIG_CRASH_DUMP|CONFIG_PROC_VMCORE|CONFIG_DEBUG_INFO|CONFIG_UNWINDER_ORC|CONFIG_KGDB|CONFIG_KDB|CONFIG_NETPOLL|CONFIG_MAGIC_SYSRQ" \
    linux-6.18.16/.config
```

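
The grep above prints whatever matches, so an option that is absent or built as a module is easy to overlook. As a minimal sketch (not a script from the kernel tree), the hypothetical helper `check_kernel_config` below reports by name every option that is not set to `=y`:

```bash
# check_kernel_config is a hypothetical helper name used for illustration.
# Usage: check_kernel_config <config-file> <option>...
# Prints "MISSING: <option>" for each option not set to "y"; returns 1 if any is missing.
check_kernel_config() {
    local config_file=$1
    shift
    local opt missing=0
    for opt in "$@"; do
        # -x matches the whole line, -F treats the pattern as a fixed string,
        # so "=m" and "# ... is not set" both count as missing.
        if ! grep -qxF -- "${opt}=y" "$config_file"; then
            echo "MISSING: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, assuming the build tree path used in this document:
#   check_kernel_config linux-6.18.16/.config \
#       CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE \
#       CONFIG_KGDB CONFIG_KGDB_KDB CONFIG_NETPOLL CONFIG_MAGIC_SYSRQ
```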
If any are missing, enable them before building:
```bash
cd linux-6.18.16
scripts/config --enable CONFIG_KGDB_KDB
scripts/config --enable CONFIG_NETPOLL
make LOCALVERSION="" olddefconfig
```

---

## Building the Umber Kernel

```bash
cd ~/Code/umber_kernel/linux-6.18.16

# Ensure correct version string (no doubling)
grep "^EXTRAVERSION" Makefile
# Should show: EXTRAVERSION = -umber.v9

# Clear CONFIG_LOCALVERSION to avoid doubling
scripts/config --set-str CONFIG_LOCALVERSION ""

# Verify release string
make LOCALVERSION="" kernelrelease
# Should show: 6.18.16-umber.v9

# Build
make -j$(nproc) LOCALVERSION="" bzImage modules 2>&1 | tee build.log

# Install
sudo make LOCALVERSION="" modules_install
sudo make LOCALVERSION="" install
```

### Common Pitfall: Doubled Version String

If `make kernelrelease` shows `6.18.16-umber.v9-umber.v9`, check:

1. `CONFIG_LOCALVERSION` in `.config`: must be the empty string `""`
2. `include/config/auto.conf`: `CONFIG_LOCALVERSION=` must be empty
3. `include/generated/autoconf.h`: `CONFIG_LOCALVERSION` must be defined as `""`

Fix stale cached values:
```bash
# The anchored patterns leave CONFIG_LOCALVERSION_AUTO untouched
sed -i 's/^CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION=/' include/config/auto.conf
sed -i 's/^#define CONFIG_LOCALVERSION .*/#define CONFIG_LOCALVERSION ""/' include/generated/autoconf.h
echo "6.18.16-umber.v9" > include/config/kernel.release
```

Also clean up any misnamed module directories:
```bash
sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9
sudo rm -rf /lib/modules/6.18.16-umber.v9-umber.v9+
```

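
A doubled suffix can be caught before a long build rather than after it. The following is a small sketch under the assumption that the suffix is known in advance (`-umber.v9` here); `has_doubled_suffix` is a hypothetical helper name, not part of the kernel build system:

```bash
# has_doubled_suffix is a hypothetical helper used for illustration.
# Usage: has_doubled_suffix <release-string> <suffix>
# Returns 0 if the release string ends with the suffix twice, optionally
# followed by the "+" that the build appends in some configurations.
has_doubled_suffix() {
    local release=$1 suffix=$2
    # An empty suffix would match everything, so reject it.
    [ -n "$suffix" ] || return 1
    case "$release" in
        *"${suffix}${suffix}" | *"${suffix}${suffix}+") return 0 ;;
        *) return 1 ;;
    esac
}

# Example call from the top of the build tree:
#   if has_doubled_suffix "$(make -s LOCALVERSION="" kernelrelease)" "-umber.v9"; then
#       echo "doubled version string: check CONFIG_LOCALVERSION" >&2
#   fi
```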
---

## Core Dumps (kdump)

### 1. Install kdump tools

```bash
# On recent Fedora releases kdumpctl and makedumpfile are packaged separately from kexec-tools
sudo dnf install kexec-tools kdump-utils makedumpfile
```

### 2. Reserve crash kernel memory

With 250 GB of RAM, a 512M reservation is appropriate:
```bash
sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 --args="crashkernel=512M"

# Verify
sudo grubby --info=/boot/vmlinuz-6.18.16-umber.v9 | grep args
```

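
To check the argument from a script, for example during provisioning, the value can be pulled out of a kernel command-line string. This is a sketch only; `crashkernel_arg` is a hypothetical helper name, and it prints the first `crashkernel=` entry it finds:

```bash
# crashkernel_arg is a hypothetical helper used for illustration.
# Usage: crashkernel_arg "<kernel command line>"
# Prints the value of the first crashkernel= argument; returns 1 if there is none.
# The body runs in a subshell so that "set -f" does not leak into the caller.
crashkernel_arg() (
    set -f    # disable glob expansion while the command line is word-split
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                exit 0
                ;;
        esac
    done
    exit 1
)

# Example call on the running system:
#   crashkernel_arg "$(cat /proc/cmdline)"
```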
### 3. Configure kdump

Minimal effective configuration for `/etc/kdump.conf`:
```
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31
```

The `-d 31` flag excludes zero pages, page-cache pages (private and non-private),
user-process data pages, and free pages, which significantly reduces dump size.
The `-l` flag compresses the dump with LZO.

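
The dump level given to `-d` is a bitmask, and 31 sets all five bits. As a rough illustration, the hypothetical helper `decode_dump_level` below lists the page types a given level excludes; the bit meanings follow the makedumpfile(8) man page, and the function itself is an assumption made for this example:

```bash
# decode_dump_level is a hypothetical helper used for illustration.
# Usage: decode_dump_level <level 0-31>
# Prints one excluded page type per line for each bit set in the level.
decode_dump_level() {
    local level=$1 bit name
    while read -r bit name; do
        if [ $(( level & bit )) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
1 zero pages
2 non-private cache pages
4 private cache pages
8 user process data pages
16 free pages
EOF
}

# Example calls:
#   decode_dump_level 31   # lists all five page types
#   decode_dump_level 17   # lists zero pages and free pages only
```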
### 4. Enable kdump

```bash
# The service starts on the next boot, once the crashkernel= reservation is active
sudo systemctl enable kdump
```

### 5. Set the GRUB default and reboot

```bash
sudo grubby --set-default /boot/vmlinuz-6.18.16-umber.v9
sudo grubby --default-kernel   # confirm the new default
sudo reboot
```

### 6. Verify after reboot

```bash
uname -r
# expected: 6.18.16-umber.v9

cat /proc/cmdline | grep crashkernel
# should contain crashkernel=512M

kdumpctl status
# should report kdump as operational (armed and ready)
```

### 7. Analyzing a vmcore

After a crash, dumps land in `/var/crash/<timestamp>/vmcore`.

```bash
sudo dnf install crash

crash /var/crash/<timestamp>/vmcore /path/to/vmlinux-6.18.16-umber.v9
```

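
Because every crash gets its own directory, it can help to locate the most recent dump automatically. A minimal sketch, assuming the `/var/crash/<timestamp>/vmcore` layout described above; `newest_vmcore` is a hypothetical helper name:

```bash
# newest_vmcore is a hypothetical helper used for illustration.
# Usage: newest_vmcore [crash-dir]   (crash-dir defaults to /var/crash)
# Prints the most recently modified <crash-dir>/*/vmcore; returns 1 if none exists.
newest_vmcore() {
    local crash_dir=${1:-/var/crash} f newest=""
    for f in "$crash_dir"/*/vmcore; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -z "$newest" ]; then
        return 1
    fi
    printf '%s\n' "$newest"
}

# Example call, reusing the placeholder vmlinux path from the command above:
#   crash "$(sudo bash -c "$(declare -f newest_vmcore); newest_vmcore")" /path/to/vmlinux-6.18.16-umber.v9
# or, when /var/crash is readable by the current user:
#   crash "$(newest_vmcore)" /path/to/vmlinux-6.18.16-umber.v9
```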
Useful `crash` commands:
```
bt          # backtrace of crashing context
foreach bt  # backtrace of every thread
ps          # process state at time of crash
log         # kernel message buffer
```

The `vmlinux` with debug symbols is in the kernel build tree at
`~/Code/umber_kernel/linux-6.18.16/vmlinux`.

### Triggering a Test Crash

To verify kdump works before relying on it:
```bash
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger
```

This forces an immediate kernel panic and should produce a vmcore in `/var/crash`.

---

## Kernel GDB / kgdboe

kgdboe (KGDB over Ethernet) is the preferred backend for the UAX-24 because the
system may remain reachable over the network (it still answers ping) during a hard freeze.
kgdboe is not part of the mainline kernel, so the module must be provided by the
Umber kernel build or built out of tree.

### On the Target (UAX-24 concentrator)

Load the kgdboe module at runtime, specifying the host machine that will
connect as the GDB client:

```bash
# Replace <target-ip> and <ethX> with the target's address and interface name,
# and <host-ip> with the GDB client IP
sudo modprobe kgdboe kgdboe="@<target-ip>/<ethX>,@<host-ip>/"

# Example:
sudo modprobe kgdboe kgdboe="@192.168.1.10/enp4s0,@192.168.1.100/"
```

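
The parameter string is easy to mistype, since it mixes `@`, `/`, and `,` separators. As a small sketch, the hypothetical helper `kgdboe_param` below assembles it in exactly the form shown above and rejects missing fields; it does not validate the addresses themselves:

```bash
# kgdboe_param is a hypothetical helper used for illustration.
# Usage: kgdboe_param <target-ip> <iface> <host-ip>
# Prints "@<target-ip>/<iface>,@<host-ip>/"; returns 1 if an argument is missing.
kgdboe_param() {
    local target_ip=${1:-} iface=${2:-} host_ip=${3:-}
    if [ -z "$target_ip" ] || [ -z "$iface" ] || [ -z "$host_ip" ]; then
        echo "usage: kgdboe_param <target-ip> <iface> <host-ip>" >&2
        return 1
    fi
    printf '@%s/%s,@%s/\n' "$target_ip" "$iface" "$host_ip"
}

# Example call on the target, using the addresses from the example above:
#   sudo modprobe kgdboe kgdboe="$(kgdboe_param 192.168.1.10 enp4s0 192.168.1.100)"
```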
To break into the kernel debugger immediately:
```bash
echo g | sudo tee /proc/sysrq-trigger
```

Or trigger via SysRq key: `Alt+SysRq+G`

### On the Host (GDB client machine)

Install GDB. No cross-toolchain is needed when the host is also an x86_64 machine; the native GDB works fine:
```bash
sudo dnf install gdb
```

Connect to the target:
```bash
gdb ~/Code/umber_kernel/linux-6.18.16/vmlinux

(gdb) target remote udp:<target-ip>:6443
```

### Useful GDB Commands for Kernel Debugging

```
(gdb) bt                  # backtrace
(gdb) info threads        # list threads (kernel tasks and per-CPU contexts)
(gdb) thread <n>          # switch to thread n
(gdb) lx-ps               # list processes (requires linux gdb scripts)
(gdb) lx-dmesg            # dump kernel log buffer
(gdb) monitor reboot      # reboot target
(gdb) continue            # resume execution on the target
```

### Linux GDB Helper Scripts

The kernel ships GDB helper scripts that make kernel debugging much more powerful.
They require `CONFIG_GDB_SCRIPTS=y` in the kernel config. Allow GDB to auto-load
them from the build tree:
```bash
# Add to ~/.gdbinit
echo "add-auto-load-safe-path $HOME/Code/umber_kernel/linux-6.18.16" >> ~/.gdbinit
```

Then in GDB:
```
(gdb) lx-symbols
```

`lx-symbols` loads symbols for the kernel and its loaded modules. The helper commands
(`lx-ps`, `lx-dmesg`, `lx-lsmod`, etc.) become available once the scripts are auto-loaded.

---

## Relationship Between kdump and KGDB

| Tool | Use Case |
|------|----------|
| kdump + crash | Post-mortem analysis of hard freezes and panics |
| KGDB/kgdboe | Live debugging when kernel is still partially responsive |
| NMI watchdog | Forces a panic (and thus a dump) on hard lockup |

For hard freezes where the system is unresponsive, kdump is the primary tool.
Enable the NMI watchdog to ensure a freeze triggers a dump automatically:

```bash
# nmi_watchdog=panic,1 enables the hard-lockup detector and panics when it fires
sudo grubby --update-kernel=/boot/vmlinuz-6.18.16-umber.v9 \
    --args="nmi_watchdog=panic,1 softlockup_panic=1"
```

---

## Quick Reference

| Task | Command |
|------|---------|
| Check kdump status | `kdumpctl status` |
| Check crashkernel reservation | `cat /proc/cmdline \| grep crashkernel` |
| Trigger test crash | `echo c \| sudo tee /proc/sysrq-trigger` |
| Load kgdboe | `sudo modprobe kgdboe kgdboe="@<target>/<iface>,@<host>/"` |
| Break into KDB | `echo g \| sudo tee /proc/sysrq-trigger` |
| Analyze vmcore | `crash vmcore vmlinux` |