System test scripts (hardware harnesses)
Audience: software test engineers writing lab and bench automation in this repo.
Example: scripts/system/pcie_hotswap_harness.py — a small, readable pattern you can copy.
PCIe hot-swap setup (install, INI, JSON, commands): docs/pcie-hotswap-setup.md.
Pytest vs system scripts
| | tests/ + pytest | scripts/system/ |
|---|---|---|
| Goal | Fast feedback, CI, mocks, gated remote tests | Long runs, real power, cables, enumeration |
| When it runs | python3 -m pytest tests/ on every change | When the bench is wired and someone invokes the script |
| Failure meaning | Regression in code or contract | Often environment (wrong port, flaky USB, SSH); design logs accordingly |
| Concurrency | Usually isolated tests | Often many logical paths sharing one USB tree or one SSH host |
Keep pytest strict and deterministic. Keep system scripts explicit about assumptions (CLI flags, env vars, dry-run) and safe defaults (no silent hardware actions).
What the example script does
scripts/system/pcie_hotswap_harness.py models a fronthaul (PCIe) hot-swap campaign:
- Build a Fabric: either load --fabric-json (FabricDefinition from disk → Fabric.rrhs, rrh_power_ports, fingerprint) or build N placeholder RadioHead instances (each with a FrontHaul) via --paths and wrap them in Fabric (optional concentrator ssh_node, power_lock).
- For each iteration, run asyncio.TaskGroup: every RRH runs one_cycle concurrently (stressing shared-resource design: one BrainStem, one rig SSH target, and so on).
- Each cycle: log remove/restore phases (--dry-run) or placeholders for future Power calls, then optionally SSH to the concentrator for a minimal smoke command (uname, sample lspci output).
- Exit non-zero if the async campaign raises (including TaskGroup child failures), using except* Exception so ExceptionGroup surfaces every underlying error.
The script’s module docstring lists DESIGN_GAPS — known extension points so harness scope stays explicit.
Fabric JSON (discovery + bindings, one pass)
Full workflow (INI → discovery → prompts → JSON): docs/fabric-builder.md.
- Install: pip install -e ".[power]" on the workstation that sees the Acroname hub.
- Fabric builder: use build when a lab INI must be loaded first; bind is the same with INI optional if the default path is missing:

      python3 -m fiwicontrol.fabric build -o configs/my-fabric.json -c configs/default.ini
      python3 -m fiwicontrol.fabric bind -o configs/my-fabric.json -c configs/default.ini

- Check freshness: exit 0 only if the on-disk fingerprint matches live USB discovery:

      python3 -m fiwicontrol.fabric status -f configs/my-fabric.json

- Harness: load that graph (optional --strict-fabric-ready to require READY status):

      python3 scripts/system/pcie_hotswap_harness.py --fabric-json configs/my-fabric.json --dry-run
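A wrapper script can gate a bench run on the freshness check by looking only at the exit code of the status command. The sketch below is an illustration, not part of the repo: fabric_is_fresh and its runner parameter are hypothetical names, and the only behaviour taken from this document is that status exits 0 when the on-disk fingerprint matches live USB discovery.

```python
import subprocess
import sys


def fabric_is_fresh(fabric_json, runner=subprocess.run):
    """Return True when `fiwicontrol.fabric status` exits 0 for this JSON.

    `runner` is injectable (hypothetical parameter) so the check can be
    unit-tested without a bench or an installed fiwicontrol package.
    """
    cmd = [
        sys.executable, "-m", "fiwicontrol.fabric",
        "status", "-f", str(fabric_json),
    ]
    # check=False: a stale fingerprint is an expected outcome, not an exception.
    result = runner(cmd, check=False)
    return result.returncode == 0
```

A caller would typically refuse to start a live campaign when this returns False and point the operator at build or bind instead.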
Types live under fiwicontrol.fabric (FabricDefinition, FabricRRHBinding, Fabric.binding_cache_status).
Concentrator dump (scripts/system/dump_concentrator.py)
Purpose: capture this machine’s concentrator-relevant facts in one place: CPU summary from /proc/cpuinfo, and (by default) a local host probe — lspci -tv, /sys/bus/pci/devices/*/current_link_width (and related link fields), and dmidecode -t baseboard when the binary succeeds (often after sudo, because SMBIOS is not always readable as a normal user).
Default output is human text, not JSON: a short CPU block; one line with the total count of sysfs PCI devices that expose negotiated link width/speed; a Wi‑Fi / wireless-only table (K of N) for PCI class 0x028… (network + wireless) with w/W lanes, GT/s current/max, class, and a chip column from lspci -nn (preferred) or sysfs vendor / device hex pair (long chip strings are truncated); a peek at the first --lspci-lines rows of lspci -tv (default 18, remainder summarized); and the first 14 lines of dmidecode -t baseboard when that command succeeds (often requires sudo on Fedora).
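The sysfs part of the host probe can be approximated in a few lines. This is a minimal sketch, assuming the standard Linux PCI attributes current_link_width, max_link_width, current_link_speed and max_link_speed; read_pci_links and LINK_FIELDS are hypothetical names and this is not the code in src/fiwicontrol/concentrator/host.py.

```python
from pathlib import Path

# Standard Linux sysfs PCIe link attributes (absent on devices without a link).
LINK_FIELDS = (
    "current_link_width",
    "max_link_width",
    "current_link_speed",
    "max_link_speed",
)


def read_pci_links(sysdir="/sys/bus/pci/devices"):
    """Return {device_name: {field: text}} for devices exposing a link width."""
    links = {}
    root = Path(sysdir)
    if not root.is_dir():
        return links  # non-Linux host or overridden root that does not exist
    for dev in sorted(root.iterdir()):
        fields = {}
        for name in LINK_FIELDS:
            try:
                fields[name] = (dev / name).read_text().strip()
            except OSError:
                continue  # attribute missing or unreadable for this device
        if "current_link_width" in fields:
            links[dev.name] = fields
    return links
```

Taking the root directory as a parameter mirrors the idea behind --pci-sysdir: tests can point the reader at a temporary directory instead of the live /sys tree.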
| Flag | Meaning |
|---|---|
| --json | Emit the full ConcentratorPlatformSnapshot.to_json_dict() document (large): CPU fields, optional lspci_tree, compact pci_device_links as {"cols":[...],"rows":[...]} (columns bdf, w, W, s, S, c = lanes and GT/s tokens and class), optional dmidecode_baseboard string. |
| --no-host-probe | CPU-only; skip lspci, sysfs PCI enumeration, and dmidecode. |
| --pci-sysdir DIR | Override /sys/bus/pci/devices (testing or nonstandard roots). |
| --pci-all | After the Wi‑Fi table, append a second table of other “interesting” non-wireless links (wide ports / downgrades), still capped by --pci-max-rows. |
| --pci-max-rows N | Cap for the optional second table (default 40). |
| --lspci-lines N | Lines of lspci -tv in human output (0 = omit that block; default 18). |
| --label NAME | Shown in the human header only. |
| --proc-cpuinfo PATH | Override /proc/cpuinfo (tests or chroots). |
Examples:
# Human summary (default); Wi‑Fi table + short lspci tree + DMI if allowed
python3 scripts/system/dump_concentrator.py
# Same with baseboard text (often needs root on Fedora)
sudo python3 scripts/system/dump_concentrator.py
# Machine JSON for tooling / CI artifacts
python3 scripts/system/dump_concentrator.py --json > /tmp/concentrator.json
Python API: fiwicontrol.concentrator.ConcentratorPlatform, ConcentratorPlatformSnapshot, PciDeviceLinkSnapshot, format_concentrator_platform_snapshot_human() (same layout as the script’s default text; optional lspci_nn_by_bdf= for tests). Implementation lives in src/fiwicontrol/concentrator/host.py (package fiwicontrol.concentrator — local workstation facts, parallel to fiwicontrol.radio for RRH aggregates; not part of fabric JSON).
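Tooling that consumes the --json output will usually want one record per device rather than the compact cols/rows form of pci_device_links. The helper below is a sketch under that assumption: expand_compact_table is a hypothetical name, and the sample row is made up for illustration (the real value types in each column may differ).

```python
def expand_compact_table(table):
    """Turn {"cols": [...], "rows": [[...], ...]} into a list of dicts."""
    cols = table["cols"]
    return [dict(zip(cols, row)) for row in table["rows"]]


# Made-up sample in the documented column order: bdf, w, W, s, S, c.
sample = {
    "cols": ["bdf", "w", "W", "s", "S", "c"],
    "rows": [["0000:03:00.0", 1, 2, "5.0", "8.0", "0x028000"]],
}

records = expand_compact_table(sample)
```

Each record can then be addressed by column name, for example records[0]["bdf"], which keeps downstream filters readable without depending on column positions.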
When the harness (or your script) loads --fabric-json, it merges lab INI by default (same file as fiwicontrol.lab: FIWI_LAB_INI, else configs/default.ini if present). Pass --lab-ini PATH to point at another file. Merged keys include optional [fabric] (fabric_id, concentrator → [machine.*] SSH target) and optional [fabric.rrh.<radio_id>] to override Acroname port / patch panel / module serial for rows already present in the JSON. Use --no-lab-ini to skip. JSON supplies discovery_fingerprint and the RRH binding list (key rrhs; Python: FabricDefinition.rrhs) from fabric build / bind or fabric_realize.py --json.
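The INI selection described above can be written as a small pure function. This is a sketch, not the fiwicontrol.lab implementation: resolve_lab_ini is a hypothetical name, and the precedence of --lab-ini over FIWI_LAB_INI is an assumption made for the example.

```python
import os
from pathlib import Path


def resolve_lab_ini(cli_path=None, no_lab_ini=False, env=None,
                    default="configs/default.ini"):
    """Pick the lab INI to merge, or None when the merge should be skipped."""
    if no_lab_ini:
        return None  # --no-lab-ini: JSON only
    if cli_path:
        return Path(cli_path)  # --lab-ini PATH (assumed to win over the env var)
    env = os.environ if env is None else env
    from_env = env.get("FIWI_LAB_INI")
    if from_env:
        return Path(from_env)
    fallback = Path(default)
    # The default file is used only if it is present.
    return fallback if fallback.is_file() else None
```

Passing env explicitly keeps the function testable without touching the process environment.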
Acroname discovery smoke test (scripts/system/test_acroname_usb_discovery.py)
Runs BrainStem USB enumeration per [machine.*] row in the lab INI: usb=local on the workstation you run from, usb=remote over SSH (same interpreter contract as fiwicontrol.power --discovery-json). Prints a short table per machine, brainstem_version from discovery JSON (with an SSH fallback pip probe when the remote build omits that field), and a total module count across hosts.
python3 scripts/system/test_acroname_usb_discovery.py
python3 scripts/system/test_acroname_usb_discovery.py -c configs/default.ini --json
python3 scripts/system/test_acroname_usb_discovery.py --local-only
Use --local-only to skip the INI and probe only this machine’s USB. See docs/power-control-and-inventory.md for INI fields.
Fabric compose + realize (scripts/system/fabric_realize.py --realize)
Loads the lab INI, runs local Acroname discovery, compose_definition, builds Fabric, then await fab.realize() (strict fingerprint check against live USB). Default stdout is an OK line plus print(fabric) (human Fabric.__str__ summary). Pass --json for stdout-only FabricDefinition JSON after a successful realize. -v adds discovery / pre-realize fabric lines on stderr; --no-strict passes strict=False into Fabric.realize(). --realize-discovery-timeout SEC bounds Acroname discovery during --realize (default 120). Exit codes and FDIR semantics: docs/fdir.md and fabric_realize.py --help (epilog).
Without --realize, fabric_realize.py only composes the definition and prints a human workstation report (or --json / -o for definition JSON without calling Fabric.realize()). The human report can merge patch-panel labels into the Wi‑Fi PCIe table when --patch-panel-json PATH is set or when <lab_ini_stem>_panel.json exists beside the lab INI (see fiwicontrol.fabric.patch_panel_json).
Prerequisites
- Editable install from the repo root (see docs/install.md):

      cd ~/Code/FiWiControl
      python3 -m pip install -e ".[dev]"

- Python 3.11+: the example uses asyncio.TaskGroup and except* Exception.
- Optional SSH to the rig: same contract as elsewhere, passwordless root@<host> for sshtype="ssh". Optional FIWI_SSH_CONFIG is documented in docs/node-control-asyncio-design.md.
- Power / Acroname: not wired in the example yet. When you add fiwicontrol.power, use pip install -e ".[power]" and follow docs/power-control-and-inventory.md.
How to run the example
From the repository root (the script prepends src to sys.path if needed):
# Safe: no SSH, no hardware — exercises structure only
python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2 --iterations 1
# With saved fabric JSON (after build/bind; merge lab INI at run time)
python3 scripts/system/pcie_hotswap_harness.py --fabric-json configs/my-fabric.json --lab-ini configs/default.ini --dry-run
# With SSH smoke on the concentrator (replace IP)
FIWI_REMOTE_IP=192.168.1.39 python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2
# or
python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2 --rig-ip 192.168.1.39
| Flag | Meaning |
|---|---|
| --fabric-json PATH | Load FabricDefinition from JSON; sets Fabric.rrhs and rrh_power_ports. Without it, uses --paths placeholders. |
| --lab-ini PATH | Lab INI merged after JSON (default: FIWI_LAB_INI, else configs/default.ini if present). |
| --no-lab-ini | Skip INI merge; JSON only. |
| --strict-fabric-ready | Exit 2 unless Fabric.binding_cache_status is READY (requires live Acroname discovery). Only meaningful with --fabric-json. |
| --dry-run | Log only; no programmable power (none hooked up in this skeleton). |
| --paths N | Placeholder RRH count (ignored when --fabric-json is set). |
| --iterations M | Outer loop: run M sequential TaskGroup rounds. |
| --settle SEC | Sleep between conceptual phases inside one_cycle. |
| --rig-ip | SSH target; defaults to FIWI_REMOTE_IP. Overrides JSON concentrator when set. If unset and JSON has no IP, remote checks are skipped. |
Patterns to reuse in your own harness
1. Thin main() — parse, configure logging, call asyncio.run
Keep I/O policy (flags, env) in main(). Keep async logic in async def functions so tests or imports can reuse the coroutines without a second event loop.
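A minimal skeleton of this split might look as follows. It is a sketch, not the layout of pcie_hotswap_harness.py: the flag names are borrowed from the table above, while build_parser and the body of run_campaign are placeholders invented for the example.

```python
import argparse
import asyncio
import logging

log = logging.getLogger("harness_sketch")


async def run_campaign(paths, iterations, dry_run):
    """Async logic only; importable and awaitable from tests."""
    for i in range(iterations):
        for p in range(paths):
            log.info("iteration %d path %d dry_run=%s", i, p, dry_run)
        await asyncio.sleep(0)  # placeholder for real awaited work
    return 0


def build_parser():
    parser = argparse.ArgumentParser(description="Harness skeleton (sketch)")
    parser.add_argument("--paths", type=int, default=1)
    parser.add_argument("--iterations", type=int, default=1)
    parser.add_argument("--dry-run", action="store_true")
    return parser


def main(argv=None):
    """I/O policy only: parse flags, configure logging, run one event loop."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.INFO)
    return asyncio.run(run_campaign(args.paths, args.iterations, args.dry_run))

# In a script, the entry point would be: raise SystemExit(main())
```

Because main() accepts argv, a test can call main(["--dry-run"]) directly, and a test that already has an event loop can await run_campaign() without going through asyncio.run.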
2. One coroutine per “story”: one_cycle, run_campaign
Name coroutines after user-visible steps (cycle, campaign, smoke). Pass explicit parameters (dry_run, settle_s, label) instead of hidden globals.
3. Concurrency with TaskGroup
When multiple RRHs run together, async with asyncio.TaskGroup() as tg: + tg.create_task(...) fails fast and bundles errors in an ExceptionGroup. Catch with except* Exception at the boundary that owns asyncio.run, log each sub-exception, and return a process exit code.
4. Dry-run first
Always provide a path that does not touch hardware so engineers can validate logging, SSH, and timing on a laptop. Real power transitions should be clearly gated (extra flag or explicit “I know this is live”).
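One way to make the gate explicit is to route every power transition through a single function that defaults to logging only. The sketch below is hypothetical throughout: power_action, live_ack and actuator are names invented for the example, and actuator stands in for whatever fiwicontrol.power call a real harness would make.

```python
import logging

log = logging.getLogger("power_gate_sketch")


def power_action(port, action, *, dry_run=True, live_ack=False, actuator=None):
    """Perform `action` on `port` only when the caller opted in twice.

    Returns True if the actuator was called, False for a dry run.
    """
    if dry_run:
        log.info("DRY-RUN: would %s port %s", action, port)
        return False
    if not live_ack or actuator is None:
        # Refuse loudly rather than silently downgrading to a dry run.
        raise RuntimeError("live power requested without explicit acknowledgement")
    log.info("LIVE: %s port %s", action, port)
    actuator(port, action)
    return True
```

Requiring both dry_run=False and live_ack=True means a single forgotten flag cannot switch real power, which matches the safe-defaults rule stated at the top of this document.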
5. Domain types from the library
Attach FrontHaul to RadioHead even when fields are None — it documents intent and keeps the harness aligned with production models. Pass a Fabric into the async campaign so shared resources (concentrator SSH, bench Power, asyncio.Lock, rrh_power_ports) have one home. Prefer --fabric-json (bound once via python3 -m fiwicontrol.fabric bind) over ad hoc placeholders; reserve --paths for laptop-only smoke.
6. Remote checks via ssh_node
Use await node.rexec(cmd="...", ...) for one-shot remote work. For periodic sampling, prefer Command / CommandManager from fiwicontrol.commands (see docs/node-control-asyncio-design.md).
7. Document gaps in the script
A short DESIGN_GAPS or TODO block at the top of the harness documents how enumeration, telemetry, or SPC relate to this script.
Checklist for a new system script
- Lives under scripts/system/ with a #!/usr/bin/env python3 shebang.
- argparse (or equivalent) documents every assumption; --help is accurate.
- --dry-run (or equivalent) when hardware is involved.
- logging at INFO for operator visibility; avoid print for control flow.
- Async entry is async def + single asyncio.run(...) from main().
- Concurrent work uses TaskGroup (or gather with a documented error policy).
- Non-zero exit on failure; ExceptionGroup handled if you use TaskGroup.
- README or this doc updated if you add a new category of harness or dependency.
Related docs
- docs/pcie-hotswap-setup.md: PCIe harness prerequisites and JSON generation.
- docs/fabric-builder.md: lab INI + python3 -m fiwicontrol.fabric build / bind.
- docs/install.md: workstation and rig setup, pip install -e.
- docs/node-control-asyncio-design.md: ssh_node, Command, timeouts, running tests.
- docs/power-control-and-inventory.md: Acroname / Monsoon, INI, --verify-inventory.
- docs/spc.md: when campaigns need statistical control charts after KPI extraction.
- README.md: scripts/system/ vs tests/ overview.