# System test scripts (hardware harnesses)

Audience: software test engineers writing lab and bench automation in this repo.

Example: **scripts/system/pcie_hotswap_harness.py** — a small, readable pattern you can copy.

PCIe hot-swap setup (install, INI, JSON, commands): **docs/pcie-hotswap-setup.md**.
## Pytest vs system scripts

| | **tests/ + pytest** | **scripts/system/** |
|---|---|---|
| Goal | Fast feedback, CI, mocks, gated remote tests | Long runs, real power, cables, enumeration |
| When it runs | **python3 -m pytest tests/** on every change | When the bench is wired and someone invokes the script |
| Failure meaning | Regression in code or contract | Often environment (wrong port, flaky USB, SSH) — design logs accordingly |
| Concurrency | Usually isolated tests | Often many logical paths sharing one USB tree or one SSH host |
Keep pytest strict and deterministic. Keep system scripts explicit about assumptions (CLI flags, env vars, dry-run) and safe defaults (no silent hardware actions).
## What the example script does

**scripts/system/pcie_hotswap_harness.py** models a fronthaul (PCIe) hot-swap campaign:

- Build a **Fabric**: either load **--fabric-json** (**FabricDefinition** from disk → **Fabric.rrhs**, **rrh_power_ports**, fingerprint) or build N placeholder **RadioHead** instances (each with a **FrontHaul**) via **--paths** and wrap them in **Fabric** (optional concentrator **ssh_node**, **power_lock**).
- For each iteration, run **asyncio.TaskGroup**: every RRH runs **one_cycle** concurrently (stressing shared-resource design: one BrainStem, one rig SSH target, and so on).
- Each cycle: log remove/restore phases (**--dry-run**) or placeholders for future **Power** calls, then optionally SSH to the concentrator for a minimal smoke command (**uname**, sample **lspci** output).
- Exit non-zero if the async campaign raises (including **TaskGroup** child failures), using **except* Exception** so **ExceptionGroup** surfaces every underlying error.

The script’s module docstring lists DESIGN_GAPS — known extension points so harness scope stays explicit.
## Fabric JSON (discovery + bindings, one pass)

Full workflow (INI → discovery → prompts → JSON): **docs/fabric-builder.md**.

- **pip install -e ".[power]"** on the workstation that sees the Acroname hub.
- Fabric builder — use **build** when a lab INI must be loaded first; **bind** is the same, but the INI is optional if the default path is missing:

  ```
  python3 -m fiwicontrol.fabric build -o configs/my-fabric.json -c configs/default.ini
  python3 -m fiwicontrol.fabric bind -o configs/my-fabric.json -c configs/default.ini
  ```

- Check freshness — exit 0 only if the on-disk fingerprint matches live USB discovery:

  ```
  python3 -m fiwicontrol.fabric status -f configs/my-fabric.json
  ```

- Harness — load that graph (optional **--strict-fabric-ready** to require **READY** status):

  ```
  python3 scripts/system/pcie_hotswap_harness.py --fabric-json configs/my-fabric.json --dry-run
  ```

Types live under **fiwicontrol.fabric** (**FabricDefinition**, **FabricRRHBinding**, **Fabric.binding_cache_status**).
## Concentrator dump (scripts/system/dump_concentrator.py)

Purpose: capture this machine’s concentrator-relevant facts in one place: a CPU summary from **/proc/cpuinfo**, and (by default) a local host probe — **lspci -tv**, **/sys/bus/pci/devices/*/current_link_width** (and related link fields), and **dmidecode -t baseboard** when the binary succeeds (often after **sudo**, because SMBIOS is not always readable as a normal user).

Default output is human text, not JSON:

- a short CPU block;
- one line with the total count of sysfs PCI devices that expose negotiated link width/speed;
- a Wi‑Fi / wireless-only table (**K of N**) for PCI class **0x028…** (network + wireless) with **w/W** lanes, GT/s current/max, **class**, and a chip column from **lspci -nn** (preferred) or the sysfs **vendor** / **device** hex pair (long chip strings are truncated);
- a peek at the first **--lspci-lines** rows of **lspci -tv** (default 18, remainder summarized);
- the first 14 lines of **dmidecode -t baseboard** when that command succeeds (often requires **sudo** on Fedora).
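Since those sysfs link fields are plain text files, the probing idea behind **--pci-sysdir** is easy to sketch. The reader below and the fake device tree it scans are illustrations only, not the script's implementation:

```python
from pathlib import Path
import tempfile

def read_link_fields(sysdir: Path) -> dict[str, dict[str, str]]:
    """Collect negotiated link width/speed per BDF from a sysfs-like tree."""
    out: dict[str, dict[str, str]] = {}
    for dev in sorted(sysdir.iterdir()):
        fields = {}
        for name in ("current_link_width", "max_link_width",
                     "current_link_speed", "max_link_speed"):
            f = dev / name
            if f.is_file():
                fields[name] = f.read_text().strip()
        if fields:  # keep only devices that expose link info
            out[dev.name] = fields
    return out

# Build a tiny fake /sys/bus/pci/devices tree (illustration only).
tmp = Path(tempfile.mkdtemp())
d = tmp / "0000:03:00.0"
d.mkdir()
(d / "current_link_width").write_text("1\n")
(d / "current_link_speed").write_text("8.0 GT/s PCIe\n")

links = read_link_fields(tmp)
print(links["0000:03:00.0"]["current_link_width"])  # → 1
```

Pointing such a reader at a fabricated directory is exactly why an override like **--pci-sysdir** makes the probe testable without hardware.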
| Flag | Meaning |
|---|---|
| **--json** | Emit the full **ConcentratorPlatformSnapshot.to_json_dict()** document (large): CPU fields, optional **lspci_tree**, compact **pci_device_links** as **{"cols":[...],"rows":[...]}** (columns **bdf**, **w**, **W**, **s**, **S**, **c** = lanes, GT/s tokens, and class), optional **dmidecode_baseboard** string. |
| **--no-host-probe** | CPU-only; skip **lspci**, sysfs PCI enumeration, and **dmidecode**. |
| **--pci-sysdir DIR** | Override **/sys/bus/pci/devices** (testing or nonstandard roots). |
| **--pci-all** | After the Wi‑Fi table, append a second table of other “interesting” non-wireless links (wide ports / downgrades), still capped by **--pci-max-rows**. |
| **--pci-max-rows N** | Cap for the optional second table (default 40). |
| **--lspci-lines N** | Lines of **lspci -tv** in human output (0 = omit that block; default 18). |
| **--label NAME** | Shown in the human header only. |
| **--proc-cpuinfo PATH** | Override **/proc/cpuinfo** (tests or chroots). |
Examples:

```
# Human summary (default); Wi‑Fi table + short lspci tree + DMI if allowed
python3 scripts/system/dump_concentrator.py

# Same with baseboard text (often needs root on Fedora)
sudo python3 scripts/system/dump_concentrator.py

# Machine JSON for tooling / CI artifacts
python3 scripts/system/dump_concentrator.py --json > /tmp/concentrator.json
```
Python API: **fiwicontrol.concentrator.ConcentratorPlatform**, **ConcentratorPlatformSnapshot**, **PciDeviceLinkSnapshot**, **format_concentrator_platform_snapshot_human()** (same layout as the script’s default text; optional **lspci_nn_by_bdf=** for tests). Implementation lives in **src/fiwicontrol/concentrator/host.py** (package **fiwicontrol.concentrator** — local workstation facts, parallel to **fiwicontrol.radio** for RRH aggregates; not part of fabric JSON).
## Lab INI merge with --fabric-json

When the harness (or your script) loads **--fabric-json**, it merges lab INI by default (same file as **fiwicontrol.lab**: **FIWI_LAB_INI**, else **configs/default.ini** if present). Pass **--lab-ini PATH** to point at another file. Merged keys include optional **[fabric]** (**fabric_id**, **concentrator** → **[machine.*]** SSH target) and optional **[fabric.rrh.<radio_id>]** to override Acroname port / patch panel / module serial for rows already present in the JSON. Use **--no-lab-ini** to skip. JSON supplies **discovery_fingerprint** and the RRH binding list (key **rrhs**; Python: **FabricDefinition.rrhs**) from **fabric build** / **bind** or **fabric_realize.py --json**.
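The precedence rule (JSON supplies the rows, INI overrides individual keys per **radio_id**) can be illustrated with stdlib configparser. The section name follows this doc; the merge helper itself is a hypothetical sketch, not the library's code:

```python
import configparser

def merge_rrh_overrides(rrhs: list[dict], ini_text: str) -> list[dict]:
    """Overlay [fabric.rrh.<radio_id>] keys onto rows already in the JSON."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    merged = []
    for row in rrhs:
        row = dict(row)  # do not mutate the caller's rows
        section = f"fabric.rrh.{row['radio_id']}"
        if cfg.has_section(section):
            # INI wins for keys it defines; untouched keys keep JSON values.
            row.update(cfg[section])
        merged.append(row)
    return merged

rrhs = [{"radio_id": "rrh0", "acroname_port": "1", "pcie_bdf": "03:00.0"}]
ini = """
[fabric.rrh.rrh0]
acroname_port = 5
"""
out = merge_rrh_overrides(rrhs, ini)
print(out[0]["acroname_port"], out[0]["pcie_bdf"])  # → 5 03:00.0
```

Note that only rows already present in the JSON are touched, matching the documented behavior (the INI overrides bindings, it does not create them).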
## Acroname discovery smoke test (scripts/system/test_acroname_usb_discovery.py)

Runs BrainStem USB enumeration per **[machine.*]** row in the lab INI: **usb=local** on the workstation you run from, **usb=remote** over SSH (same interpreter contract as **fiwicontrol.power --discovery-json**). Prints a short table per machine, **brainstem_version** from discovery JSON (with an SSH-fallback pip probe when the remote build omits that field), and a total module count across hosts.

```
python3 scripts/system/test_acroname_usb_discovery.py
python3 scripts/system/test_acroname_usb_discovery.py -c configs/default.ini --json
python3 scripts/system/test_acroname_usb_discovery.py --local-only
```

Use **--local-only** to skip the INI and probe only this machine’s USB. See **docs/power-control-and-inventory.md** for INI fields.
## Wi-Fi PCI chip discovery (scripts/system/discover_wifi_pci.py)

Prints Wi-Fi / MediaTek-looking **lspci -nn** lines with suggested **pcie_bdf** values.

```
# Just discover chips/BDFs visible right now
python3 scripts/system/discover_wifi_pci.py

# JSON output for tooling
python3 scripts/system/discover_wifi_pci.py --json
```

Optional: power all RRHs ON first. When radios are currently off, add **--power-all-on** so PCI devices are present before discovery:

```
python3 scripts/system/discover_wifi_pci.py --power-all-on -f configs/clubhouse-uax-24.json
```

**--power-all-on** powers both local and relay-host RRH ports (using **-c/--lab-ini** for relay resolution):

```
python3 scripts/system/discover_wifi_pci.py --power-all-on -f configs/clubhouse-uax-24.json -c configs/default.ini
```

Safety preview (no toggles):

```
python3 scripts/system/discover_wifi_pci.py --power-all-on -f configs/clubhouse-uax-24.json -c configs/default.ini --dry-run
```

**--power-all-on** uses RRH bindings from the fabric JSON (**acroname_module_serial** + **acroname_port**) and drives Acroname ports locally via BrainStem and remotely via SSH (relay hosts from the lab INI).
## Interactive RRH -> PCI mapping (scripts/system/assign_rrh_pcie_bdf.py)

Use this when you want to assign **pcie_bdf** values to each **radio_id** and write them to the lab INI:

```
python3 scripts/system/assign_rrh_pcie_bdf.py --power-on-first -f configs/clubhouse-uax-24.json -c configs/default.ini
```

Behavior:

- Powers local + relay RRHs on first (same logic as **discover_wifi_pci.py --power-all-on**).
- Lists Wi-Fi PCI candidates from local **lspci -nn**.
- Prompts once per **radio_id** to pick a candidate index or type a BDF directly.
- Writes/updates **[fabric.rrh.<radio_id>] pcie_bdf = ...** in the INI.

Prompt shortcuts:

- **Enter** — keep existing value
- **-** — clear **pcie_bdf**
- **<number>** — select from candidate list
- **<bdf>** — set explicit value (e.g. **03:00.0**)
## Fabric compose + realize (scripts/system/fabric_realize.py --realize)

Loads the lab INI, runs local Acroname discovery, **compose_definition**, builds **Fabric**, then **await fab.realize()** (strict fingerprint check against live USB). Default stdout is an OK line plus **print(fabric)** (the human **Fabric.__str__** summary). Pass **--json** for stdout-only **FabricDefinition** JSON after a successful realize. **-v** adds discovery / pre-realize fabric lines on stderr; **--no-strict** passes **strict=False** into **Fabric.realize()**. **--realize-discovery-timeout SEC** bounds Acroname discovery during **--realize** (default 120). Exit codes and FDIR semantics: **docs/fdir.md** and **fabric_realize.py --help** (epilog).

Without **--realize**, **fabric_realize.py** only composes the definition and prints a human workstation report (or **--json** / **-o** for definition JSON without calling **Fabric.realize()**). The human report can merge patch-panel labels into the Wi‑Fi PCIe table when **--patch-panel-json PATH** is set or when **<lab_ini_stem>_panel.json** exists beside the lab INI (see **fiwicontrol.fabric.patch_panel_json**).
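The strict fingerprint check is conceptually a digest comparison: pass only when the saved topology matches what live discovery sees now. A hypothetical sketch with hashlib (the real **discovery_fingerprint** format is defined by **fiwicontrol.fabric**, not by this hashing scheme):

```python
import hashlib

def fingerprint(module_serials: list[str]) -> str:
    """Order-independent digest of the discovered Acroname module serials."""
    canon = "\n".join(sorted(module_serials)).encode()
    return hashlib.sha256(canon).hexdigest()[:16]

def check_fresh(saved: str, live_serials: list[str]) -> bool:
    """Strict check: the on-disk fingerprint must match live discovery."""
    return saved == fingerprint(live_serials)

saved = fingerprint(["ACR-001", "ACR-002"])
print(check_fresh(saved, ["ACR-002", "ACR-001"]))  # → True (order ignored)
print(check_fresh(saved, ["ACR-001"]))             # → False (module missing)
```

Something like **--no-strict** then simply skips this comparison instead of failing the realize.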
## Prerequisites

- Editable install from the repo root (see **docs/install.md**):

  ```
  cd ~/Code/FiWiControl
  python3 -m pip install -e ".[dev]"
  ```

- Python 3.11+ — the example uses **asyncio.TaskGroup** and **except* Exception**.
- Optional SSH to the rig — same contract as elsewhere: passwordless **root@<host>** for **sshtype="ssh"**. Optional **FIWI_SSH_CONFIG** is documented in **docs/node-control-asyncio-design.md**.
- Power / Acroname — not wired in the example yet. When you add **fiwicontrol.power**, use **pip install -e ".[power]"** and follow **docs/power-control-and-inventory.md**.
## How to run the example

From the repository root (the script prepends **src** to **sys.path** if needed):

```
# Safe: no SSH, no hardware — exercises structure only
python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2 --iterations 1

# With saved fabric JSON (after build/bind; merge lab INI at run time)
python3 scripts/system/pcie_hotswap_harness.py --fabric-json configs/my-fabric.json --lab-ini configs/default.ini --dry-run

# With SSH smoke on the concentrator (replace IP)
FIWI_REMOTE_IP=192.168.1.39 python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2
# or
python3 scripts/system/pcie_hotswap_harness.py --dry-run --paths 2 --rig-ip 192.168.1.39
```
| Flag | Meaning |
|---|---|
| **--fabric-json PATH** | Load **FabricDefinition** from JSON; sets **Fabric.rrhs** and **rrh_power_ports**. Without it, uses **--paths** placeholders. |
| **--lab-ini PATH** | Lab INI merged after JSON (default: **FIWI_LAB_INI**, else **configs/default.ini** if present). |
| **--no-lab-ini** | Skip INI merge; JSON only. |
| **--strict-fabric-ready** | Exit 2 unless **Fabric.binding_cache_status** is **READY** (requires live Acroname discovery). Only meaningful with **--fabric-json**. |
| **--dry-run** | Log only; no programmable power (none hooked up in this skeleton). |
| **--paths N** | Placeholder RRH count (ignored when **--fabric-json** is set). |
| **--iterations M** | Outer loop: run **M** sequential **TaskGroup** rounds. |
| **--settle SEC** | Sleep between conceptual phases inside **one_cycle**. |
| **--rig-ip** | SSH target; defaults to **FIWI_REMOTE_IP**. Overrides the JSON concentrator when set. If unset and the JSON has no IP, remote checks are skipped. |
## Patterns to reuse in your own harness

### 1. Thin main() — parse, configure logging, call asyncio.run

Keep I/O policy (flags, env) in **main()**. Keep async logic in **async def** functions so tests or imports can reuse the coroutines without a second event loop.
### 2. One coroutine per “story”: one_cycle, run_campaign

Name coroutines after user-visible steps (cycle, campaign, smoke). Pass explicit parameters (**dry_run**, **settle_s**, **label**) instead of hidden globals.
### 3. Concurrency with TaskGroup

When multiple RRHs run together, **async with asyncio.TaskGroup() as tg:** + **tg.create_task(...)** fails fast and bundles errors in an **ExceptionGroup**. Catch with **except* Exception** at the boundary that owns **asyncio.run**, log each sub-exception, and return a process exit code.
### 4. Dry-run first

Always provide a path that does not touch hardware so engineers can validate logging, SSH, and timing on a laptop. Real power transitions should be clearly gated: an extra flag or an explicit “I know this is live” confirmation.
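One way to gate live transitions behind an explicit opt-in; the second, `--live`-style flag is an invention for this sketch, not something the harness defines:

```python
def apply_power(port: int, on: bool, *, dry_run: bool, live: bool) -> str:
    """Refuse real transitions unless explicitly armed with both flags."""
    action = f"port {port} -> {'ON' if on else 'OFF'}"
    if dry_run or not live:
        return f"DRY-RUN {action}"   # log-only path: safe on any laptop
    # A real hardware call (BrainStem / SSH relay) would go here.
    return f"LIVE {action}"

print(apply_power(3, True, dry_run=True, live=False))   # → DRY-RUN port 3 -> ON
print(apply_power(3, True, dry_run=False, live=True))   # → LIVE port 3 -> ON
```

The default (no flags) stays on the log-only path, so forgetting an argument can never toggle hardware.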
### 5. Domain types from the library

Attach **FrontHaul** to **RadioHead** even when fields are **None** — it documents intent and keeps the harness aligned with production models. Pass a **Fabric** into the async campaign so shared resources (concentrator SSH, bench **Power**, **asyncio.Lock**, **rrh_power_ports**) have one home. Prefer **--fabric-json** (bound once via **python3 -m fiwicontrol.fabric bind**) over ad hoc placeholders; reserve **--paths** for laptop-only smoke.
### 6. Remote checks via ssh_node

Use **await node.rexec(cmd="...", ...)** for one-shot remote work. For periodic sampling, prefer **Command** / **CommandManager** from **fiwicontrol.commands** (see **docs/node-control-asyncio-design.md**).
### 7. Document gaps in the script

A short DESIGN_GAPS or TODO block at the top of the harness documents how enumeration, telemetry, or SPC relate to this script.
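A minimal sketch of such a block; the gap items here are placeholders, not the harness's actual list:

```python
"""Example harness skeleton.

DESIGN_GAPS:
- enumeration: PCI rescan after restore is not verified yet
- telemetry: no per-cycle KPI capture
- SPC: control-chart hand-off not wired in
"""

# Keeping the same items as data lets a run banner or --help epilog
# surface them, so scope limits travel with every invocation.
DESIGN_GAPS = [
    "enumeration: PCI rescan after restore is not verified yet",
    "telemetry: no per-cycle KPI capture",
    "SPC: control-chart hand-off not wired in",
]

print(len(DESIGN_GAPS))  # → 3
```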
## Checklist for a new system script

- Lives under **scripts/system/** with a **#!/usr/bin/env python3** shebang.
- **argparse** (or equivalent) documents every assumption; **--help** is accurate.
- **--dry-run** (or equivalent) when hardware is involved.
- **logging** at INFO for operator visibility; avoid **print** for control flow.
- Async entry is **async def** + a single **asyncio.run(...)** from **main()**.
- Concurrent work uses **TaskGroup** (or **gather** with a documented error policy).
- Non-zero exit on failure; **ExceptionGroup** handled if you use **TaskGroup**.
- README or this doc updated if you add a new category of harness or dependency.
## Related docs

- **docs/pcie-hotswap-setup.md** — PCIe harness prerequisites and JSON generation.
- **docs/fabric-builder.md** — lab INI + **python3 -m fiwicontrol.fabric build** / **bind**.
- **docs/install.md** — workstation and rig setup, **pip install -e**.
- **docs/node-control-asyncio-design.md** — **ssh_node**, **Command**, timeouts, running tests.
- **docs/power-control-and-inventory.md** — Acroname / Monsoon, INI, **--verify-inventory**.
- **docs/spc.md** — when campaigns need statistical control charts after KPI extraction.
- **README.md** — **scripts/system/** vs **tests/** overview.