# FiWiControl — system architecture
This document is the **top-level design** for the **FiWiControl** Python distribution (`fiwicontrol`). It ties together packages, data artifacts, runtime contexts, and how this repo supports the **Umber FiWi** system described in the **architecture spec** (read-only reference: `html/Fi-Wi-L4S.html` in this repository — open in a browser; **do not treat this Markdown file as a substitute for the full spec**).
**Related:** **`docs/fdir.md`** (fault detection, isolation, recovery), **`README.md`** (package list), **`docs/install.md`** (install, security, environment variables).

---
## 1. Purpose and scope
### 1.1 What FiWiControl is
FiWiControl provides **concentrator- and lab-oriented automation**: remote command execution (`ssh` / `ush`), USB power and discovery (Acroname / Monsoon), **lab INI** inventory, **fabric** definitions (RRH ↔ USB port bindings + fingerprint), **fronthaul** and **telemetry** models, optional **iperf** flows and **SPC**, and **system harness scripts** under `scripts/system/`.
It does **not** implement the FiWi **datapath**, WiFi **MAC**, or **L4S queueing** algorithms. Those belong to the system architecture in `html/Fi-Wi-L4S.html` (e.g. dual-loop control, queue structure, dynamic point selection, PCIe fronthaul). FiWiControl **aligns lab and deployment practice** with that architecture by making **inventory, power paths, and fronthaul physical state** observable and scriptable.
### 1.2 What “production ready” means here
**Production ready** for this repo means: **clear contracts**, **predictable failure modes**, **documented exit codes and logs** (see **`docs/fdir.md`**), **tests** for import and core logic, and **explicit non-goals** (e.g. it is not a medical or life-safety product; see **`README.md`**). It does **not** imply formal certification of the FiWi network product; that remains a broader system concern.

---
## 2. Alignment with `html/Fi-Wi-L4S.html` (conceptual map)
The HTML spec covers motivation, dual-loop control, queues, RRH redundancy, dynamic point selection, power/thermal considerations, and **PCIe fronthaul** (including **PCIe hot swap**). FiWiControl maps to those themes **only on the control-plane / operations side**:
| Spec theme (HTML) | FiWiControl role |
|-------------------|------------------|
| **Concentrator vs RRH split** | **`fiwicontrol.fabric`**: concentrator SSH target + RRH list; **`fiwicontrol.radio`**: logical RRH aggregate; **`fiwicontrol.fronthaul`**: link identity. |
| **Fronthaul (PCIe)** | **`telemetry.fronthaul`**, **`scripts/system/pcie_hotswap_harness.py`**, **`docs/pcie-hotswap-setup.md`** — lab exercise and metadata, not wire-protocol implementation. |
| **Inventory / “what is connected”** | **`fiwicontrol.lab`**, **`fiwicontrol.power`**, **`[fabric]` / `[fabric.rrh.*]`** INI, **`FabricDefinition`** JSON, **`discovery_fingerprint`**. |
| **Remote operation of rigs** | **`fiwicontrol.commands`** (`ssh_node`, `Command`, `CommandManager`). |
| **Measurement / campaigns** | **`fiwicontrol.flows`**, **`fiwicontrol.spc`**, harness scripts. |

If a capability appears in the HTML spec but **not** in this table, assume it is **out of scope** for FiWiControl until a package or script explicitly addresses it.

---
## 3. Layered architecture
```text
┌─────────────────────────────────────────────────────────────┐
│  scripts/system/*  (harnesses, CLIs; not default pytest)    │
└───────────────────────────┬─────────────────────────────────┘
                            │ import
┌───────────────────────────▼─────────────────────────────────┐
│  fiwicontrol.fabric | .power | .commands | .flows | .spc …  │
└───────────────────────────┬─────────────────────────────────┘
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
 fiwicontrol.lab    fiwicontrol.radio   fiwicontrol.telemetry
 (INI, discovery)   (RadioHead,         (schemas, e.g.
                     FrontHaul)          FrontHaulTelemetry)
        │                   │
        └─────────┬─────────┘
        fiwicontrol.commands  (SSH / ush; fiwicontrol.power may use it for remote discovery)
```
### 3.1 Dependency rules (hard)
- **`fiwicontrol.commands`** must **not** import **`fiwicontrol.power`** (breaks layering).
- **`fiwicontrol.lab`** may use **`commands`** only for **SSH-assisted discovery** where needed.
- **`fiwicontrol.power`** may import **`commands`** and **`lab`**; the reverse is forbidden.

These rules keep **optional** hardware stacks (BrainStem, etc.) from becoming core dependencies of the command layer.
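
The import rules above lend themselves to a static check. The following is a minimal sketch, not part of the repository: `FORBIDDEN`, `imported_modules`, and `layering_violations` are hypothetical names, and the table encodes only the two forbidden edges stated in this section (`commands` → `power`, `lab` → `power`). It ignores relative imports and cannot judge whether `lab` uses `commands` "only for SSH-assisted discovery".

```python
import ast

# Forbidden edges from section 3.1: importer package -> packages it must not import.
# Assumption: rules are expressed as fully qualified package names.
FORBIDDEN = {
    "fiwicontrol.commands": {"fiwicontrol.power"},
    "fiwicontrol.lab": {"fiwicontrol.power"},
}


def imported_modules(source: str) -> set[str]:
    """Return dotted names of absolute imports found in Python source text."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # "from fiwicontrol import power" may name a subpackage, so record it too.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def layering_violations(package: str, source: str) -> list[str]:
    """List the forbidden packages that `source` (a module of `package`) imports."""
    banned = FORBIDDEN.get(package, set())
    names = imported_modules(source)
    return sorted(
        b for b in banned
        if any(n == b or n.startswith(b + ".") for n in names)
    )


# Usage: feed each module's source text together with the package it belongs to.
print(layering_violations("fiwicontrol.commands", "from fiwicontrol import power\n"))
```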
---
## 4. Persistent artifacts
| Artifact | Typical location | Role |
|----------|------------------|------|
| Lab INI | `configs/*.ini`, `FIWI_LAB_INI` | Machines, USB expectations, `[fabric]` / `[fabric.rrh.*]`. |
| Fabric JSON | Operator-chosen path (e.g. `configs/my-fabric.json`) | `FabricDefinition`: `rrhs`, `discovery_fingerprint`, concentrator fields. |
| Patch panel map | `*_panel.json` beside INI or `--patch-panel-json` | BDF → label for concentrator reports. |

**Merge rule:** Harnesses and APIs often **load the fabric JSON first**, then **merge** the lab INI over it (`ini_merge`), so operators can override concentrator or per-RRH fields without editing the JSON.
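
To make the merge rule concrete, here is a minimal sketch of "JSON first, INI overrides". It is an illustration only and does not reproduce the real `ini_merge` behavior: the function name `merge_ini_over_json` and the keys `name`, `concentrator_host`, and `usb_port` are assumptions chosen for the example. Only the `rrhs` and `discovery_fingerprint` fields and the `[fabric]` / `[fabric.rrh.*]` section names come from this document.

```python
import configparser
import json


def merge_ini_over_json(fabric_json: str, lab_ini: str) -> dict:
    """Load fabric JSON first, then let matching lab INI keys override it."""
    fabric = json.loads(fabric_json)

    ini = configparser.ConfigParser()
    ini.read_string(lab_ini)

    # [fabric] keys override top-level (concentrator) fields.
    if ini.has_section("fabric"):
        fabric.update(dict(ini["fabric"]))

    # [fabric.rrh.<name>] keys override fields of the RRH with that name.
    # Assumption: each entry in "rrhs" carries a "name" key.
    by_name = {rrh["name"]: rrh for rrh in fabric.get("rrhs", [])}
    prefix = "fabric.rrh."
    for section in ini.sections():
        if section.startswith(prefix):
            name = section[len(prefix):]
            if name in by_name:
                by_name[name].update(dict(ini[section]))
    return fabric


# Usage: the INI changes the concentrator host and one RRH port; the rest is kept.
fabric_json = json.dumps({
    "concentrator_host": "conc-a",
    "discovery_fingerprint": "abc",
    "rrhs": [{"name": "rrh1", "usb_port": "1"}, {"name": "rrh2", "usb_port": "2"}],
})
lab_ini = "[fabric]\nconcentrator_host = conc-b\n\n[fabric.rrh.rrh2]\nusb_port = 5\n"
merged = merge_ini_over_json(fabric_json, lab_ini)
```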
---
## 5. Runtime contexts
1. **Developer workstation**: editable install, pytest, `fabric build`, `fabric_realize.py`.
2. **Umber concentrator**: intended deployment host for automation aligned with the spec.
3. **Lab rig (e.g. Pi)**: SSH target; may run remote discovery for `usb=remote` rows.

Passwordless **`root@<host>`** SSH (or equivalent) is assumed for automation; see **`docs/node-control-asyncio-design.md`**.
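
The passwordless-SSH assumption can be illustrated with a short asyncio sketch. This is not the implementation of `ssh_node`; `build_ssh_argv` and `run_remote` are hypothetical helpers, and the timeout values are placeholders. The only external facts relied on are the standard OpenSSH options `BatchMode` and `ConnectTimeout`, which make a missing key fail fast rather than hang on a password prompt.

```python
import asyncio


def build_ssh_argv(host: str, command: str, user: str = "root") -> list[str]:
    """Build an ssh argv that fails fast instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if key auth is unavailable
        "-o", "ConnectTimeout=5",   # bound the connection phase, in seconds
        f"{user}@{host}",
        command,
    ]


async def run_remote(host: str, command: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run `command` on `host`; return (exit code, stdout). Kill the child on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *build_ssh_argv(host, command),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, out.decode(errors="replace")


# Usage (needs a reachable host, so it is left commented out):
# code, text = asyncio.run(run_remote("lab-rig.example", "uname -r"))
```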
---
## 6. Observability and FDIR
- **Structured fabric checks:** `Fabric.binding_cache_status`, `python3 -m fiwicontrol.fabric status`, `Fabric.realize(strict=…)`.
- **Logging:** FDIR-related messages use the prefix **`[FiWi-FDIR]`** (see **`fiwicontrol.fabric.fdir`** and **`docs/fdir.md`**).
- **CLI exit codes:** `fabric_realize.py` documents its exit codes in the **`--help`** epilog; the full table is in **`docs/fdir.md`**.
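
As an illustration of the logging convention, the sketch below shows one way to guarantee that every message carries the `[FiWi-FDIR]` prefix, using the standard library's `logging.LoggerAdapter`. It is an assumption about how such a prefix could be applied, not a description of `fiwicontrol.fabric.fdir`; `FdirAdapter`, `make_fdir_logger`, the logger name, and the message text are all hypothetical.

```python
import io
import logging

FDIR_PREFIX = "[FiWi-FDIR]"


class FdirAdapter(logging.LoggerAdapter):
    """Prepend the FDIR prefix to every message sent through this adapter."""

    def process(self, msg, kwargs):
        return f"{FDIR_PREFIX} {msg}", kwargs


def make_fdir_logger(stream) -> FdirAdapter:
    """Return an adapter writing prefixed messages to `stream` (hypothetical helper)."""
    logger = logging.getLogger("fdir_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()          # keep the example idempotent across calls
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False         # avoid duplicate output via the root logger
    return FdirAdapter(logger, {})


# Usage: operators can then filter logs on the fixed prefix.
buffer = io.StringIO()
log = make_fdir_logger(buffer)
log.warning("binding cache stale for %s", "rrh1")
print(buffer.getvalue(), end="")
```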
---
## 7. Documentation index
| Document | Contents |
|----------|----------|
| **`docs/fdir.md`** | Fault detection, isolation, recovery; exit codes; operator responses. |
| **`docs/install.md`** | Install, environment variables, security boundaries. |
| **`docs/fabric-builder.md`** | Interactive bind → JSON. |
| **`docs/power-control-and-inventory.md`** | INI reference, verification. |
| **`docs/node-control-asyncio-design.md`** | `ssh_node`, asyncio, timeouts. |
| **`docs/system-test-scripts.md`** | Harness patterns, `fabric_realize` summary. |
| **`docs/pcie-hotswap-setup.md`** | PCIe harness setup. |
| **`docs/flows.md`**, **`docs/spc.md`** | Optional packages. |
| **`html/Fi-Wi-L4S.html`** | **FiWi system architecture** (authoritative product/spec narrative). |
---
## 8. Revision
When packages or scripts gain new **failure modes** or **exit codes**, update **`docs/fdir.md`** and this file's **§6** / **§7** as needed. When the FiWi **product architecture** changes, update **`html/Fi-Wi-L4S.html`** through your normal spec process (FiWiControl docs only **reference** it, per project policy).