Timestamp-synchronized control loops, dynamic RF grouping, and multi-RRH
operation
Umber Networks Fi-Wi Technical Architecture Overview (Version 1.1,
December 2025)
Zebras look like horses, but they are not the same... Zebras, despite man's best efforts, cannot be tamed. The Wi-Fi we have engineered today remains fundamentally a collection of autonomous, uncoordinated things—zebras that simply cannot be harnessed.
Fi-Wi is architected from the ground up to be controllable, coordinated, and directed — the horse we need for in-building communications and sensing. As latency demands tighten and building densities increase, Fi-Wi isn't just a better future; it's the future we can build today.
The material presented in this document describes the Fi-Wi architecture and associated engineering concepts. It is provided "as is" for discussion and exploratory design purposes only. Nothing in this document constitutes a formal specification, performance guarantee, regulatory assertion, or commitment to implement any feature described.
Several sections use simplified or idealized assumptions to illustrate architectural differences between Wi-Fi, Multi-Link Operation (MLO), Low Latency, Low Loss, and Scalable Throughput (L4S), and Fi-Wi queueing and scheduling behavior. These examples are intended to clarify concepts rather than fully model the non-linear and stochastic dynamics present in operational wireless systems.
Real system behavior depends on hardware characteristics, RF topology, firmware behavior, congestion patterns, environmental conditions, and interactions with legacy Wi-Fi devices. Actual performance may differ from the representative models and examples described here.
Important Note on Capabilities: This document describes an architecture using Commercial Off-The-Shelf (COTS) Wi-Fi chipsets. The system provides dynamic point selection, intelligent frequency reuse, and centralized MAC scheduling. It does not provide RF phase control, distributed MIMO, or coordinated simultaneous transmission—capabilities that would require custom ASIC development. All described features are achievable with commodity Wi-Fi hardware and comply with unlicensed spectrum regulations.
Low Latency, Low Loss, Scalable Throughput (L4S) is a suite of IETF standards that extend the Internet's congestion control mechanisms through Explicit Congestion Notification (ECN) to support very low queuing delays. L4S is a ratified protocol stack with multiple production implementations.
Fi-Wi is architected specifically to provide the deterministic underlying transport required to satisfy the strict queuing mandates defined in these standards.
L4S replaces capacity-seeking congestion control (Reno/Cubic) with pacing-based rate control and is already deployed in production environments.
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." — Antoine de Saint-Exupéry
"Everything should be made as simple as possible, but not simpler." — Albert Einstein
With 23.3 billion Wi-Fi devices in use worldwide and 5.5 billion people (and growing) depending on internet connectivity, Wi-Fi has become the primary way we access the internet. So much so that many people think Wi-Fi is the internet. It's how a home healthcare worker video-calls to check on a patient, or a cancer patient connects to their support group. It's how a parent works remotely while their child attends school online, and how lifelong learners access the information they need to grow. It's how a grandmother monitors her heart condition through a telehealth app. It's how a family member finds their next job, or how a neighbor orders a meal.
Running quietly in the background are autonomous systems we've come to depend on: security cameras that alert us to threats, medical monitors that track vital signs, smart home systems that manage climate and safety, IoT sensors that detect water leaks or carbon monoxide. These systems don't wait for us to notice problems—they operate continuously, silently, keeping people safe.
We've moved far beyond entertainment and convenience. Wi-Fi now carries the infrastructure of daily survival. When it breaks down under density or congestion, it's not just buffering that fails. It's jobs, healthcare access, human connection, and the life-safety systems we trust to work when we're not watching. The $4.9 trillion Wi-Fi contributes to the global economy isn't an abstract number. It's the cumulative value of billions of human activities and critical systems that simply stop working when the network fails.
The infrastructure supporting all of this is failing at scale, and the failure must be addressed for everyone. The industry is moving toward L4S and ECN-based control to eliminate bufferbloat, but traditional Wi-Fi makes this impossible. Legacy congestion-control loops fail by design once a single flow saturates the bottleneck queue, and even modern ECN-based systems such as L4S cannot converge when Wi-Fi hides queue depth, induces collision storms, injects firmware-created delays that look like queues, and constantly shifts transmission (PHY) rates through its rate-control and aggregation machinery. Mesh networks and additional APs make the user experience worse by injecting more uncoordinated radios into an already chaotic RF environment. And because the AP industry understands these limits, it is no surprise that even major vendors publicly state that L4S cannot operate correctly over the products they sell.
Adding more Ethernet-attached APs makes it worse by creating more overlapping contention domains. Hidden queues in SoCs, rate-control firmware, and aggregation pipelines obscure the true bottleneck. In control-theory terms: the bottleneck queue cannot expose its state, the PHY rate is not stationary, and the closed loop cannot stabilize. This is why user experience fails in many apartments and homes, in hotels, MDUs, stadiums, and high-density buildings long before “capacity” is reached.
QoS cannot rescue this architecture. Because the bottleneck queue inside a Wi-Fi AP has no information about actual flow urgency or priority, no QoS mechanism can operate meaningfully. The only real solution is to avoid congestion altogether — which is exactly what L4S researchers have designed for and exactly what Fi-Wi supports.
While the protocol fails in the air, the physical infrastructure fails in the walls: the industry's traditional answer, running copper Ethernet to APs, simply extends the lifetime of an architecture that has reached its limits. Copper requires periodic rip-and-replace cycles: Cat5 becomes Cat6, then Cat7, then Cat8. A home builder has no idea what communications wiring to install. The RJ45 connector and its plastic tab are fragile, outdated, and end of life. And at 25G, 40G, or 100G, physics takes over: copper loses signal in dB per inch. Data centers have abandoned structured cabling (long-run copper) for core transport, restricting copper to short-reach intra-rack DACs. Fi-Wi applies the same logic to the building: fiber for the long haul (halls/walls), radio for the short hop.
Fi-Wi breaks the cycle. Install fiber once — and never revisit behind walls or ceilings again. The glass is permanent; only the optics evolve. Fiber is already the universal medium for 100G/400G data centers, DWDM long-haul transport, and now PCIe throughout a building with Fi-Wi. Remote Radio Heads simply convert between fiber and 802.11, eliminating embedded routing, rate-control SoCs, switching silicon, and the security-patch treadmill they require. When Wi-Fi standards evolve, you replace the small radio module(s) — that's all.
Fi-Wi turns fiber combined with 802.11 into the permanent, predictable, control-theory-friendly transport that the L4S control loop requires, and treats 802.11 radio heads as the small, disposable, last-meters, connector-free interface where the in-building network behaves deterministically. And because fiber increases the long-term value of a building, the investment is not just technically durable — it is financially durable.
There is no law of physics that says Wi-Fi cannot work at scale. The collapse we're seeing in apartments, hotels, and high-density buildings isn't inevitable. The researchers have shown engineers how to proceed. We know how to build stable control loops. We know how to coordinate radios. We know how to deploy permanent infrastructure.
The conditions for solving this are here, now. Engineering talent exists across our industry. The market has already validated the foundation: China's FTTR deployments have installed fiber to millions of rooms, proving that permanent infrastructure at this scale is not just feasible—it's already happening at volume. What's missing is capital directed at the right architecture. Investors are essential to this challenge. Their capital will enable the engineering to serve the market. And, once proven, market signals will sustain the development, directing human resources toward building what humanity needs for continued advancement.
Fi-Wi is Umber's answer, but the underlying challenge belongs to all of us. The 5.5 billion people depending on this infrastructure deserve better than a system designed for convenience that we've repurposed for survival. This is solvable engineering—the talent is ready, the manufacturing exists, and the market is waiting. It's time we came together and fixed this.
The failure of modern Wi-Fi to support low-latency applications (L4S) is not a failure of bandwidth; it is a failure of control. With 23.3 billion Wi-Fi devices deployed globally, the protocol has hit an asymptotic limit where adding complexity yields diminishing returns.
As density rises, autonomous contention scales super-linearly—effectively operating as the inverse of Metcalfe's Law. The result is a rising noise floor and media access collisions that render unlicensed spectrum unusable for the deterministic performance required by next-generation applications.
Evolutionary engineering is powerful; it gave us twenty-five years of Wi-Fi speed improvements. But every evolutionary curve eventually hits an asymptote—a point where adding more complexity yields diminishing returns. We have reached that point.
"The IEEE 802.11 working group behaves like a composer writing a symphony that effectively cannot be played. They continually add instruments—4096-QAM, Puncturing, MLO—without considering that the musician (the silicon) has only microseconds to react."
The decision matrix for a Wi-Fi chip has exploded combinatorially. We can trace this through the Modulation and Coding Scheme (MCS) table.
The Physical Trap: When the firmware engineer fails to optimize the radio, can we simply redesign the chip? No, because of RTL (Register Transfer Level) Accretion. In software, engineers "refactor" unwieldy code. In hardware, refactoring is economically forbidden. A complex SoC takes 18–24 months to validate; removing "dead" logic risks breaking obscure corner cases. Consequently, vendors only add; they never subtract. 802.11be logic wraps around 802.11ax logic, which wraps around 802.11ac logic—twenty-five years of accumulated technical debt consuming area and leakage power.
The Market Signal: The ultimate proof that the standard has reached gridlock is the behavior of market leaders like Samsung and Apple. They no longer rush to support every new feature—they aggressively whitelist features and blacklist others because complexity drains battery and destabilizes connections. When the two largest consumers of wireless silicon effectively stop buying the complexity argument, the evolutionary roadmap is broken.
The fundamental instability of 802.11 stems from the Birthday Paradox applied to media access. In an autonomous system, as the number of contending stations (n) increases linearly, the probability of collision increases combinatorially:
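A textbook birthday-paradox approximation (supplied here for illustration; N_slots denotes the size of the contention-slot pool the stations draw from) makes the pairwise growth explicit:

P_collision(n) ≈ 1 − exp( −n(n−1) / (2 · N_slots) )

The n(n−1)/2 term counts contending pairs, matching the collision-pair count used by the interactive simulation later in this document.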
Simulation data confirms that even with moderate client density, collision probability quickly exceeds 50%, forcing the network into a state of "Drift" where latency becomes unbounded. Under these conditions, the network is no longer constrained by PHY capacity, but by the probability of successful media access.
This is Metcalfe's Law in reverse: instead of each new node increasing the value of the network, each new node increases the chance of interference and reduces usable capacity.
The collapse of the operator model is driven by three distinct architectural failures inherent to the 802.11 standard.
Standard Wi-Fi relies on Carrier Sense Multiple Access (CSMA), which assumes that all stations can hear each other. In real-world MDU (Multi-Dwelling Unit) environments, this assumption fails catastrophically.
Field measurements using ESP32-based sensors reveal that hidden node contention consumes 30-50% of available airtime in typical MDU deployments—airtime paid for in spectrum acquisition costs but lost to protocol overhead invisible to traditional monitoring. This represents a massive protocol tax where significant airtime is consumed by retries and backoff slots rather than payload delivery.
The most critical failure for a network operator is the loss of state control. Modern 802.11ax supports 12 MCS indices × 4 bandwidth options × 8 spatial stream configurations × 3 guard intervals = >1,000 valid PHY states. Autonomous rate selection must navigate this space at sub-millisecond timescales under non-stationary noise.
This creates a Non-Stationary System.
Because Wi-Fi is non-stationary, autonomous rate selection under contention has no bounded outcome. The IEEE 802.11 standard has allowed the MCS table to explode into hundreds of valid permutations—a chaotic state space that firmware must navigate in microseconds with incomplete information.
As load increases, the spatial precision of the network degrades. Mathematical modeling shows that the condition number (κ)—a measure of how well-conditioned the MIMO channel matrix is—degrades from 6 dB (excellent spatial separation) to >12 dB (severe interference) under load. This collapse means that 4×4 MIMO effectively degrades to 2×2 or worse, turning additional spatial streams into self-interference rather than capacity.
This degradation collapses the theoretical gains of Mu-MIMO, transforming high-order spatial streams into interference rather than usable capacity. The "Efficiency Paradox" emerges: Wi-Fi evolution has focused on shrinking Payload Duration (faster PHY rates like 4096-QAM) while MAC Overhead (LBT, Backoff, Preamble) remains constant. To amortize the overhead, chips must build massive Aggregates (A-MPDUs). This destroys latency. We have engineered a Ferrari engine (the PHY) inside a garbage truck (the MAC).
For network operators—whether cable MSOs, telcos, or fiber providers—this architectural chaos presents a fundamental business risk: You own the customer experience, but not the air interface.
Traditional attempts to solve Wi-Fi density problems fail because they address symptoms rather than the underlying architectural failure:
The Trillion-Dollar Context: The mobile industry spent $600 billion building 5G to get scheduled, deterministic performance outdoors. They understand that unlicensed spectrum + autonomous contention = chaos. The genius of 5G is its architecture; its Achilles heel is its cost. In recent auctions, 20 MHz of licensed mid-band spectrum sold for over $17 billion for U.S. rights alone.
Fi-Wi applies the cellular C-RAN architecture indoors—but on unlicensed spectrum that costs nothing. This is the arbitrage opportunity.
The architectural reset is not limited to the infrastructure; it fundamentally alters the behavior of the Station (STA). In legacy Wi-Fi, the STA is an autonomous agent that fights for upstream airtime using EDCA (Enhanced Distributed Channel Access). It maintains its own local WMM queues and blindly transmits whenever it wins a contention window, often oblivious to the fact that the AP's receive buffer is already full.
The L4S Inversion: With L4S, the "Quality of Service" decision moves from the Wi-Fi card's firmware to the application's congestion control algorithm. We replace the rigid, static categories of WMM with the dynamic, adaptive responsiveness of TCP Prague and other L4S-compliant congestion controls.
Eliminating the "Uplink Queue": This effectively virtualizes the queue. Instead of a deep buffer sitting on the Wi-Fi chip waiting to be transmitted, the packets are held in user-space memory on the client device, waiting for the "go" signal (or rather, the absence of a "stop" signal). The traffic never enters the contention domain until there is guaranteed capacity to service it. The STA no longer needs complex internal QoS schedulers because it is no longer trying to force more data than the pipe can hold.
In legacy systems, flow control happens at the driver level. When the Wi-Fi card's hardware buffer fills up (the TX Ring), it signals the Operating System to "Stop the Queue." The OS then buffers packets in software (qdisc) until the hardware signals "Go."
This is catastrophic for latency. It creates a hidden reservoir of old data sitting in the kernel, waiting for the hardware to clear. By the time the hardware is ready, the packets in the OS queue are already stale.
L4S eliminates this layer of buffering entirely. Because TCP Prague adjusts the send rate to match the actual airtime capacity (signaled via ECN), the application never sends enough data to fill the hardware ring buffer. The driver never has to assert flow control, the OS queue remains empty, and every packet that hits the driver is fresh, ensuring immediate transmission.
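Schematically (an illustrative C sketch, not a real driver; names are hypothetical):

#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SLOTS 256

struct tx_ring { uint32_t head, tail; };   /* hardware descriptor ring */

static bool tx_ring_full(const struct tx_ring *r)
{
    return (r->head - r->tail) >= TX_RING_SLOTS;
}

/* Legacy driver: when the ring fills, the OS queue is stopped and
 * packets age in the software qdisc until the hardware drains. */
bool legacy_should_stop_queue(const struct tx_ring *r)
{
    return tx_ring_full(r);   /* a hidden reservoir of stale data builds */
}

/* L4S-paced sender: the send rate tracks ECN-signaled airtime capacity,
 * so the ring stays nearly empty and this predicate is effectively
 * never true; every packet reaching the ring is fresh. */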
Solving this requires a "Subtractive Architecture." Instead of adding more features to the radio, we must remove them. The architectural breakthrough of Fi-Wi is decoupling the MCS State Graph described in Section 2.3.2 into its constituent parts: the concentrator deliberates and selects the transmission state (e.g., MCS_INDEX, N_SS, TXOP_DURATION), and the RRH simply translates that chosen state into the precise RF waveform without autonomous deliberation.
This architectural shift—from distributed chaos to centralized control—mirrors the evolution from analog transmission systems (noise-prone, operator-invisible) to digital QAM (deterministic, monitorable). Fi-Wi completes this transformation for the last 10 meters, moving the network from a model of probabilistic negotiation to one of deterministic execution.
Section 13 describes the Concentrator's scheduling algorithm that implements this graph traversal, while Appendix C details the RRH's scatter-gather DMA mechanism that executes the chosen state transitions at microsecond timescales.
Traditional QoS mechanisms in Wi-Fi—WMM access categories, priority queues, and traffic shaping—reflect a fundamental architectural flaw: treating contention as inevitable and attempting to optimize it through priority classes. This approach attempts to infer urgency by classifying packets, then granting probabilistic access to the medium—essentially rolling dice with weighted odds.
L4S changes the premise entirely. Flows signal their tolerance for delay using ECN, allowing the network to signal sources to control their own send rates. Across many flows, this controls the aggregate arrival rates at the forwarding plane based on real-time queue feedback rather than static classes.
In a Fi-Wi architecture, where all wireless transmissions are centrally scheduled with unified state, traffic no longer competes through contention. The Concentrator controls arrival rates to each Remote Radio Head, ensuring packets are transmitted at the precise moment they are needed. This deterministic scheduling replaces the probabilistic contention that WMM attempts to optimize. Consequently, the complex web of traditional QoS queues is rendered obsolete; we replace "Priority" (deciding who waits) with "Isolation" (ensuring no one waits).
The following interactive simulation demonstrates the architectural differences between Fi-Wi, autonomous APs, and mesh networks under varying load conditions. It visualizes the MCS State Graph discussed in Section 2.7, showing how autonomous systems fail to navigate this state space under density.
Each "room" represents a device with a 4 × 12 grid of MCS states (4 spatial streams × 12 MCS indices). The ghost node (dashed) shows the ideal state based on channel quality, while the active node shows the actual state selected by the rate control algorithm.
What to Watch For:
MCS Grid: Each 4×12 grid shows all possible MCS states. Top rows = Mu-MIMO (multi-user), bottom rows = standard 2×2 MIMO. Columns = MCS index (0-11, higher = faster but needs better SNR).
Eigenvalues (λ₁, λ₂): Strength of spatial modes in the MIMO channel. As density increases in autonomous mode, λ₂ collapses → spatial interference.
Condition Number (κ): Ratio λ₁/λ₂ in dB. Low (~6 dB) = good. High (>12 dB) = Mu-MIMO degraded to single-stream. This directly demonstrates the "Spatial Contention Cascade" from Section 2.3.3.
Collision Probability: Computed using Birthday Paradox formula: n(n-1)/2 collision pairs. When this exceeds 50%, the network enters "Drift" state with unbounded latency.
This visualization proves the loss of control described in Section 2.4. In autonomous mode, operators cannot engineer performance because the system navigates a 1,000+ state MCS graph with no global coordination.
In Fi-Wi mode, the Concentrator's global state visibility allows it to steer every device toward its ideal MCS state and hold it there.
The result: predictable, engineerable performance that scales with density instead of collapsing. The difference becomes visceral when you watch autonomous mode turn red under the same load that Fi-Wi handles in green.
┌────────────────────────────────────────────┐
│ Fi-Wi Concentrator │
│────────────────────────────────────────────│
L4S/ECN-aware │ │
traffic from LAN/ │ ┌────────────────────────────────────┐ │
WAN (IP/802.3) ─────┼─▶│ Central Packet Memory & Queues │ │
│ │ • Per-flow / per-tenant queues │ │
│ │ • Per-airtime-domain queues │ │
│ │ • Enqueue timestamps (µs) │ │
│ └───────────────┬────────────────────┘ │
│ │ │
│ ┌───────────────▼────────────────────┐ │
│ │ L4S/AQM & Scheduler │ │
│ │ • Sojourn-time based ECN marking │ │
│ │ • TXOP length control (≈250 µs) │ │
│ │ • RF grouping & spatial streams │ │
│ └───────────────┬────────────────────┘ │
│ │ PCIe over fiber │
└───────────────────┼────────────────────────┘
│
┌───────────────────────────────────┼───────────────────────────────────┐
│ │ │
│ │ │
┌───────▼─────────┐ ┌────────▼─────────┐ ┌────────▼─────────┐
│ RRH #1 │ │ RRH #2 │ │ RRH #3 │
│ (Thin MAC/PHY) │ │ (Thin MAC/PHY) │ │ (Thin MAC/PHY) │
│ • RF front end │ │ • RF front end │ │ • RF front end │
│ • DFE + FFT │ │ • DFE + FFT │ │ • DFE + FFT │
│ • Minimal MAC │ │ • Minimal MAC │ │ • Minimal MAC │
│ • DMA engine │ │ • DMA engine │ │ • DMA engine │
│ • PTP sync │ │ • PTP sync │ │ • PTP sync │
└───────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
│ │ │
│ │ │
│ PCIe-over-fiber links (no deep queues in RRHs) │
│ │ │
│ │ │
┌───────▼─────────┐          ┌────────▼────────┐     ┌────────▼─────────┐
│     RRH #4      │   ...    │     RRH #N      │     │    Wi-Fi STAs    │
│  (Thin MAC/PHY) │          │  (Thin MAC/PHY) │     │ (rooms, AP-like  │
│ • RF front end  │          │ • RF front end  │     │  cells, clients) │
│ • DFE + FFT     │          │ • DFE + FFT     │     │ • Phones         │
│ • Minimal MAC   │          │ • Minimal MAC   │     │ • Laptops        │
│ • DMA engine    │          │ • DMA engine    │     │ • IoT devices    │
│ • PTP sync      │          │ • PTP sync      │     │                  │
└─────────────────┘          └─────────────────┘     └──────────────────┘
Key properties: Central packet memory and queues live entirely in the concentrator, where L4S-aware AQM and scheduling operate on true bottleneck queues. RRHs are kept as simple hardware endpoints (RF + minimal MAC + DMA + PTP), with no deep local buffering or autonomous AP logic. This enables stable L4S behavior, explicit TXOP control, and software-defined evolution of queueing and RF policies.
To understand Fi-Wi, we must first unlearn the definition of an "Access Point."
In a typical controller-managed enterprise Wi-Fi deployment, a centralized controller (e.g., Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi Controller) coordinates AP configuration: channel assignment, transmit power, client steering recommendations, and SSID management. However, each AP remains autonomous at the data plane:
These systems are loosely-coupled: the controller manages the control plane (configuration, policy) but the data plane — queuing, MAC scheduling, aggregation, and packet forwarding — remains distributed and autonomous across individual APs.
In Umber Fi-Wi (C-RAN for Wi-Fi), we split the AP and cellularize the RF domain, down to room-level. The concentrator sees all flows, all queues, and all RRHs. The RRHs handle 802.11 MAC/PHY but are tightly time-synchronized and behave as DMA-driven PHY/MAC endpoints rather than autonomous APs. A set of RRHs and their shared queues form a cellularized Wi-Fi domain within the building, often at “cell per room” granularity.
Fi-Wi centralizes both control plane AND data plane with shared state across all RRHs. The concentrator doesn't just configure RRHs; it directly manages their queues, schedules their TXOPs, and maintains unified timestamp-synchronized state across the entire cellularized RF domain.
Conceptually, Fi-Wi decouples the system into two nested feedback loops, separated by timescale:
The Outer Loop manages congestion and end-to-end latency (Internet speed). The Inner Loop manages MAC efficiency and radio timing (Airtime).
The Problem with Legacy Wi-Fi: Traditional APs couple these loops unpredictably, creating "sawtooth" latency patterns that confuse TCP.
The Fi-Wi Solution: By centralizing both loops in the Concentrator, Fi-Wi enforces a strict Time-Scale Separation. The Inner Loop runs so fast (3–5 kHz) that it appears as "constant service" to the slower Outer Loop (10–20 Hz), allowing L4S to stabilize perfectly.
(See Section 5: Control Architecture for the rigorous control-theoretic analysis and stability criteria.)
Fi-Wi operates across two distinct time domains simultaneously. The first is the concentrator's internal master clock, disciplined via PTP/802.1AS over the PCIe fronthaul (detailed in Section 4.7). The second is the 802.11 TSF (Timing Synchronization Function) domain that 802.11 clients use to coordinate with the MAC layer. In a traditional AP these two clocks are decoupled — the AP runs one TSF and one clock. In Fi-Wi, with 24 RRHs each presenting a TSF-aware BSS, managing the relationship between them is a foundational architectural responsibility of the concentrator.
The concentrator synchronizes its master clock to all attached RRHs on the order of microseconds (and substantially tighter when using PCIe-native timing mechanisms such as PTM — see Section 4.7 for the full hardware chain). This master clock gives every packet:
This clock lives entirely inside the Fi-Wi domain. Clients never see it directly. It is the coordinate system in which shim header timestamps (Section 4.2), AQM marking decisions (Section 4.3), and the ML training corpus (Section 15) are all expressed. Because all packet timestamps, service events, and queue measurements are expressed in this single master time domain, Fi-Wi can compute precise per-packet sojourn times independent of the TSF domain, enabling stable ECN marking and L4S control across the system.
The 802.11 TSF is a 64-bit microsecond counter that every client associates with a BSS. Clients set their local TSF from beacons. They use it to wake from power save at the right moment, to interpret TBTT (Target Beacon Transmission Time), and to coordinate TXOP timing. The TSF is the only MAC-visible clock the 802.11 standard exposes at the MAC layer.
In a traditional single-AP deployment this is trivial: one AP, one TSF, one beacon stream. In Fi-Wi it is not. Consider a client in a room served by two RRHs in the same airtime domain. That client will receive beacons from both RRHs. If those beacons carry inconsistent TSF values, even small inconsistencies can lead to misaligned power-save wakeups, ambiguous TBTT interpretation, and in some implementations degraded performance or reassociation. The coherence of the TSF domain across all RRHs in a BSS is not optional; it is a hard correctness requirement.
Fi-Wi satisfies this requirement by construction: the concentrator generates all beacon frames. No RRH constructs its own beacon. The concentrator writes the TSF value into every beacon before dispatching it to the appropriate RRH for transmission. Because all TSF values originate from the same source and are derived from the same master clock, they are consistent by design rather than by coordination protocol. Within a given BSS, TSF values are identical across all participating RRHs; multiple TSF domains arise only when multiple BSS instances are present.
The concentrator maintains 25 simultaneous time references: its own PTP-disciplined master clock and one 802.11 TSF per RRH. Each TSF has its own epoch (established at BSS creation) and its own drift correction term, derived from periodic synchronization updates over the fronthaul (PTP/802.1AS or PCIe PTM), which bound long-term drift. The concentrator knows the exact affine mapping between the master clock and every client-visible TSF domain at all times:
TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t)
Any event — a packet enqueue, an ECN mark, a TXOP start, a beacon transmission — can be expressed in any of the 25 frames without loss of precision. This is the time-domain analog of a coordinate transformation: the concentrator is the origin from which all other reference frames are derived, and any event timestamp can be mapped between frames via a known, invertible affine transform, updated continuously via the fronthaul synchronization loop.
Concentrator master clock (PTP-disciplined)
│
├─ Master frame: all shim timestamps, sojourn times, AQM marks, ML labels
│
├─ TSF_1: epoch_1, drift_1(t) → beacon stream for RRH 1 ┐
├─ TSF_2: epoch_2, drift_2(t) → beacon stream for RRH 2 │ identical within
├─ TSF_3: epoch_3, drift_3(t) → beacon stream for RRH 3 │ a given BSS
│ ... ┘
└─ TSF_24: epoch_24, drift_24(t) → beacon stream for RRH 24
Any event E has coordinates in all 25 frames simultaneously.
Mapping between any two frames: affine transform, known at the concentrator,
updated continuously via the fronthaul sync loop.
The concentrator as the origin of 25 simultaneous time reference frames (for a 24-RRH deployment). Client-visible TSF domains are derived from the master clock via known affine transforms. Within a BSS, TSF values are identical across participating RRHs.
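As a concrete sketch of the affine mapping (purely illustrative; field and function names are hypothetical, not a real Fi-Wi API):

#include <stdint.h>

/* One per-RRH TSF domain, per the relation
 * TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t). */
typedef struct {
    uint64_t epoch_us;       /* master-clock time at BSS creation (µs)  */
    int64_t  drift_corr_us;  /* current drift correction term (µs),     */
                             /* updated by the fronthaul sync loop      */
} tsf_domain;

/* Master frame -> client-visible TSF value for domain i. */
static inline uint64_t master_to_tsf(const tsf_domain *d, uint64_t t_master_us)
{
    return (t_master_us - d->epoch_us) + (uint64_t)d->drift_corr_us;
}

/* Inverse transform: TSF value in domain i -> master frame. */
static inline uint64_t tsf_to_master(const tsf_domain *d, uint64_t tsf_us)
{
    return (tsf_us - (uint64_t)d->drift_corr_us) + d->epoch_us;
}

Because both directions are simple additions in the same units, the transform is exactly invertible, which is what lets any event timestamp be expressed in all 25 frames without loss of precision.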
In a controller-managed AP deployment, each AP runs its own TSF independently. The controller can nudge APs toward a common time reference via 802.11v BSS Transition Management or out-of-band NTP, but it does not generate beacon frames — each AP does. This means TSF values across APs can diverge by the inter-AP sync error (typically tens to hundreds of microseconds with Ethernet-based PTP, more without it).
A client roaming between two such APs may see a TSF discontinuity at handoff. Power-save state, TBTT alignment, and any MAC-layer timing assumption the client holds must be renegotiated. In Fi-Wi, roaming between RRHs within the same concentrator domain is a TSF-transparent event: the client's TSF counter simply continues, because the new RRH's beacon carries the same TSF value the old one would have carried at that moment. The client does not know a handoff occurred at the MAC layer.
This unified time model also enables the concentrator to schedule transmissions across RRHs against a single global timeline, rather than relying on independent per-RRH contention processes. TSF continuity across RRH handoffs is a direct consequence of centralized beacon generation, and it is what makes Fi-Wi's active redundancy claims in Section 8 operationally credible: per-packet steering between RRHs is transparent to clients because the client's MAC-layer time reference never changes. This unified time model enables not only precise measurement, but coordinated control of transmission behavior across RRHs, as described in Section 4.1.4.
The unified time model described above is not only a measurement framework; it is the foundation for Fi-Wi's centralized MAC scheduling. In conventional 802.11 deployments, EDCA (Enhanced Distributed Channel Access) operates as a stochastic contention mechanism: each AP independently selects random backoff values within its CWmin/CWmax range, and medium access emerges probabilistically.
In Fi-Wi, EDCA is not treated as a distributed random process. It is treated as a centrally orchestrated actuation layer, driven by the concentrator's master time reference.
Because the concentrator maintains a single master time reference and full visibility into every RRH's queues and transmit state, it can shape medium access behavior across RRHs by dynamically controlling EDCA parameters on a per-radio basis. The key parameters are CWmin, CWmax, AIFS, and the TXOP limit.
By assigning narrowly bounded contention windows and staggered AIFS values across RRHs, the concentrator can bias contention outcomes such that one RRH is overwhelmingly likely to win access at a given moment. Rotating these parameters over time creates a soft time-division multiplexing (TDM) effect using standard EDCA semantics.
This transformation is only possible because all RRHs share a common time reference. The concentrator can schedule EDCA parameter updates relative to the master clock and ensure that all RRHs apply them in a coordinated manner. Without this shared time base, independent EDCA processes would quickly decorrelate and revert to stochastic contention.
Conceptually, the concentrator executes a scheduling loop:
for each scheduling interval:
observe queue state across RRHs // centralized visibility
select next RRH (or RF group) to serve // queue-aware decision
assign EDCA parameters (CWmin, CWmax, AIFS, TXOP)
enforce timing relative to master clock // coordinated application
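Fleshed out slightly (a hypothetical C sketch of one tick; the parameter values are illustrative, and delivery to the RRHs would ride the PCIe fronthaul):

#include <stdint.h>

typedef struct {
    uint16_t cwmin, cwmax;   /* contention window bounds (slots)       */
    uint8_t  aifsn;          /* arbitration inter-frame space number   */
    uint32_t txop_limit_us;  /* TXOP limit in microseconds             */
} edca_params;

/* Bias EDCA so the selected RRH is overwhelmingly likely to win the
 * next contention round; rotating `winner` across ticks yields the
 * soft-TDM effect described above. */
void assign_edca(int winner, int n_rrh, edca_params out[])
{
    for (int i = 0; i < n_rrh; i++) {
        if (i == winner) {
            /* narrow window, short AIFS: near-certain first access */
            out[i] = (edca_params){ .cwmin = 1,  .cwmax = 3,
                                    .aifsn = 2,  .txop_limit_us = 250 };
        } else {
            /* wide window, staggered AIFS: statistically deferred */
            out[i] = (edca_params){ .cwmin = 31, .cwmax = 255,
                                    .aifsn = (uint8_t)(4 + i % 4),
                                    .txop_limit_us = 250 };
        }
    }
}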
The result is not strict TDMA — 802.11 contention semantics are preserved and the system remains compliant with standard client behavior — but the distribution of outcomes is shaped by the concentrator. Over short time horizons, access becomes highly predictable and service intervals can be bounded, which is critical both for tail-latency control and for giving L4S coherent feedback.
Because TSF values are consistent across RRHs, these scheduling decisions are MAC-transparent to clients. From the client's perspective, the network behaves as a single, coherent AP with stable timing characteristics, even as transmissions are steered across multiple physical radios.
Controller-based Wi-Fi systems can configure EDCA parameters on individual APs, but they cannot coordinate their application in time with sufficient precision. Each AP maintains its own clock, its own contention process, and its own transmit queues.
Without a shared time origin and centralized queue visibility, EDCA remains a probabilistic mechanism. Attempts to tune contention parameters across APs produce statistical bias at best, not deterministic scheduling. The lack of a unified time domain prevents coordinated rotation of access privileges across radios.
Fi-Wi's ability to treat EDCA as a controllable scheduling primitive is a direct consequence of the concentrator's role as both the time origin and the sole owner of transmit queues.
This time-driven EDCA orchestration is the mechanism by which Fi-Wi converts the inherently stochastic 802.11 MAC into a predictable, centrally scheduled system — completing the chain from time synchronization through queue observability to stable L4S control.
Between 802.3/IP and the fronthaul link we add a small internal metadata header. Conceptual form:
struct FiWiMeta {
uint64_t seq; // fronthaul sequence number
uint64_t t_ingress_us; // time packet enqueued into group queue (central DRAM)
uint32_t txop_id; // TXOP this MSDU is in
uint8_t mpdu_idx; // index within aggregate
uint8_t mpdu_cnt; // total MSDUs in this TXOP
uint8_t ecn_flags; // CE applied? which queue? reason bits
uint32_t qlen_pkts; // queue depth snapshot at TXOP start
};
This header is visible only inside the Fi-Wi domain. Among other things, it lets us compute the per-packet delay accumulated so far, Td = now − t_ingress_us, per flow or per RF group.
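For example (a hypothetical helper over the FiWiMeta header above; now_us stands in for a read of the concentrator master clock):

#include <stdint.h>

/* Per-packet time spent in the group queue, in the master time domain. */
static inline uint64_t fiwi_sojourn_us(const struct FiWiMeta *m,
                                       uint64_t now_us)
{
    return now_us - m->t_ingress_us;
}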
We choose the group queues in the concentrator—each corresponding to a cellularized airtime domain shared by one RRH or by multiple interfering RRHs—as the only places where deep queues are allowed and where we apply ECN:
Other queues (within RRH hardware, on the fiber/fronthaul link) are kept shallow via pacing and controlled descriptor posting. The group queues become the single bottlenecks in each cellularized airtime domain, which is exactly what L4S wants: a small number of stable, well-behaved bottlenecks with known behavior. The control policy is explicitly tuned to keep both average and tail queueing delay low.
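As a data-structure sketch (names and sizes hypothetical), the rule is one deep queue per cellularized airtime domain:

#include <stdint.h>

#define MAX_RRH_PER_GROUP 8
#define GROUP_Q_DEPTH     4096

/* One cellularized airtime domain = one deep queue. RRHs that can
 * interfere share a group (and thus a queue); RF-isolated RRHs get
 * their own. AQM/ECN marking runs only at this queue. */
struct group_queue {
    void    *pkts[GROUP_Q_DEPTH];       /* packet pointers in central DRAM */
    uint64_t enq_ts_us[GROUP_Q_DEPTH];  /* per-packet enqueue timestamps   */
    uint32_t head, tail;
};

struct airtime_domain {
    int    rrh_ids[MAX_RRH_PER_GROUP];  /* members sharing this airtime */
    int    n_rrh;
    struct group_queue q;               /* the only deep queue in the path */
};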
The Standard AP Architecture: Traditional Wi-Fi chips already use DMA to move packets from host memory to the radio without CPU involvement. But they require a local CPU to create descriptors, manage buffers, and run the network stack. Every AP is a complete computer running millions of lines of Linux.
The Fi-Wi Innovation: DMA Over Distance (not RDMA)
Fi-Wi extends the PCIe bus over fiber, allowing the RRH's DMA engine to read and write remote memory in the Concentrator. To the RRH silicon, memory 100 meters away appears "local"—accessible with the same PCIe transactions a traditional Wi-Fi chip uses to access DRAM 10 millimeters away on the motherboard.
Result: The local CPU, local DRAM, and entire Linux stack can be eliminated. The RRH becomes a pure "micro-bridge"—just DMA + MAC/PHY logic.
The Silicon Cost Difference:
| Component | Traditional AP | Fi-Wi RRH |
|---|---|---|
| MAC/PHY silicon (802.11 radio logic) | ~15-20M gates; MIMO, error correction, etc.; complexity dictated by physics | ~15-20M gates; same physics, same complexity; no savings here |
| Host SoC / CPU (the "brains") | ~50-100M gates; multi-core ARM CPU, DDR4 controller, peripherals, caches, etc. | ~100K-500K gates; simple DMA state machine, descriptor buffer only; 100-1000× simpler |
| DRAM | 256MB-1GB DDR4 (required for OS + buffers) | 16-64KB SRAM (descriptor storage only) |
| Operating system | Linux (millions of LOC); requires security patches | None; zero software attack surface |
| Total silicon | ~70-120M gates | ~15-20M gates |
The Economic Model:
Traditional Architecture: 50 APs = 50 CPUs, 50 DRAM modules, 50 power supplies, 50 Linux installations, 50 security update cycles.
Fi-Wi Architecture: 1 powerful Concentrator (workstation-class) + 50 simple RRHs (DMA + radio only).
Total system cost is lower because you're paying for intelligence once, not 50 times.
Why Incumbents Cannot Do This:
Traditional AP vendors have already optimized their SoC designs—the CPU, DRAM controller, and peripherals are as efficient as they can be. But their architecture requires these components at every radio because each AP operates autonomously. Even if they wanted to simplify, the distributed control model forces complexity at the edge.
Fi-Wi's centralized architecture enables the per-radio simplification. This is a structural cost advantage, not a manufacturing efficiency. Replicating it would require incumbents to abandon their entire product line and business model—a classic Innovator's Dilemma.
Bottom Line: C-RAN works because silicon economics favor centralized intelligence. The gate count difference isn't cosmetic—it's the foundation of Fi-Wi's cost, power, and reliability advantages.
In Fi-Wi, packet memory is centralized in the concentrator. This design keeps the RRHs stateless and places the only deep queues at the true bottleneck, where the AQM can observe them.
Because the Fi-Wi concentrator maintains shared state for the entire RF domain, it can directly control the RF footprint of each RRH by adjusting per-RRH beacon transmit power, which alters the cell boundaries that clients perceive.
Beacon power is one of the most effective tools for dynamic RF cell shaping because it affects STA association and roaming decisions without modifying data-plane PHY rates. By lowering beacon power at certain RRHs and raising it at others, the concentrator can steer associations and reshape cell boundaries in real time.
Traditional controller+AP systems attempt similar behavior but lack true shared state because each AP maintains its own queueing and PHY decisions. In Fi-Wi, beacon shaping is coordinated with queueing, scheduling, and RF-group formation, all operating on the same shared state.
This makes beacon power a first-class control variable in defining and stabilizing the boundaries of each cellularized RF domain.
The Fi-Wi architecture requires deterministic, low-latency fronthaul links between the concentrator and RRHs. Because RRHs function as DMA engines accessing centralized packet memory (Section 4.4), Umber's implementation uses PCIe (PCI Express) over fiber rather than Ethernet. This section quantifies bandwidth, latency, and jitter requirements, and demonstrates that PCIe over fiber not only meets these requirements but provides superior performance compared to network-based alternatives.
The choice of PCIe over fiber instead of Ethernet is driven by the Fi-Wi architectural model:
RRHs as DMA engines: Each RRH directly reads packet descriptors from concentrator DRAM, fetches packet data, and writes received packets back to memory. This is native PCIe behavior—exactly how a network card or storage controller operates.
Latency advantage: PCIe avoids the network stack entirely; there is no Ethernet framing, no IP layer, and no socket processing between the RRH and concentrator memory.
Determinism: PCIe provides guaranteed bandwidth allocation and predictable latency through credit-based flow control and dedicated point-to-point links.
Simplicity: The RRH sees the concentrator's memory space directly. No protocol translation, no socket APIs, no network configuration.
Each RRH requires bandwidth for:
1. Downlink packet DMA (concentrator → RRH)
For an RRH serving one or more STAs with aggregate capacity C_eff:
BW_DL = C_eff · (1 + OH_desc)    (4.1)
where OH_desc accounts for DMA descriptors, metadata, and PCIe TLP (Transaction Layer Packet) overhead (typically 10-20%).
Example: For C_eff = 600 Mbps (typical 802.11ax 2×2 MIMO) with OH_desc = 0.15:
BW_DL = 600 · 1.15 = 690 Mbps
2. Uplink packet DMA (RRH → concentrator)
Typically symmetric or slightly higher than downlink due to ACKs and control frames:
BW_UL ≈ BW_DL · 1.1 ≈ 760 Mbps    (4.2)
3. CSI and status updates
Channel State Information and MAC statistics are written to concentrator memory via PCIe:
BW_CSI = N_sta · N_sc · N_tx · N_rx · B_sample · f_CSI    (4.3)
For N_sta = 4, N_sc = 234, N_tx = 2, N_rx = 2, B_sample = 24 bits, f_CSI = 50 Hz:
BW_CSI ≈ 4.49 Mbps per RRH
4. Control and command traffic (concentrator → RRH)
Configuration updates, timing sync corrections, power/channel commands:
BW_control ≈ 1-5 Mbps per RRH    (4.4)
Total bidirectional bandwidth per RRH:
BW_total = BW_DL + BW_UL + BW_CSI + BW_control    (4.5)
BW_total ≈ 690 + 760 + 4.5 + 2 ≈ 1456 Mbps ≈ 1.5 Gbps
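The same per-RRH budget as a small self-checking calculation (inputs mirror the worked example and are illustrative only):

#include <stdio.h>

int main(void)
{
    double c_eff   = 600e6;   /* effective 802.11ax 2x2 capacity (bps)  */
    double oh_desc = 0.15;    /* descriptor/metadata/TLP overhead       */

    double bw_dl = c_eff * (1.0 + oh_desc);   /* Eq. (4.1): 690 Mbps    */
    double bw_ul = bw_dl * 1.1;               /* Eq. (4.2): ~760 Mbps   */

    /* Eq. (4.3): 4 STAs x 234 subcarriers x 2x2 x 24-bit CSI at 50 Hz */
    double bw_csi = 4.0 * 234 * 2 * 2 * 24 * 50;   /* ~4.49 Mbps        */
    double bw_ctl = 2e6;                           /* Eq. (4.4) midpoint */

    double bw_total = bw_dl + bw_ul + bw_csi + bw_ctl;  /* Eq. (4.5)    */
    printf("per-RRH fronthaul budget: %.2f Gbps\n", bw_total / 1e9);
    return 0;   /* prints ~1.46 Gbps, i.e. the ~1.5 Gbps in the text    */
}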
PCIe bandwidth is determined by generation and lane count:
| PCIe Gen | Per-Lane Rate | x1 Link | x4 Link | x8 Link |
|---|---|---|---|---|
| Gen 3 | 8 GT/s | ~985 MB/s (7.88 Gbps) | ~3.94 GB/s (31.5 Gbps) | ~7.88 GB/s (63 Gbps) |
| Gen 4 | 16 GT/s | ~1.97 GB/s (15.75 Gbps) | ~7.88 GB/s (63 Gbps) | ~15.75 GB/s (126 Gbps) |
| Gen 5 | 32 GT/s | ~3.94 GB/s (31.5 Gbps) | ~15.75 GB/s (126 Gbps) | ~31.5 GB/s (252 Gbps) |
Note: Effective bandwidth accounts for 128b/130b encoding (Gen 3+) and protocol overhead.
RRH link sizing: For 1.5 Gbps per RRH requirement:
A single PCIe Gen 3 x1 lane is sufficient per RRH with substantial headroom.
The concentrator must aggregate multiple RRH connections. Consider a 50-RRH deployment:
Total aggregate bandwidth requirement:
BW_aggregate = N_RRH · BW_total    (4.6)
BW_aggregate = 50 · 1.5 Gbps = 75 Gbps (peak)
With 40% average utilization (typical for building-wide traffic):
BW_typical = 75 · 0.40 = 30 Gbps
Architecture Options:
Option 1: PCIe switch fabric
Option 2: Multi-host server (Dual Socket)
Standard PCIe uses copper traces on motherboards (limited to ~30cm at Gen 3/4 speeds). To reach RRHs distributed throughout a building, PCIe signals are carried over fiber using optical transceivers.
Technologies:
1. Active Optical Cables (AOC)
2. Optical PCIe adapter cards
3. PCIe fabric extenders
Recommended approach for Fi-Wi: Optical PCIe adapter cards with standard fiber infrastructure, providing flexibility and leveraging commodity fiber installation.
PCIe over fiber latency components:
| Component | Latency |
|---|---|
| PCIe TLP formation (concentrator) | 0.2-0.5 µs |
| Optical transceiver (TX) | 0.1-0.3 µs |
| Fiber propagation (100m) | 0.5 µs |
| Optical transceiver (RX) | 0.1-0.3 µs |
| PCIe TLP processing (RRH) | 0.2-0.5 µs |
| PCIe switch (if used) | 0.1-0.3 µs per hop |
| Total one-way | 1.2-2.4 µs |
| Round-trip (DMA read) | 2.4-4.8 µs |
Comparison to Ethernet:
| Fronthaul Type | Round-Trip Latency | Determinism |
|---|---|---|
| PCIe over fiber | 2.4-4.8 µs | Excellent (credit-based) |
| 10GbE (cut-through) | 10-30 µs | Good (with QoS) |
| 10GbE (store-forward) | 20-100 µs | Fair (subject to congestion) |
PCIe over fiber provides 5-10× lower latency than even optimized Ethernet, which is critical for the inner control loop (Appendix B) operating at 200-500 µs timescales.
PCIe's credit-based flow control eliminates congestion drops and provides deterministic latency.
Measured jitter: PCIe over fiber typically exhibits <50 ns jitter, well under the 200 ns budget for 1 µs time synchronization (Section 4.1).
This determinism is impossible to achieve with Ethernet without time-sensitive networking (TSN) extensions, which add complexity and cost.
PCIe over fiber distance depends on optical budget and signal integrity:
| PCIe Gen | Multi-Mode Fiber | Single-Mode Fiber |
|---|---|---|
| Gen 3 (8 GT/s) | 300 m | 10 km |
| Gen 4 (16 GT/s) | 100 m | 2-10 km |
| Gen 5 (32 GT/s) | 50-100 m | 2 km |
Fi-Wi requirement: Building-scale deployments require ≤100 m reach, easily achieved with Gen 3/4 over multi-mode fiber or any generation over single-mode fiber.
PCIe over fiber cost per RRH:
| Component | Cost (approx.) |
|---|---|
| RRH-side PCIe optical adapter | $150-300 |
| Fiber pair (50m installed) | $50-100 |
| Optical transceiver pair | $50-100 |
| PCIe switch port allocation | $100-200 |
| Total per RRH | $350-700 |
Comparison to network alternatives:
| Approach | Cost per RRH | Latency | Determinism |
|---|---|---|---|
| PCIe over fiber | $350-700 | 2-5 µs | Excellent |
| 10GbE + TSN | $300-600 | 10-30 µs | Good |
| Standard 10GbE | $200-400 | 20-100 µs | Fair |
PCIe over fiber costs moderately more than standard Ethernet but delivers 5-10× better latency and superior determinism. For Fi-Wi's DMA-based architecture, this cost is justified by the performance and architectural simplicity gains.
For context: a typical enterprise AP costs $500-2000, and a cellular small cell costs $1000-5000. The fronthaul cost is comparable to or less than the radio cost difference, making it economically viable.
For deployments where PCIe over fiber infrastructure is unavailable, a hybrid approach is possible: packet data over the PCIe fronthaul, with CSI and control traffic carried over existing Ethernet.
This reduces PCIe bandwidth requirements (only packet data, not CSI/control) and allows leveraging existing Ethernet infrastructure for non-latency-critical traffic.
However, the pure PCIe approach is architecturally cleaner and avoids the complexity of dual-protocol RRH implementation.
For context, cellular systems use:
CPRI (Common Public Radio Interface):
eCPRI (Enhanced CPRI) / Fronthaul Gateway:
Fi-Wi (PCIe over fiber):
Fi-Wi's functional split and PCIe transport provides a unique balance: lower bandwidth than CPRI, lower latency than eCPRI, and native integration with the DMA-based architecture.
| Requirement | Target | Achieved with PCIe Gen 3 x1 |
|---|---|---|
| Bandwidth per RRH | ~1.5 Gbps | ✓ 7.88 Gbps (5× margin) |
| Aggregate (50 RRH) | ~30 Gbps avg | ✓ PCIe switch or multi-CPU |
| Round-trip latency | <10 µs | ✓ 2.4-4.8 µs |
| Jitter | <200 ns | ✓ <50 ns (credit-based) |
| Distance | ≤100 m | ✓ 300m MM / 10km SM |
| Determinism | No drops, predictable | ✓ Credit-based flow control |
| Cost per RRH | <$700 | ✓ $350-700 |
Why PCIe over fiber is the right choice for Fi-Wi:
The deterministic, sub-5-microsecond fronthaul is what enables Fi-Wi's centralized control, time synchronization, and single-bottleneck queueing architecture. Unlike Wi-Fi mesh, controller-based systems with over-the-air backhaul, or even Ethernet-based approaches, PCIe over fiber provides the predictable substrate needed for the control loops described in Appendices A and B to operate with the precision required for sub-millisecond tail latency control.
The "cellularization" of Wi-Fi relies on a unified timebase. In the Fi-Wi architecture, time is not merely used for logging; it is a control variable. To achieve coordinated scheduling, accurate queue measurements, and seamless mobility, every RRH must share a precise understanding of "now" down to the microsecond level.
To achieve this, Fi-Wi establishes a strict Hierarchical Clock Tree over the PCIe fronthaul, leveraging the native determinism of the bus rather than the best-effort nature of packet switching.
The Fi-Wi Concentrator acts as the PTP Grandmaster (IEEE 1588v2 / 802.1AS) for the entire building. It houses the primary reference oscillator (typically a high-stability OCXO).
External Reference (Optional GPS/GNSS)
│
▼
┌──────────────────────────────────────────────┐
│ Fi-Wi Concentrator │
│ [ High-Stability Oscillator (OCXO) ]        │ ◄── Grandmaster (GM)
│ (System Timebase t0) │
└──────────────────┬───────────────────────────┘
│ PCIe PTM / Hardware Sync
│ (Compensates for fiber flight time)
┌────────────┼─────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ RRH 1 │ │ RRH 2 │ │ RRH 3 │ ◄── Slaves
│ [LocalOsc]│ │ [LocalOsc]│ │ [LocalOsc]│
│ Locked │ │ Locked │ │ Locked │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
▼ ▼ ▼
Frequency-Coordinated Operation
A defining advantage of the Fi-Wi architecture is the use of "Hard Synchronization" via PCIe, rather than "Soft Synchronization" via Ethernet. While Ethernet-based APs rely on IEEE 1588 PTP, they are subject to switch jitter and software stack latency. PCIe over fiber eliminates these variables.
| Feature | Fi-Wi (PCIe over Fiber) | Traditional APs (Ethernet) |
|---|---|---|
| Protocol | PCIe PTM (Precision Time Measurement): hardware-native, bus-level messages | IEEE 1588 PTP: packet-based, software/firmware stack |
| Sync Accuracy | 20-50 ns (bus-cycle precision plus fiber margin) | 100 ns - 10 µs (highly dependent on network load) |
| Jitter Source | Minimal: point-to-point hardware flow control | High: switch queuing and software interrupt latency |
| CPU Overhead | Zero: handled entirely by the PCIe PHY/controller | Moderate to high: CPU must interrupt to process sync packets |
| Primary Benefits | Accurate L4S timestamps, TSF synchronization, unified timeline for clients | Basic time sync for logging and management |
Important Note: While frequency-locked clocks provide excellent timing consistency, they do not enable RF phase control or coordinated simultaneous transmission. COTS Wi-Fi chips have independent RF synthesizers with arbitrary phase offsets that cannot be controlled externally. The value of clock synchronization lies in accurate timestamping for L4S queue management and consistent TSF counters for seamless client mobility, not in RF phase alignment.
The Concentrator's clock behavior depends on the deployment environment and regulatory requirements. There are two distinct modes of operation:
In this mode, the Concentrator is connected to an external GNSS (GPS/Galileo) receiver. The internal oscillator is disciplined to align with UTC (Coordinated Universal Time). This connects the internal timing of the Fi-Wi system to external absolute time.
In deep indoor environments (basements, bunkers) where GPS is unavailable, or cost-sensitive deployments where 6 GHz AFC is not required, the Concentrator operates in Free-Wheeling mode.
While Free-Wheeling mode is sufficient for core system operation, GPS-Disciplined (Absolute) mode becomes mandatory when the Fi-Wi system interacts with external systems that require UTC timestamps, such as 6 GHz AFC (Automated Frequency Coordination) services.
Standard enterprise APs utilize free-running crystal oscillators with ~20 ppm frequency error. This causes TSF counters to drift relative to each other, making seamless mobility difficult. To achieve the timing consistency required for Fi-Wi's coordinated operation, the RRH hardware architecture must be fundamentally different.
The Fi-Wi Solution: The RRH hardware uses Mobile-Class Wi-Fi Silicon (which natively supports external clock inputs) driven by a Fronthaul-Recovered Precision Clock.
┌──────────────────────────────────────────────────────────────────────────────┐
│ RRH CLOCK DISTRIBUTION ARCHITECTURE │
└──────────────────────────────────────────────────────────────────────────────┘
[ PCIe Over Fiber ]
│
│ (1) PTM Timestamps (Implicit Clock)
▼
┌─────────────────────────────┐
│ RRH FPGA / Retimer │
│ (Clock Recovery Circuit) │
└─────────────┬───────────────┘
│
│ (2) "Dirty" Recovered Clock (High Jitter)
▼
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ JITTER ATTENUATOR IC │ │ WI-FI 7 SOC (Client) │
│ (e.g., Si5395 / LMK05) │ │ │
│ │ │ │
│ ┌─────────────────────┐ │ │ ┌───────────────────┐ │
│ │ Digital Servo Loop │ │ (3) Clean │ │ Internal PLL │ │
│ │ (DSPLL) │───┼───────────┼───►│ (RF Synthesizer) │ │
│ └─────────────────────┘ │ 40 MHz │ └─────────┬─────────┘ │
│ │ Reference │ │ │
└─────────────────────────────┘ └──────────────┼──────────────┘
│
▼
[ 5 GHz / 6 GHz ]
[ RF Carrier ]
(Independent phase per RRH)
Signal Flow: The RRH recovers a noisy clock from the PCIe fronthaul. A digital Jitter Attenuator cleans the signal using an internal DSP servo loop. This provides the ultra-low phase noise reference required for 4096-QAM while maintaining frequency lock to the Concentrator's timebase. Note: The Wi-Fi chip's internal PLL establishes its own RF carrier phase, which is independent across RRHs.
The clock distribution chain operates as follows: the RRH FPGA/retimer recovers a clock from the PTM timestamps carried on the PCIe fronthaul; the jitter attenuator's digital servo loop cleans this recovered clock; and the resulting clean 40 MHz reference drives the Wi-Fi SoC's Ext_Ref / XO_IN pin. The chip's internal PLLs lock to this external frequency reference, ensuring consistent TSF counter operation across all RRHs.
Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP SKUs. This choice is driven by a specific architectural need: mobile-class chips expose Ext_Ref pins that accept an external drive signal, whereas many AP chips expect a passive crystal resonator. This external clock capability enables consistent TSF counter operation across all RRHs.
It is important to understand the limitations of frequency-locked clocks with COTS Wi-Fi hardware: frequency lock aligns timebases and TSF counters across RRHs, but each chip's internal RF synthesizer still establishes its own carrier phase, so RF phase alignment, distributed MIMO, and coordinated simultaneous transmission remain out of reach, as noted above.
A rigorous control-theoretic analysis of Wi-Fi reveals a fundamental challenge: there are not one, but two distinct integrators in the transmit path. In traditional autonomous APs, these integrators are coupled in undefined ways, leading to instability (bufferbloat) and poor interaction with TCP congestion control. Fi-Wi explicitly separates these integrators, applies distinct control laws to each, and enforces a strict Time-Scale Separation to guarantee system stability.
To achieve stability, we must model and control two distinct accumulation processes:
The primary bottleneck managed by the AQM (Active Queue Management) is the Group Queue. This loop drives the end-to-end congestion control (L4S/TCP).
The queue depth Q(t) evolves based on the mismatch between the arrival rate λ(t) and the effective service rate μ(t):
dQ/dt = λ(t - τ_fwd) - μ(t)
Fi-Wi uses a PI² controller to calculate a marking probability p(t), targeting a shallow queue reference Q_ref (typically 200 µs). This provides a coherent signal to L4S senders:
p(t) = K_α · (Q(t) − Q_ref) + K_β · ∫ (Q(t) − Q_ref) dt
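In discrete time, the marking law above might be sketched as follows (a minimal sketch; gains and the 200 µs reference are illustrative placeholders, and production PI² implementations such as DualPI2 differ in detail):

typedef struct {
    double k_alpha, k_beta;  /* proportional / integral gains       */
    double q_ref_us;         /* queue delay reference (~200 µs)     */
    double integ;            /* accumulated integral term           */
} pi2_state;

/* q_us: measured queue sojourn time (µs); dt_s: update interval (s).
 * Returns an ECN marking probability clamped to [0, 1]. */
double pi2_update(pi2_state *s, double q_us, double dt_s)
{
    double err = q_us - s->q_ref_us;
    s->integ += err * dt_s;
    double p = s->k_alpha * err + s->k_beta * s->integ;
    return p < 0.0 ? 0.0 : (p > 1.0 ? 1.0 : p);
}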
Traditional congestion control relies on Active Queue Management (AQM): a queue must physically build up before the network detects congestion and signals the sender to slow down. The goal is to manage the queue size.
L4S enables a new paradigm called Active Rate Management (ARM).
Reference: Koen De Schepper, "Understanding Latency 4.0" (video explanation, 19:15), December 2025.
The Inner Loop manages the trade-off between PHY efficiency (large aggregates) and latency (small aggregates). In traditional APs, this integrator is effectively unbounded to maximize benchmark scores, creating a "sawtooth" latency pattern that confuses TCP.
Fi-Wi bounds this integrator via two mechanisms: a fixed target TXOP duration (~250 µs) and a cap on aggregate size (32 MSDUs), per the damping parameters below.
For the nested loops to remain stable, the Inner Loop must look like "constant service" to the Outer Loop. This requires the Inner Loop bandwidth (ω_mac) to be significantly higher than the Outer Loop bandwidth (ω_tcp):
ω_mac >> ω_tcp (typically > 20:1 ratio)
By forcing the MAC to operate at a frequency of 3–5 kHz (via 250 µs TXOPs), the aggregation noise is pushed high enough that it is naturally filtered out by the TCP loop (which operates at 10–20 Hz).
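As a quick numeric check using representative values from the text:

ω_mac / ω_tcp ≈ 4 kHz / 15 Hz ≈ 267:1

where 4 kHz is the service frequency implied by 250 µs TXOPs (1 / 250 µs) and 15 Hz sits in the stated 10-20 Hz TCP range, comfortably above the 20:1 criterion.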
The 250 µs TXOP constraint serves a dual purpose: it maintains time-scale separation and ensures L4S receives coherent ECN feedback. Traditional Wi-Fi's massive A-MPDU aggregation creates a fundamental mismatch between Layer 2 efficiency and Layer 3 control precision.
In wide-channel deployments (160 MHz), APs build large A-MPDU aggregates containing dozens of IP packets to amortize MAC overhead. This creates three control-loop pathologies:
Fi-Wi resolves this through coordinated design:
This approach maintains the benefits of A-MPDU efficiency while preserving the feedback coherence L4S requires. The result: DualQ can sustain its ~1ms target drain time without artificial inflation from aggregate assembly delays. For detailed analysis, see Appendix I.7.
Fi-Wi uses these parameters to ensure the system remains critically damped:
| Loop | Parameter | Target Value | Rationale |
|---|---|---|---|
| Outer | Queue Reference | 200 µs | Maintains ultra-low queuing delay. |
| Outer | Update Interval | 5 ms (~1 RTT) | Matches typical control loop frequency. |
| Inner | Target TXOP | 250 µs | Ensures ωmac >> ωtcp. |
| Inner | Max Aggregate | 32 MSDUs | Limits tail latency contribution. |
In Fi-Wi, the core rule is: there is one deep queue per independent airtime resource. The physical queue lives in concentrator memory, but it represents the airtime of one RRH or a dynamic group of RRHs whose RF signals are coupled strongly enough to behave like a single cell.
If two RRHs can interfere, they cannot transmit simultaneously and therefore must share a single logical queue. If RRHs are RF-isolated, each receives its own queue. This preserves the “one bottleneck per control loop” structure required by L4S.
Service at each queue corresponds to over-the-air transmission. Any RRHs that share RF space must share a service process and therefore share a queue. RRHs that do not interfere have independent airtime and get independent queues.
Crucially, these RF groups and their queues are not static. The concentrator forms and maintains airtime domains dynamically using:
Beyond simple interference, Fi-Wi’s groupings also consider the spatial structure of the channels:
Over time, the Fi-Wi system continuously adjusts:
Groups may merge if interference appears or split if RRHs become effectively isolated (e.g., after a channel change or power adjustment, including beacon power shaping). The AQM and ECN marking logic always runs at the current group queue, so L4S always sees a single, well-defined bottleneck per cellularized domain.
Because all RRHs expose real-time CSI, queue metrics, retry statistics, airtime usage, and beacon reports into the concentrator’s shared state, Fi-Wi can form RF groups that are tuned not just for coverage but for:
Fi-Wi is not designed around a small number of big AP cells per floor. The architecture assumes something much closer to Fiber-to-the-Room (FTTR): one cell per room, with fiber or equivalent deterministic fronthaul feeding small RRHs in each room.
In higher-end deployments, each room can contain multiple RRHs (e.g., 2–4 per room) to support:
This density dramatically improves RF control. With RRHs separated by just a few meters, the concentrator sees:
Traditional AP-based architectures cannot achieve this cleanly because they lack shared state and maintain separate, isolated queues and PHY/MAC processes in each AP. Even with a central controller, they are limited to heuristic steering and static power/channel tweaks.
Fi-Wi, by contrast:
A cell-per-room architecture makes Fi-Wi fundamentally different from controller-based Wi-Fi: it behaves more like cellular small cells with centralized coordination than like a set of autonomous APs.
Fi-Wi centralizes packet memory, queueing, AQM, and TXOP scheduling inside the concentrator. Because the concentrator is the true bottleneck for all wireless transmissions, Fi-Wi can use a clean, minimal queue structure that behaves predictably under load and exposes stable delay semantics to L4S congestion controllers. This stands in contrast to traditional APs, where dozens of hidden queues (per-station, per-TID, firmware rings, retry/BA windows, PS-poll buffers, rate-control queues) produce variable and unobservable queueing delay.
This section describes Fi-Wi’s queue architecture, why WMM priority becomes largely unnecessary, and how centralized TXOP scheduling eliminates the stochastic contention that drives Wi-Fi collapse in legacy systems. The goal is simple: a minimal number of queues, explicit queue semantics, and predictable latency for all traffic classes.
Because all packets live in the concentrator’s memory until the moment they are transmitted over the air, Fi-Wi can explicitly control:
This allows Fi-Wi to do what distributed APs cannot: construct a consistent, visible bottleneck queue that L4S congestion controllers can lock onto with stable behavior.
If queue delay is capped around 500 µs, legacy WMM categories provide little additional value. For example, consider a voice stream:
Voice codec: 80 bytes every 10 ms (64 kbps)
Transmit time at 1 Gbps: ~0.64 µs
L4S queue target: 500 µs
Voice latency budget: ~150,000 µs (150 ms)
Queue share: 500 / 150,000 ≈ 0.3%
If L4S keeps queueing delay under ~500 µs, then all traffic — including voice — stays far inside its latency budget. WMM’s role in combatting bufferbloat disappears when bufferbloat itself is removed.
Three real-world issues motivate a cautious design:
Voice and video often use UDP. These flows typically do not respond to ECN marks and do not reduce their sending rate under congestion, so they can crowd out responsive traffic.
Fi-Wi can mitigate this using per-flow fair queuing inside the L4S queue, keeping UDP in check without needing a separate WMM hierarchy.
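As a concrete illustration of the mechanism (a sketch, not Fi-Wi source: flow_key_t, FQ_BUCKETS, and the hash choice are assumptions), per-flow isolation can be as simple as hashing each flow's 5-tuple into a fair-queuing bucket and serving the buckets round-robin:

#include <stddef.h>
#include <stdint.h>

/* Flow-to-bucket mapping for per-flow fair queuing: an FNV-1a hash of the
 * 5-tuple selects one of FQ_BUCKETS round-robin buckets, so an unresponsive
 * UDP flow is confined to its own bucket and cannot starve others.
 * (Struct padding is assumed zero-initialized before hashing.) */
#define FQ_BUCKETS 1024

typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} flow_key_t;

static uint32_t fq_bucket(const flow_key_t *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;                       /* FNV-1a prime */
    }
    return h % FQ_BUCKETS;
}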
Total latency = Queue delay + Contention delay + TX delay + Retry delay
^^^^^^^^^^^^
L4S controls this
WMM historically manipulates AIFS, CW, and TXOP to reduce contention delay. Fi-Wi eliminates contention entirely using centralized TXOP scheduling, so WMM’s airtime hacks lose relevance.
Even L4S can fail under:
Hence, Fi-Wi benefits from a small amount of priority separation, at least in early deployments.
The theoretically sufficient minimal queue architecture for Fi-Wi is three queues:
┌──────────────────────────────────────────┐
│ Concentrator │
│ (Central Packet Memory • AQM • TXOP) │
└──────────────────────────────────────────┘
▲
│
┌─────────────┼──────────────────┐
│ │ │
│ │ │
┌────────┴───┐ ┌─────┴─────┐ ┌───────┴──────┐
│ Q_mgmt │ │ Q_L4S │ │ Q_classic │
│ (Strict │ │ (ECT(1), │ │ (ECT(0), │
│ priority) │ │ dual-Q) │ │ classic) │
└──────┬─────┘ └─────┬─────┘ └──────┬────────┘
│ │ │
└───────────────┼──────────────────┘
│
TXOP Scheduler
(Build AMPDU • Select RRH • 200–250µs)
│
┌─────────────────────────┼──────────────────────────┐
│ │ │
┌───▼───┐ ┌─────▼─────┐ ┌─────▼─────┐
│ RRH1 │ │ RRH2 │ │ RRH3 │
│ (PHY) │ │ (PHY) │ │ (PHY) │
└───────┘ └───────────┘ └───────────┘
The minimal Fi-Wi queue architecture contains a strict-priority management queue plus dual-queue L4S (L4S + Classic). All buffering lives in the concentrator; RRHs keep no deep queues. L4S senders see a clean single-bottleneck model, and all 802.11 management frames bypass AQM entirely for correctness.
In this design, WMM is unnecessary at the wireless bottleneck. All data traffic benefits from the same controlled queue delay, and fairness is enforced by per-flow scheduling rather than EDCA.
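A sketch of the corresponding admission-time classification for the three-queue design above, using the standard ECN codepoints from RFC 9331; the enum and function names are illustrative, and CE arrivals are steered to the L4S queue here for simplicity:

#include <stdint.h>

enum fiwi_queue { Q_MGMT, Q_L4S, Q_CLASSIC };

enum fiwi_queue fiwi_classify(uint8_t ip_tos, int is_mgmt_frame)
{
    if (is_mgmt_frame)
        return Q_MGMT;                 /* 802.11 management bypasses AQM  */

    uint8_t ecn = ip_tos & 0x3;        /* ECN: low two bits of TOS byte   */
    return (ecn == 0x1 || ecn == 0x3)  /* ECT(1) or CE → L4S dual-queue   */
         ? Q_L4S
         : Q_CLASSIC;                  /* ECT(0) / Not-ECT → Classic      */
}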
A more conservative deployment uses five queues per airtime domain:
┌───────────────────────────────────────────┐
│ Concentrator │
│ (Central Packet Memory • AQM • TXOP) │
└───────────────────────────────────────────┘
▲
│
             Five Logical Queues Per Airtime Domain

┌────────────┐ ┌──────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│   Q_mgmt   │ │ Q_L4S-hi │ │ Q_classic-hi │ │  Q_L4S-be   │ │ Q_classic-be │
│ (priority) │ │ (Voice)  │ │ (Legacy VoIP)│ │ (Bulk TCP/  │ │ (Legacy bulk)│
│            │ │          │ │              │ │  QUIC)      │ │              │
└──────┬─────┘ └────┬─────┘ └──────┬───────┘ └──────┬──────┘ └──────┬───────┘
       │            │              │                │               │
       └────────────┴──────────────┼────────────────┴───────────────┘
                                   │
                            TXOP Scheduler
              (Build AMPDU • Select RRH • Delay Targets)
                                   │
          ┌────────────────────────┼────────────────────────┐
          │                        │                        │
      ┌───▼───┐               ┌────▼────┐              ┌────▼────┐
      │ RRH1  │               │  RRH2   │              │  RRH3   │
      │ (PHY) │               │  (PHY)  │              │  (PHY)  │
      └───────┘               └─────────┘              └─────────┘
The 5-queue design provides a two-tier priority system across L4S and Classic traffic. This conservative architecture offers compatibility with legacy UDP voice/video, while still keeping Fi-Wi’s centralized L4S semantics intact. Over time, deployments can collapse from 5 queues to 3 as performance data validates the simpler model.
Consider 10 simultaneous HD video calls (~20 Mbps total) plus a saturating background TCP flow:
Legacy WMM:
Fi-Wi with L4S + fair queuing:
This is roughly 1000× lower queueing latency than legacy WMM systems, and it applies to all traffic, not only traffic in a “priority” AC.
Fi-Wi can phase its queue structure over time:
Metrics to monitor include:
WMM exists to correct three historical problems in distributed Wi-Fi:
Fi-Wi removes the root causes of these behaviors:
Because of this, full WMM support at the air bottleneck is not necessary. However, Fi-Wi does support WMM semantics for:
Fi-Wi handles WMM as an admission-time mapping:
This preserves compatibility while avoiding the complexity and unpredictability of EDCA-based priority systems. Over time, Fi-Wi deployments can rely on pure L4S semantics and collapse WMM to a compatibility shim, not a required scheduling mechanism.
Fi-Wi’s centralized queue architecture enables:
Traditional Wi-Fi uses WMM to work around bufferbloat and contention. Fi-Wi removes those problems entirely through tight queue control, shared state, and central scheduling. Priority becomes a policy choice — not a crutch for an unstable MAC.
In Fi-Wi, the Carve-Out ensures the voice packet (L4S) bypasses the accumulated Classic bulk data completely. The file download continues to saturate the link, but the latency of the L4S flow is decoupled from the load of the Classic flow.
Fi-Wi’s centralized shared state across RRHs makes it natural to treat multiple radios as an active redundant set for the same STA or room. This is analogous in spirit to 802.11be’s Multi-Link Operation (MLO), where a single multi-link device (MLD) can use multiple links for reliability and capacity. In Fi-Wi, the concentrator is the coordination point leveraging shared state, and the RRHs are the distributed radios providing multiple RF paths.
In many deployments, a client STA will be audible at more than one RRH (overlapping coverage). On the uplink, Fi-Wi exploits this spatial diversity to improve reliability without requiring changes to the client.
This approach leverages the spatial diversity of distributed RRHs to mitigate shadowing and multipath fading. Because the selection logic operates on valid MAC frames (after FCS verification) rather than raw I/Q samples, this architecture maintains compatibility with standard COTS Wi-Fi silicon at the Radio Head.
On the downlink, the concentrator can treat multiple RRHs as candidate transmitters for a given STA or room:
This gives Fi-Wi:
In a multi-RRH Fi-Wi deployment, each radio head operates on the same BSSID and channel but sits in a different physical location with its own RF conditions. While Fi-Wi centralizes all queueing and scheduling decisions, every RRH must still obey the fundamental 802.11 rule: listen-before-talk (LBT).
This is where Fi-Wi diverges sharply from classical multi-AP systems. In UniFi, Ruckus, Aruba, and all controller-based Wi-Fi architectures, each AP queue is blind to the RF medium state until it attempts to transmit. The AP commits a packet to the hardware queue, and if the medium is busy, the packet waits (Head-of-Line blocking) while the AP performs backoff.
Fi-Wi inverts this. RRHs continuously report their LBT Eligibility Status (Clear/Busy) to the Concentrator via the high-speed PCIe telemetry path, with update intervals of 100–500 µs, well matched to inter-TXOP scheduling decisions. While the Concentrator cannot react within a single 9 µs backoff slot, it operates on the inter-TXOP timescale (200–500 µs¹).
Before posting a new DMA descriptor to an RRH, the Scheduler checks this eligibility:
This prevents Head-of-Line Blocking where a packet sits in a hardware queue on a jammed radio. When multiple RRHs report clear airtime, Fi-Wi selects among them based on link quality (CSI) and predicted airtime efficiency. Conversely, if all RRHs report medium-busy, no RRH is primed; the scheduler pauses the flow to prevent backpressure from accumulating in the RRH hardware, keeping the queue depth visible in the Concentrator where L4S can measure it.
The result is a form of Centralized Selection based on LBT Eligibility. Multi-AP systems coordinate configuration (channels, power), but they cannot coordinate transmit starts because they lack the real-time feedback loop to steer packets away from busy radios before they are queued.
¹ Representative scheduling interval for mixed traffic workloads; actual TXOP durations range from tens of microseconds (small frames) to several milliseconds (large aggregates).
(Shared RF / Airtime Domain)
+----------------------+ +----------------------+
| RRH-A | | RRH-B |
| (Room / Zone A) | | (Room / Zone B) |
+----------------------+ +----------------------+
| LBT: Clear | | LBT: Busy (ED high) |
| Eligible = YES | | Eligible = NO |
+----------+-----------+ +-----------+----------+
| |
| Fiber fronthaul (low latency) |
| |
v v
+-----------------------------------+
| Fi-Wi Concentrator / Scheduler |
+-----------------------------------+
| Centralized queue for building |
| L4S feedback / congestion state |
| |
| Decision: Post Descriptor to A |
| (RRH-B flagged as jammed/ineligible|
| prevents HoL blocking) |
+----------------+------------------+
|
| Downlink frames / aggregates
v
+--------------+
| Client(s) |
+--------------+
Time →
------------------------------------------------------------------------------------------------->
RRH-A (Room A): [ Sense medium ] [ Idle ] [ Clear ] [ Transmit TXOP ] [ Idle ... ]
|<-- DIFS --->| |<---- contention window (few slots) ---->|
RRH-B (Room B): [ Sense medium ] [ ED high: medium busy ] [ Backoff ... ]
|<---- busy ---->|
RRH LBT → Scheduler: A: "Clear" B: "Busy"
Scheduler View: [ Receive LBT states from A, B ]
[ Mark A = eligible, B = ineligible ]
[ Dequeue next packets from central queue ]
[ Post descriptor to RRH-A only ]
Downlink Action: RRH-A receives descriptor, enters backoff, wins, transmits.
RRH-B remains silent (no descriptor posted).
Effect: • No packet trapped in RRH-B's buffer
• No exponential backoff storm
• Deterministic selection of the RRH with clear airtime
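The descriptor-posting gate sketched in the timeline above reduces to a small selection routine; rrh_t, its fields, and the cost metric are hypothetical stand-ins for the per-RRH telemetry state the Concentrator maintains:

#include <stdint.h>

typedef struct {
    int    id;            /* RRH identifier                          */
    int    lbt_clear;     /* latest LBT eligibility report (1=clear) */
    double airtime_cost;  /* predicted airtime per byte from CSI     */
} rrh_t;

/* Returns the index of the best eligible RRH, or -1 if every RRH in the
 * domain reports medium-busy; in that case the packet stays in the central
 * queue, visible to L4S, rather than being trapped in RRH hardware. */
int select_rrh(const rrh_t *rrhs, int n)
{
    int    best = -1;
    double best_cost = 1e30;

    for (int i = 0; i < n; i++) {
        if (!rrhs[i].lbt_clear)
            continue;                         /* skip jammed radios (no HoL)   */
        if (rrhs[i].airtime_cost < best_cost) {
            best_cost = rrhs[i].airtime_cost;
            best = i;                         /* best CSI / airtime efficiency */
        }
    }
    return best;
}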
802.11be MLO allows a multi-link device (AP/STA) to use multiple links (e.g., 2.4G, 5G, 6G bands or channels) under a single MAC entity. Features include:
Fi-Wi provides a similar effect at the building scale, but with important differences:
Because the RRHs are spatially distributed around rooms and hallways, Fi-Wi gains advantages that co-located antennas cannot provide:
These advantages come from intelligent packet routing and dynamic RRH selection, not from RF phase coordination or simultaneous beamforming across RRHs.
Fi-Wi strictly adheres to local regulatory compliance. The Concentrator manages the queue and the schedule, but the RRH manages the compliance.
When the Scheduler assigns a TXOP to an RRH, it posts a descriptor. The RRH hardware then performs standard 802.11 EDCA:
The Architectural Difference:
In MLO or Mesh: If an AP commits a packet to a radio and that radio hits congestion, the packet is trapped in the local buffer. The backoff might take 50ms. During this time, the AP's other radios (or other APs in the mesh) might be idle, but they cannot help because the packet is already "owned" by the busy MAC.
In Fi-Wi: The packet remains in the Concentrator's central memory until the last possible moment (see Appendix F). If the Concentrator sees an RRH entering deep backoff (via real-time telemetry) or reporting "Busy," it stops posting new descriptors to that RRH and steers subsequent traffic to a free RRH. The backoff engine remains local (compliance), but the queue feeding it is steered globally (performance).
This allows Fi-Wi to scale airtime domains across an entire building while preventing the multi-node contention collapse that plagues traditional Wi-Fi networks.
Wi-Fi 7 MLO: per-radio queues and MAC logic Fi-Wi: one centralized queue per airtime-domain
================================================ ===============================================
Airtime-domain Airtime-domain
-------------- --------------
+-------------+ +-------------+ +-------------------------+
| Radio 1 | | Radio 2 | | Fi-Wi Concentrator |
| MAC engine | | MAC engine | | (per airtime-domain) |
| Backoff | | Backoff | +-------------------------+
| DMA queues | | DMA queues | | Centralized queue |
+------+------+ +------+------+ | AQM / L4S feedback |
| | | Scheduler |
| | +-----------+-------------+
v v |
Packet trapped Packet trapped |
in local queue in local queue |
during backoff during backoff v
+--------+-------+ +--------+-------+
| RRH A | | RRH B |
| RF front-end | | RF front-end |
| LBT + backoff | | LBT + backoff |
+--------+-------+ +--------+-------+
^ ^
| |
Scheduler posts descriptor only to
the RRH that is clear and eligible.
To keep L4S happy, Fi-Wi needs to preserve a single bottleneck queue per flow even while using multiple RRHs:
In other words:
Traditional Wi-Fi deployments suffer from two fundamental problems in high-density environments: (1) clients are statically associated to a single AP based on initial connection, leading to suboptimal performance as they move, and (2) autonomous APs compete for airtime through CSMA/CA contention, creating interference. Fi-Wi inverts this paradigm through Dynamic Point Selection—continuously choosing the optimal RRH per packet—and Intelligent Frequency Reuse—leveraging spatial isolation to maximize capacity.
Unlike traditional Wi-Fi where clients are physically and logically tied to a single Access Point (AP), Fi-Wi treats the entire building as a single Virtual Cell. The Concentrator maintains real-time Channel State Information (CSI) from all RRHs and dynamically selects the optimal transmission point for each individual packet.
To understand the magnitude of this shift, we must compare the standard "Fast BSS Transition" (802.11r) with the Fi-Wi approach. In standard Wi-Fi, mobility is a negotiation. In Fi-Wi, it is an execution.
| Step | Standard Wi-Fi (802.11r / Fast Roaming) | Fi-Wi (Dynamic Point Selection) |
|---|---|---|
| 1. Trigger | Client detects low RSSI and decides to scan. | Concentrator detects better path via Uplink SNR. |
| 2. Action | Client tunes radio off-channel to scan for beacons (Latency spike: 50–100ms). | Zero Action. Client stays on channel. |
| 3. Handshake | Client sends Auth + Re-Assoc frames. AP validates keys. | None. No Over-the-Air frames. |
| 4. Switch | AP 1 tears down keys; AP 2 installs keys. | Concentrator updates the DL_RRH_ID pointer in memory. |
| Total Time | ~50ms – 150ms (Best case) | < 1ms (PCIe Write) |
While 802.11r is sufficient for buffered video (Netflix), it typically breaks real-time applications like Voice over Wi-Fi (VoWiFi) and VR/XR, where a 50ms gap causes audio dropouts or visual artifacts. Fi-Wi's sub-millisecond switching ensures true continuity.
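Because the "Switch" step in the table above is a single memory write, it can be sketched in a few lines of C; the sta_entry_t layout and the atomic discipline are illustrative assumptions:

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t dl_rrh_id;   /* RRH currently serving this STA's downlink */
} sta_entry_t;

/* Dynamic point selection: no over-the-air handshake, no key moves.
 * Subsequent TXOPs for this STA are simply scheduled on the new RRH. */
static inline void switch_serving_rrh(sta_entry_t *sta, uint16_t new_rrh)
{
    atomic_store_explicit(&sta->dl_rrh_id, new_rrh, memory_order_release);
}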
Consider "Alice" on a VR headset walking down a hallway:
In traditional Wi-Fi, neighboring APs on the same channel create co-channel interference. The standard solution is to assign different channels (e.g., AP-A uses Channel 36, AP-B uses Channel 48), but this wastes spectrum. Fi-Wi enables intelligent frequency reuse—using the same channel across multiple RRHs when spatial conditions allow.
Frequency reuse is viable when clients are in spatially separated locations with significant isolation (typically >25-30 dB attenuation due to walls, floors, or distance).
Example: Adjacent Rooms
The Fi-Wi Decision:
The key advantage over static channel planning is real-time adaptation:
| Requirement | Fi-Wi (C-RAN) | Autonomous APs |
|---|---|---|
| Global CSI Visibility | Complete: Concentrator sees CSI from all RRHs to all clients in real-time | Fragmented: Each AP only knows its own channel. Must exchange info over backhaul (high latency) |
| Decision Latency | Microseconds: Concentrator makes decisions in software at µs granularity | Milliseconds to seconds: APs coordinate via slow management protocols |
| Adaptation Speed | Per-packet: Can switch RRH or channel based on every CSI update | Minutes: Channel changes require beacon updates, client reassociation |
| Client Disruption | None: Decisions are transparent to clients | High: Channel changes or AP reassignment cause connectivity interruptions |
The complexity of dynamic point selection and frequency reuse is hidden from the L4S congestion control loop. Traffic still lives in per-airtime-domain group queues. When the Concentrator enables frequency reuse or optimizes RRH selection, it simply affects the effective service rate μ(t) of the queue.
The PI² controller in the outer loop (see Section 5) sees the queue draining faster and naturally reduces ECN marking. This allows L4S senders (TCP Prague) to ramp up their congestion windows to fill the expanded capacity. The system automatically discovers and exploits available spatial capacity without requiring changes to congestion control algorithms or application awareness.
A common critique of centralized wireless architectures is the "autonomous client problem": while the infrastructure can be coordinated, the stations (STAs) are independent entities that contend for the medium using their own logic.
Fi-Wi addresses this by enforcing a Control Hierarchy that governs client behavior from the physical layer up to the transport layer. Instead of passively hoping for "good client behavior," Fi-Wi uses four distinct mechanisms to throttle, steer, or schedule station media access.
Level 1: Deterministic (Hard)
[ 802.11ax Trigger Frames ] ──▶ STA must wait for Schedule
(Zero contention)
Level 2: Transport (Adaptive)
[ L4S / ECN Marking ] ────────▶ OS Kernel throttles pacing
(Reduces MAC load before enqueue)
Level 3: RF Physics (Steering)
[ Beacon Power Shaping ] ─────▶ STA firmware seeks new cell
(Moves demand to different domain)
Level 4: Statistical (Soft)
[ WMM / AIFS Parameters ] ────▶ STA adjusts backoff aggression
(Statistical deprioritization)
For modern clients (Wi-Fi 6/7), Fi-Wi removes autonomy entirely for uplink traffic. The Concentrator generates Trigger Frames via the RRH.
For the growing ecosystem of L4S-capable clients (iOS, macOS, Linux, Windows), control is applied at the Operating System kernel.
The Concentrator sets the CE (Congestion Experienced) codepoint in the IP header of downlink packets based on the centralized Group Queue depth.
Fi-Wi manipulates the physical environment to restrict which RRHs a client perceives as viable, effectively "shoving" media access demand to specific airtime domains.
As a defense-in-depth measure for legacy clients, Fi-Wi advertises tuned WMM EDCA parameters.
To maintain technical accuracy, it is important to clarify what Fi-Wi's dynamic point selection does not provide:
These capabilities would require either:
Fi-Wi's architecture deliberately focuses on capabilities achievable with COTS Wi-Fi chips, providing 2-3x capacity improvement through intelligent management rather than pursuing 4-6x gains that would require custom silicon development.
Based on the capabilities described above, Fi-Wi provides the following performance improvements over traditional autonomous AP deployments:
These gains are achieved through centralized intelligence and microsecond-latency fronthaul, not through RF phase control or coordinated transmission. The architecture remains fully compliant with unlicensed spectrum regulations and works with commodity Wi-Fi chipsets.
Fi-Wi transforms the problem of wireless density by treating it as a routing and scheduling problem rather than an RF coordination problem. By centralizing packet memory and MAC scheduling, Fi-Wi converts adjacent radios from interferers into dynamically selected access points, allowing the network to scale capacity through intelligent management rather than collapsing under interference.
The key insight is that most Wi-Fi performance problems stem from poor decisions (wrong AP, wrong channel, wrong timing) rather than fundamental RF limitations. Fi-Wi solves this by providing the Concentrator with complete visibility and control, enabling microsecond-granularity optimization that autonomous APs cannot match.
Modern enterprise Wi-Fi deployments use centralized controllers (Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi, Ruckus SmartZone, etc.) to manage multiple APs. These controllers coordinate the control plane: channel assignment, transmit power, client association hints, roaming policies, and security. However, these remain loosely-coupled systems where the data plane — queueing, MAC scheduling, aggregation, and packet memory — remains distributed inside individual APs.
A traditional AP is not just “running EDCA.” It is running EDCA after juggling dozens or hundreds of logical MAC queues and state machines:
With N stations, an AP can easily have on the order of N × (4–8) logical queues behind a single RF channel. Every AP in the same RF domain runs this large, isolated, queue-filled state machine independently. No AP has a global view; controllers see only coarse statistics.
The result:
Fi-Wi is fundamentally different: it centralizes both control plane and data plane with shared state across all RRHs. The concentrator does not just configure RRHs; it directly manages their queues, schedules their TXOPs, maintains unified CSI and airtime state, and applies coordinated ECN marking for each airtime domain. This architectural difference — not just improved control-plane coordination — is what enables Fi-Wi’s latency, L4S, and spatial multiplexing advantages.
┌──────────────────────────── Traditional Distributed AP ───────────────────────────┐
│                                                                                    │
│  Many MAC queues hidden inside each AP:                                            │
│                                                                                    │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                               │
│  │  STA 1 TID  │   │  STA 2 TID  │   │  STA N TID  │  ... (N stations × 4–8 TIDs)  │
│  │   Queues    │   │   Queues    │   │   Queues    │                               │
│  └─────┬───────┘   └─────┬───────┘   └─────┬───────┘                               │
│        │                 │                 │                                       │
│  ┌─────▼─────────────────▼─────────────────▼─────────┐                             │
│  │  Firmware Queues (Aggregation, Reorder, BAR/BA)   │                             │
│  └───────────┬───────────────────────────────────────┘                             │
│              │                                                                     │
│  ┌───────────▼──────────────┐                                                      │
│  │ Hardware MAC Ring Buffers│  (TX/RX DMA)                                         │
│  └───────────┬──────────────┘                                                      │
│              │                                                                     │
│  ┌───────────▼──────────────┐                                                      │
│  │ EDCA / CSMA-CA Contention│  (Per-AP, no coordination)                           │
│  └───────────┬──────────────┘                                                      │
│              │                                                                     │
│   Long, multi-ms TXOP bursts, inconsistent ECN, early collapse                     │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘
See also: Section 2.1 — Why L4S + Legacy Wi-Fi Struggle, Appendix A — 802.11 Backoff & Collapse Dynamics.
The following subsections detail specific benefits of Fi-Wi’s cellularized, tightly-coupled architecture compared to controller-managed, loosely-coupled AP systems.
Traditional APs:
Each AP builds its own local queues. Under load, large aggregates, retries, and hidden buffering produce multi-millisecond queueing and service delays. Tail latency is largely uncontrolled, and varies across APs sharing the same channel.
Fi-Wi (cellularized Wi-Fi, cell-per-room):
Traditional APs:
L4S flows traverse multiple hidden queues: wired bottlenecks, AP-local queues, firmware queues, and EDCA contention. ECN marking (if it exists at all) is inconsistent and not tied to a single bottleneck. Collapse produces noisy, bursty marking or loss, and the L4S control loop becomes oscillatory or falls back toward classic congestion behavior, especially in the tails that matter to users.
Fi-Wi:
Traditional APs:
Aggregation improves PHY efficiency but hides individual packet timing from the congestion controller. The controller does not know which MSDUs were grouped into a TXOP, what the queue state was when the TXOP started, or how long each device has been waiting.
Fi-Wi:
This combination yields high PHY efficiency and transport-layer visibility into congestion, instead of having to choose one or the other.
Controller-managed loosely-coupled APs:
The controller can adjust channels, power, and send steering hints (e.g., 802.11v), but it cannot see or control:
As a result, these systems rely on heuristic, reactive policies: channel reassignment after interference is observed, power adjustments based on neighbor reports, and client steering using RSSI or airtime snapshots. These help, but they operate on coarse time scales (seconds to minutes) and cannot fix the fundamental data-plane issues of distributed queues, MAC contention, and tail latency under load.
Fi-Wi cellularized architecture:
The concentrator maintains true shared state across all RRHs in the building:
Because RRHs are distributed in space (often 2–4 per room in high-density deployments), Fi-Wi can leverage spatial separation for intelligent frequency reuse. The concentrator sees CSI from all RRHs and can make microsecond-granularity decisions about which RRH should transmit each packet — all while preserving the "single bottleneck queue per airtime domain" discipline required for stable L4S behavior.
┌─────────────────────────── Fi-Wi Cellularized Architecture ────────────────────────────┐
│                                                                                         │
│   One deep queue per airtime domain               Shared CSI + µs timestamps            │
│                                                                                         │
│   ┌───────────────────────────────────────────┐                                         │
│   │ Centralized Airtime-Domain Queue (ECN AQM)│◄──────────┐                             │
│   └───────────────────┬───────────────────────┘           │                             │
│                       │                                   │                             │
│   ┌───────────────────▼─────────────────────────────────┐ │                             │
│   │  Concentrator Scheduler (L4S, TXOP, RF Grouping)    │◄┘                             │
│   │  Dynamic Point Selection per Packet                 │                               │
│   └───────────────┬─────────────────────────┬───────────┘                               │
│                   │                         │                                           │
│        PCIe/Fiber │              PCIe/Fiber │                                           │
│                   │                         │                                           │
│   ┌───────────────▼─────────────┐  ┌────────▼──────────────┐  ...                       │
│   │   RRH 1 (Thin MAC/PHY)      │  │  RRH 2 (Thin MAC/PHY) │                            │
│   └─────────────────────────────┘  └───────────────────────┘                            │
│                                                                                         │
│        Selected RRH transmits; others silent in this TXOP                               │
│                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘
See also: Section 4 — Key Fi-Wi Mechanisms, Section 5 — Control Architecture, Section 9 — Dynamic Point Selection.
The table below summarizes the architectural differences between controller-managed, loosely-coupled APs and Fi-Wi's cellularized, tightly-coupled architecture:
| Capability | Controller-Managed Loosely-Coupled APs | Fi-Wi Cellularized Tightly-Coupled |
|---|---|---|
| Control Plane | ||
| Channel assignment | ✓ Centralized | ✓ Centralized |
| Transmit power control | ✓ Centralized | ✓ Centralized + dynamic beacon shaping |
| Client steering hints | ✓ Centralized (802.11v/k) | ✓ Centralized |
| Data Plane | ||
| Packet queues | ✗ Distributed per-AP; many hidden per-STA/per-TID/firmware queues | ✓ Exactly one deep queue per airtime domain in the concentrator |
| MAC scheduling & aggregation | ✗ Autonomous per-AP; long TXOPs under load | ✓ Coordinated across RRH groups; TXOP length explicitly bounded |
| Timestamp synchronization | ✗ Not available at packet level | ✓ µs-accurate (PTM/PTP) shared across RRHs |
| Shared CSI state | ✗ Per-AP only; summarized to controller | ✓ Building-wide CSI aggregation at the concentrator |
| Queue visibility & AQM | ✗ Hidden in each AP; no global AQM | ✓ Fully visible per domain; explicit L4S/AQM on the true bottleneck |
| L4S/ECN marking point | ✗ Inconsistent or absent; multiple uncontrolled bottlenecks | ✓ Single, well-defined marking point per airtime domain |
| Dynamic point selection | ✗ Clients statically associated to one AP | ✓ Per-packet RRH selection based on real-time CSI (Section 9) |
| Selection diversity | ✗ Single AP receives uplink | ✓ Multiple RRHs receive; best copy selected (Section 9) |
| Intelligent frequency reuse | ✗ Static channel plan | ✓ Dynamic adaptation based on spatial isolation (Section 9) |
| Per-packet steering between radios | ✗ Not available | ✓ Active redundancy and fast failover (Section 8) |
| Dynamic RF grouping | ✗ Static AP boundaries | ✓ Adaptive airtime domains based on CSI and load (Section 6) |
Controller-managed loosely-coupled APs:
Fi-Wi cellularized architecture:
The economic viability of a "Cell-Per-Room" architecture hinges on the Remote Radio Head (RRH) being fundamentally simpler, cooler, and cheaper than a traditional Enterprise Access Point. By offloading complex logic to the Concentrator (Section 13) and precision timing to the Fronthaul (Section 4.7), the RRH becomes a lean physical device.
Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP/Networking SKUs. While Section 4.7 detailed how this enables external clocking, this choice is equally critical for the physical envelope:
We set a hard budget of 3.5–4 W total per RRH, enabling Power over Ethernet (PoE) Class 1 or 2 operation, or simple remote powering over hybrid fiber/copper cables.
A sub-4W envelope fundamentally changes the industrial design possibilities for the RRH:
Fi-Wi relies on a "Split Thermal" architecture. We deliberately shift the power density from the edge (the ceiling) to the core (the wiring closet).
A central hardware design choice is to make the RRH look like a PCIe endpoint to the Fi-Wi concentrator. This leverages the fact that:
Benefits of this choice:
We start with PCIe Gen3, one lane (x1), carried over fiber via a retimer + optical interface. Higher generations or widths (Gen4, x2/x4) are possible later but not required for the initial Fi-Wi performance targets.
PCIe Gen3 provides:
After protocol overhead (TLP headers, DLLPs, flow control), the sustained payload throughput for Gen3 x1 is in the rough range of 6–7 Gb/s for large transfers. This is more than sufficient for:
If a future RRH design must exceed this, the same architecture scales to:
For our initial Fi-Wi deployment assumptions, Gen3 x1 over fiber is a sensible and sufficient starting point.
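As a back-of-envelope check on that figure (assuming 256-byte maximum TLP payloads and roughly 24 bytes of TLP/DLLP framing per packet, representative rather than measured values):

\[
8\ \text{GT/s} \times \tfrac{128}{130} \approx 7.88\ \text{Gb/s (raw)}, \qquad 7.88\ \text{Gb/s} \times \tfrac{256}{256+24} \approx 7.2\ \text{Gb/s},
\]

with flow-control updates and ACK/NAK DLLPs consuming the remainder, landing sustained payload throughput in the quoted 6–7 Gb/s range.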
PCIe Gen3 latency has several components:
Order-of-magnitude:
Compared to:
the PCIe-over-fiber latency is effectively negligible. It comfortably fits within the microsecond-level time base used for t_ingress_us timestamps in FiWiMeta.
The PCIe model fits naturally with the Fi-Wi queueing and metadata scheme. Each RRH behaves like a PCIe endpoint with:
The FiWiMeta header lives in host memory adjacent to packet payloads and is referenced by these descriptors.
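For illustration, a plausible C layout of that header; only t_ingress_us, the queue snapshot, and ecn_flags are named elsewhere in this document, so the remaining fields and all widths are assumptions:

#include <stdint.h>

typedef struct {
    uint64_t t_ingress_us;    /* µs timestamp written at concentrator ingress */
    uint32_t queue_depth;     /* group-queue snapshot taken at enqueue        */
    uint16_t airtime_domain;  /* owning airtime domain / group queue index    */
    uint8_t  ecn_flags;       /* e.g., ECN_CE_APPLIED after AQM marking       */
    uint8_t  dl_rrh_id;       /* RRH selected for this packet's downlink      */
} fiwi_meta_t;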
Downlink flow: the Concentrator posts a DMA descriptor referencing the packet payload and its FiWiMeta (including t_ingress_us and queue snapshot).
Uplink flow:
In both directions, the PCIe fronthaul preserves the FiWiMeta assumptions of the control-plane design.
A critical operational requirement for Fi-Wi is the ability to service, replace, or add RRHs without bringing down the entire building's wireless network. PCIe provides native support for this through hot-plug capability, which is standard in enterprise server platforms and can be leveraged for Fi-Wi deployments.
PCIe hot-plug allows physical insertion and removal of endpoint devices (RRHs) while the system is running:
When a new RRH is connected or powered on:
Time from physical insertion to active traffic forwarding: typically 1–5 seconds, depending on link training, driver initialization, and RF group discovery.
When an RRH is removed (planned maintenance, failure, or surprise disconnection):
Impact on active connections: minimal to none for STAs served by multi-RRH domains. Traffic seamlessly fails over to remaining RRHs within the same RF group. For isolated single-RRH cells, removal causes brief disconnection until STAs reassociate with neighboring cells.
Hot-plug capability provides critical operational benefits:
To fully support hot-swap in production deployments:
Traditional distributed APs handle failures differently:
Fi-Wi's PCIe hot-plug, combined with multi-RRH airtime domains and centralized queues, enables sub-second failover with minimal packet loss—a qualitative improvement over traditional Wi-Fi high-availability approaches.
Hot-swap events interact cleanly with Fi-Wi's L4S and queueing architecture:
This separation—queues and control in the concentrator, timing-critical MAC in hot-swappable RRHs—is precisely what enables graceful hardware lifecycle management while maintaining the control-theoretic cleanliness that L4S requires (Appendix A).
To understand why Fi-Wi achieves deterministic latency where traditional Wi-Fi fails, we must look beyond the protocol and into the physical architecture of the devices. The feasibility of the "Cut-Through" RRH design relies on the upstream link being non-blocking. Fi-Wi achieves this by replacing the internal switching fabric of legacy APs with the massive PCIe lane overprovisioning of a workstation-class Concentrator.
| Component | Traditional AP (The Appliance) | Fi-Wi RRH (The Peripheral) |
|---|---|---|
| Core Silicon | Complex SoC (Quad-core CPU, NPU, Switch) | Thin PHY/MAC + PCIe Retimer |
| Data Path | Store-and-Forward (Switch → CPU → DMA) | Cut-Through (Fiber → PCIe → Air) |
| Queues | 1000s of opaque hardware queues | Zero deep queues (FIFO only) |
| Decision Making | Autonomous (Local Scheduler) | None (Slave to Concentrator) |
A traditional Enterprise Access Point is functionally a "Router-on-a-Stick." It forces high-speed wireless traffic through a series of internal serialization bottlenecks before the software ever sees the packet.
Architectural Flaws in Legacy APs:
Fi-Wi eliminates the internal switch, the GMII link, and the autonomous CPU. By utilizing high-end workstation silicon (e.g., AMD Threadripper Pro or Intel Xeon W-3400 series), the Concentrator provides 92 to 128 native PCIe lanes directly from a CPU with 24 to 96 high-performance cores.
The 92+ lanes of PCIe eliminate the need for an internal Ethernet switch anywhere in the datapath.
By mapping each RRH (or small groups of RRHs) to dedicated root ports on the CPU, Fi-Wi achieves a Non-Blocking Architecture:
This guarantees that the host DRAM behaves like Deterministic Ultra-Low Latency Memory rather than a shared network resource. This stability is the physical foundation that allows the software-defined queues (Section 14) to operate with microsecond precision.
Just as Fi-Wi removes blocking via massive PCIe lane availability, the CyBus ASIC in the Cisco 7500 (1990s) solved a similar bottleneck in routing.
Fi-Wi applies this same "Non-Blocking" philosophy to the wireless stack, utilizing 92+ lanes of PCIe to ensure that RRH memory access is never gated by a shared internal switch or software mutex.
Traditional Wi-Fi APs use hardware DMA (Direct Memory Access) rings to meet strict 802.11 MAC timing requirements—SIFS and DIFS deadlines measured in microseconds. While this solves the timing problem, it creates a cascade of architectural constraints that Fi-Wi explicitly avoids.
Hardware queues are expensive to implement in silicon. Each queue requires dedicated SRAM for descriptor storage, control logic for pointer management and overflow handling, and power even when idle. Current chip design limits traditional APs to hardware queues at L2 or the MAC: typically the four WMM access categories (AC_VO, AC_VI, AC_BE, AC_BK) per radio, multiplied across N stations.
While sufficient for basic priority handling, this fundamental constraint prevents the sophisticated per-flow scheduling that modern high-density networks require:
An equally significant problem is that once packets are enqueued to hardware DMA rings, the CPU cannot access them without causing race conditions. This "ownership transfer" creates fundamental limitations:
This prevents:
Because hardware queues are limited and packets become inaccessible after DMA, traditional AP vendors must add compensating hardware functionality to address these fundamental architectural limitations:
| Fundamental Limitation | Hardware Workaround Required | Complexity Added |
|---|---|---|
| Only 4-8 queues → no per-flow fairness | Airtime fairness tracking engine | Significant additional logic |
| Only 4-8 queues → no per-STA queuing | MU-MIMO grouping and coordination | Complex scheduling algorithms |
| Can't inspect after enqueue | Hardware deep packet inspection engine | Pattern matching, state tracking |
| Can't mark ECN in real-time | Hardware ECN marker with threshold logic | Queue monitoring, marking logic |
| Can't reclassify flows dynamically | Flow classification accelerator (TCAM) | Fixed rules; high-priority only; cannot update easily |
This compensating hardware represents substantial additional silicon area, design complexity, and verification effort. More critically, hardware-based solutions are fundamentally limited to fixed thresholds and simple policies that were designed into the chip. They cannot implement sophisticated algorithms like CoDel, PIE, or adaptive per-flow policies that require complex state and frequent updates.
Fi-Wi escapes these constraints through architectural separation:
RRH silicon implements only timing-critical functions (MAC/PHY, synchronization) with zero hardware queues. Packets arrive from the concentrator milliseconds before transmission, stay in simple descriptor rings briefly, then transmit. No autonomous queuing or scheduling logic.
All queues live in concentrator DRAM. Because the concentrator operates at TXOP granularity (~600 µs) rather than SIFS granularity (16 µs), it has time for software scheduling. Queue structures are simple data structures in memory— vastly cheaper than dedicated silicon:
The critical difference: packets remain in concentrator DRAM (software-accessible) until milliseconds before transmission. The scheduler can, for example, compute a packet's queue sojourn directly as now() - pkt->enqueue_time.
The RRH only owns packets for ~1 ms while transmitting a TXOP—too brief to constrain the system.
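A sketch of what such a software queue might look like: a plain ring in concentrator DRAM whose oldest element's age is the AQM input for its airtime domain (sizes, names, and the single-producer/single-consumer discipline are illustrative):

#include <stdint.h>

#define QCAP 4096

typedef struct {
    void     *pkt;             /* pointer into central packet memory */
    uint64_t  enqueue_time;    /* µs timestamp written at enqueue    */
} qslot_t;

typedef struct {
    qslot_t  slot[QCAP];
    uint32_t head, tail;       /* single-producer / single-consumer  */
} group_queue_t;

/* Sojourn of the oldest packet: the per-domain AQM input. */
static inline uint64_t sojourn_us(const group_queue_t *q, uint64_t now_us)
{
    return (q->head == q->tail) ? 0 : now_us - q->slot[q->head].enqueue_time;
}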
| Aspect | Traditional AP | Fi-Wi |
|---|---|---|
| Queue count | N stations × 4–8 (at MAC or L2 level) | 1000+ (dynamically allocated at the per-flow / 5-tuple level) |
| Queue implementation | Dedicated silicon (expensive) | Software data structures (negligible cost) |
| Compensating logic | Substantial silicon for workarounds | None needed |
| Per-flow fairness | Impossible (insufficient queues) | Standard capability |
| Sophisticated AQM | Simple thresholds only (hardware fixed) | Any algorithm (CoDel, PIE, ML-based) |
| Policy updates | Requires new silicon design | Software configuration or code update |
| Operational visibility | Aggregate counters only | Full per-flow statistics and queue contents |
| Algorithm experimentation | Impossible in production | A/B testing, gradual rollout possible |
Beyond the direct silicon cost advantages, Fi-Wi gains strategic advantages that compound over time:
Fi-Wi's approach follows a clear design principle:
This separation is not arbitrary. It's driven by fundamental constraints: hardware is expensive, inflexible, and opaque; software is cheap, updatable, and inspectable. By placing intelligence in software and only timing-critical functions in hardware, Fi-Wi achieves both the performance of hardware-accelerated systems and the flexibility of software-defined networking—advantages that traditional distributed-AP architectures cannot replicate due to their need for autonomous per-AP decision-making at microsecond timescales.
The Fi-Wi architecture's centralized observability enables machine learning to optimize MCS transition dynamics on a per-site basis. Unlike autonomous APs that operate on partial, local state, the Concentrator observes the complete state-transition graph for all RRHs under a single clock. This section describes how Fi-Wi combines physics-based models with adaptive learning to optimize performance.
The MCS state graph from Section 2.7 can be formalized as a probability current network, where each node represents a PHY configuration state (MCS index, spatial stream count) and edges represent transitions between states. The system's behavior follows probability flow dynamics:
What you're seeing: The vector field (arrows) shows the "flow" of PPDUs through the MCS/Spatial Stream space—the "river" of probability current that drives system behavior.
Autonomous AP (Left): Turbulent flow with chaotic arrow directions, sometimes pointing backward when collisions occur. Multiple shallow potential wells create competing forces. This represents High Entropy—the system doesn't know which way is optimal.
Centralized Concentrator (Right): Laminar flow with smooth, coherent streamlines pointing toward the optimum. Steeper gradients and deeper potential wells create strong convergence. This represents Low Entropy (Determinism)—the system has clear direction toward the optimal state.
In Phase 1, the Central Concentrator uses standard MAC-level timing to prevent APs from transmitting simultaneously on the same frequency. Result: this successfully eliminates the "Red" (collisions) seen in the Autonomous model. However, because the Radio Heads (RRHs) are not phase-aligned, they cannot perform Joint Transmission. The channel rank is limited to the physical antennas of a single RRH (Rank 4). Throughput hits a "Glass Ceiling."
In Phase 2, we introduce an FPGA to achieve sub-nanosecond synchronization between RRHs. This allows multiple RRHs to act as a single, distributed antenna array. Result: this unlocks Rank Expansion. The system can resolve 16+ spatial streams (Eigenvectors) simultaneously. The "Glass Ceiling" is removed, and throughput scales linearly with the number of RRHs deployed.
Machine learning in Fi-Wi optimizes the transition rate matrix W based on telemetry that is only observable in a centralized architecture. For each potential transition from state i (MCSi, SSi) to state j (MCSj, SSj), the learned rate depends on:
The learned transition rate function takes the form:
This learned function answers: "Given the current state and observed conditions, what is the optimal next MCS/SS configuration to meet the L4S latency target while maximizing achievable throughput?"
The ML engine operates on the control plane timescale with adaptive update rates: milliseconds for sudden events (interference spike detection requiring rapid response), seconds for typical rate adaptation (matching the timescales demonstrated by minstrel/minstrel_ht schedulers), and minutes for long-term pattern learning (daily traffic patterns, where slower updates are sufficient). This decouples the computational cost of learning from the latency constraints of packet transmission. The scheduler does not run neural network inference per packet—it uses a pre-computed policy matrix updated at rates appropriate to the dynamics being observed.
Fi-Wi uses physics-informed machine learning that combines Shannon capacity theory with learned corrections. This hybrid approach provides explainability, sample efficiency, and principled generalization.
The transition rate decomposes into two components:
Wphysics: The physics baseline uses Shannon capacity to establish theoretical bounds. For each MCS index, the required SNR is known from 802.11 specifications (e.g., MCS 11 requires ~30 dB). The base transition rate is the probability that current SNR exceeds the threshold given measured CSI.
Wlearned: The learned correction factor captures deviations from ideal conditions on a per-station basis, as different spatial stream capabilities and local RF environments require station-specific adaptation:
This approach uses residual learning: the physics model Wphysics provides the coarse steering (the "prior"), while the ML model learns the residual error Δ specific to the site. This guarantees the system never performs worse than a standard physics-based model, even before site-specific training converges. The ML correction is additive (or multiplicative) to a known-good baseline.
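One plausible form of this decomposition (multiplicative; an additive variant replaces the product with a sum):

\[
W_{i \to j} = W^{\mathrm{physics}}_{i \to j} \cdot \left(1 + \Delta^{\mathrm{learned}}_{i \to j}\right)
\]

where \( W^{\mathrm{physics}} \) is the Shannon/SNR baseline above and \( \Delta^{\mathrm{learned}} \) is the per-station, per-site residual.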
This decomposition provides three advantages:
The Concentrator's complete state visibility provides labeled training examples that are impossible to obtain in distributed AP systems. Each scheduling decision creates a training tuple:
Over time, the Concentrator accumulates thousands of these labeled examples across varying conditions. The ML model learns patterns such as:
This supervised learning is only possible with centralized observability. As detailed in Appendix H, autonomous APs lack:
It's worth noting that supervised learning doesn't require perfect ground truth labels to be effective—even relative quality assessments ("better" vs "worse") can drive learning. However, Fi-Wi's complete observability provides significantly richer training signals: precise measurements of queue impact, throughput changes, and latency effects that enable more efficient learning compared to the partial observability available to autonomous systems.
Fi-Wi's ML strategy uses transfer learning to balance generalization across sites with site-specific optimization:
Base Model (Cross-Site Training):
A foundational model is trained across multiple deployment sites to learn universal patterns:
Site-Specific Adaptation:
When deployed to a new site, the base model is augmented with learned corrections:
Continuous Adaptation:
The system continues to adapt using online learning with safety constraints:
Fi-Wi's ML capability creates a feedback loop that improves system performance over time:
This loop is unique to centralized architectures. Autonomous APs cannot generate ground truth labels without queue observability. Coordinated AP systems (where APs share summaries via a controller) see effects (latency, ECN) but not causes (queue growth, retry timing, aggregation depth) due to high inference distance.
Fi-Wi's centralized state graph provides the causal observability that machine learning requires. The probability current framework gives this learning a rigorous mathematical foundation: we are learning the transition rate matrix of a physical system governed by conservation laws.
Machine learning requires complete, structured training examples where actions, states, and outcomes are observable under consistent measurement. Fi-Wi's centralized architecture provides this by design: all state transitions occur under a single clock, all queue dynamics are visible, and all RF outcomes are measurable. This makes the MCS probability current learnable—something that is architecturally impossible in distributed, autonomous systems.
The presence of multiple concurrent Radio Heads (RRHs) serves as the primary multiplier for the Fi-Wi machine learning capability. It transforms the learning problem from optimizing a single isolated link into optimizing a spatially coupled network. While a traditional AP optimizes a local objective function (its own throughput), the Fi-Wi Concentrator utilizes concurrent RRHs to construct a global view of the RF environment.
This multi-RRH architecture impacts the learning model in three critical ways:
In traditional systems, an AP is blind to the interference seen by its neighbors. In Fi-Wi, the Concentrator aggregates real-time telemetry from all RRHs simultaneously.
This creates a Global RF State Matrix composed of:
This state matrix is sparse, time-aliased, and derived from standards-compliant telemetry rather than continuous per-packet baseband capture.
The model learns not just that "Client A has a weak signal," but specifically that "Client A is weak on RRH 1, strong on RRH 2, and creates -80 dBm interference on RRH 3." This global observability enables the prediction of building-wide interference patterns invisible to single-cell learners.
Because Fi-Wi treats multiple RRHs as an active redundant set, the ML engine has a broader action space than a standard rate-control algorithm. It learns not only how to transmit (MCS and scheduling decisions) but which RRHs are eligible transmitters for a given packet.
Note: This capability requires the hardware-synchronized FPGA architecture (Phase 2).
With sub-nanosecond synchronization, the ML engine will be able to resolve the true distributed Eigenstructure of the environment—the "shape" of available RF paths across distributed radios. This allows for Rank Expansion, where the system resolves more spatial streams (Eigenvectors) than a single physical AP could support, scaling capacity approximately with the number of RRHs, subject to channel rank and geometry.
To ensure the physics-informed model converges accurately, Fi-Wi employs a specific operational strategy: Zero-Occupancy Sounding.
As described in Section 15.5, the site-specific transfer function is composed of static building characteristics (Hstatic) and dynamic temporal variations (Δtemporal). To disentangle these variables, the system schedules automated channel sounding during hours of minimum occupancy.
In metrology, "tare" refers to zeroing a scale by removing known weights to isolate what you want to measure. Similarly, Fi-Wi "tares" the RF environment by measuring when human activity (the known variable) is absent.
\( H_{measured}(\mathrm{empty}) \approx H_{static} + \Delta_{building} \)
By sounding when the building is empty, the system effectively removes the noise of human movement and dynamic scatterers. This allows the Concentrator to:
This establishes a stable baseline "Zero State" for the learning model, ensuring that subsequent online learning is optimizing for dynamic changes rather than relearning the static environment. This separation dramatically improves offline RL dataset conditioning by preventing the model from relearning static structure while adapting to temporal dynamics.
While the primary learning mode is offline (using historical data), the centralized Concentrator architecture enables a hybrid approach: opportunistic, bounded model validation during predicted idle periods.
Because the Concentrator has global visibility of queue states across all RRHs in an Airtime Domain, it can predict when the RF channel will be underutilized—a capability fundamentally unavailable to autonomous APs that see only their local queues.
During high-confidence idle predictions, the system can perform controlled validation and calibration—not arbitrary exploration:
These activities refine the offline model without introducing risk to production traffic.
Validation is strictly bounded to prevent interference with real traffic:
This hybrid approach provides the safety of offline learning with the adaptability of continuous refinement, exploiting natural traffic lulls that autonomous APs cannot collectively identify.
Machine learning for MCS optimization is fundamentally enabled by Fi-Wi's centralized architecture and impossible in distributed AP systems:
| Requirement for ML | Autonomous AP | Fi-Wi Concentrator |
|---|---|---|
| Global CSI visibility | ❌ Each AP sees only local channel; no cross-AP interference data | ✅ Concentrator receives CSI from all RRHs; computes spatial correlation matrix |
| Cross-AP coordination state | ❌ Cannot observe other APs' band selection, power levels, or scheduling decisions | ✅ Centralized scheduler has complete visibility of all RRH configurations and decisions |
| Queue observability | ❌ Queue depth hidden in firmware; sojourn time not exposed | ✅ Centralized queuing with microsecond-resolution timestamps |
| Deterministic replay | ❌ Cannot reproduce exact RF conditions; firmware decisions opaque | ✅ Complete event log enables replay of scheduling decisions and outcomes |
| Inference distance | ❌ High (5-10 steps from cause to transport-layer effect) | ✅ Low (1-2 steps; queue → schedule → TX outcome directly linked) |
This observability gap is not a vendor implementation issue—it is an architectural limitation. Autonomous APs cannot generate high-quality training labels without queue observability.
The preceding sections established the architecture of the Fi-Wi concentrator: centralized packet memory (Section 4.4), group queues as the sole AQM bottleneck (Section 4.3), microsecond timestamps written into the Fi-Wi shim header (Section 4.2), and ML-driven MCS selection running continuously against that centralized data (Section 15). This section explains how the concentrator executes that pipeline with the determinism the architecture requires — maintaining a single observable bottleneck per airtime domain, applying ECN marks at the right moment, and keeping the RRH free of scheduling logic.
The Fi-Wi concentrator's latency and determinism targets strongly favor a kernel-bypass data plane. A conventional interrupt-driven kernel path would reintroduce jitter at exactly the point where the architecture is trying to remove it.
L4S requires ECN marks to be applied at the group queue on the same time scale as a single 802.11 TXOP. The Linux kernel's softirq-based packet path introduces interrupt coalescing and scheduler contention that accumulates across bursts. More fundamentally: every packet that transits the kernel stack competes with arbitrary OS activity for CPU time. The queue depth is not directly visible to userspace without a syscall; the marking decision cannot be co-located with the queue measurement in the same cache line.
Fi-Wi's concentrator data plane therefore runs via DPDK (Data Plane Development Kit): tight busy-poll loops on dedicated cores, with no interrupt-driven jitter. All packet operations — receive, classify, AQM mark, forward — execute in a cache-resident loop that preserves the single-bottleneck, fully-observable queue structure that the rest of the architecture depends on.
DPDK allocates all packet buffers (mbufs) from hugepages, eliminating TLB misses during packet processing. Each airtime domain's group queue is a logically contiguous region within this space. The pool is allocated once at startup; no per-packet memory allocation occurs on the fast path.
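A minimal sketch of that one-time pool creation, with illustrative sizing constants and names (the Fi-Wi codebase's actual values are not specified in this document):

#include <rte_mbuf.h>

#define FIWI_NB_MBUFS   (1u << 18)  /* illustrative pool size */
#define FIWI_MBUF_CACHE 256         /* per-lcore cache depth */

struct rte_mempool *fiwi_pktmbuf_pool;

static int
fiwi_create_pktmbuf_pool(unsigned socket_id)
{
    /* Buffers come from hugepages reserved by the EAL; after this call,
     * no allocation ever happens on the fast path. */
    fiwi_pktmbuf_pool = rte_pktmbuf_pool_create("fiwi_mbufs",
            FIWI_NB_MBUFS, FIWI_MBUF_CACHE, 0,
            RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);
    return fiwi_pktmbuf_pool != NULL ? 0 : -1;
}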
Each SFP+ NIC is bound to the vfio-pci driver. The system IOMMU enforces DMA isolation: a card can only reach the memory regions explicitly registered with it at startup. This gives the concentrator two properties simultaneously: hardware-enforced containment (a misbehaving card cannot touch another domain's memory), and zero-copy I/O, in which rx_burst and tx_burst let the NIC DMA engine write received frames directly into pre-registered mbuf space and read transmit frames from the same space, with no per-packet kernel involvement.
DPDK exposes each NIC's hardware receive queues independently. Fi-Wi uses this to achieve a direct, lockless mapping from PCIe port and queue index to airtime domain — the same logical grouping described in Section 6. Each lcore owns a fixed set of (port, queue) pairs. Because ownership is exclusive, there are no locks on the fast path and no shared state between lcores during steady-state forwarding.
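A sketch of this ownership pattern follows; the struct, function names, and four-queue count are illustrative, and classify_and_enqueue stands in for the classification-and-enqueue step shown in Section 16.7:

#include <stdint.h>
#include <rte_ethdev.h>

#define FIWI_BURST 32

struct owned_queue {
    uint16_t port;       /* DPDK port id of the SFP+ NIC */
    uint16_t queue_id;   /* hardware RX queue on that port */
    uint16_t domain_id;  /* airtime domain fed by this queue */
};

/* Filled at startup; read-only on the fast path, so no locks are needed. */
static struct owned_queue owned[4];

static int
lcore_rx_loop(void *arg)
{
    struct rte_mbuf *pkts[FIWI_BURST];
    (void)arg;
    for (;;) {
        for (unsigned q = 0; q < 4; q++) {
            uint16_t n = rte_eth_rx_burst(owned[q].port, owned[q].queue_id,
                                          pkts, FIWI_BURST);
            for (uint16_t i = 0; i < n; i++)
                /* Classification and enqueue as in Section 16.7's snippet. */
                classify_and_enqueue(pkts[i], owned[q].domain_id);
        }
    }
}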
| Fast-Path Property | Kernel Stack | Fi-Wi DPDK Pipeline |
|---|---|---|
| **Receive and Queue Observability** | | |
| Interrupt model | Hardware IRQ → softirq → NAPI poll; coalescing adds jitter | No interrupts; dedicated lcore polls hardware queue register directly |
| Queue depth visibility | Visible inside kernel only; userspace access requires syscall | Directly readable by AQM loop in same CPU cache line as packet pointer |
| Buffer allocation | Per-packet skb allocation from kernel slab | Pre-allocated mbuf pool; zero allocation on fast path |
| **AQM and Forwarding** | | |
| ECN marking timing | Marked in kernel qdisc; subject to scheduling lag | Marked in polling loop body; co-located with queue measurement |
| Forwarding lookup | Routing table + netfilter traversal | (port, queue_id) → group queue index; O(1), cache-hot |
| Packet copy | Typically 1–2 copies through socket buffer chain | Zero copies; mbuf pointer passed through the pipeline |
| **Transmit** | | |
| IOMMU interaction | Kernel maps and unmaps DMA regions per packet | IOMMU mapping established once at pool creation; static thereafter |
The AQM marking step is deliberately minimal. The DPDK data plane does not run a full queue scheduler — that is the outer control loop's responsibility (Section 5). The inner loop does one thing: read sojourn time from the shim header (Section 4.2) and set the ECN CE codepoint if the threshold is exceeded.
// Per-packet in the rx → tx burst loop:
// (tsc_to_ns converts a TSC cycle delta to nanoseconds using the
// calibrated TSC frequency; t_ingress is a TSC value from the shim header.)
uint64_t sojourn_ns = tsc_to_ns(now_tsc() - pkt->t_ingress);
if (sojourn_ns > THRESHOLD_NS) {
    rte_ipv4_l4s_mark(pkt);                      // in-place, no copy
    fiwi_meta(pkt)->ecn_flags |= ECN_CE_APPLIED;
}
rte_eth_tx_burst(out_port, queue_id, &pkt, 1);
Because t_ingress is written by the same lcore at enqueue, no cross-core communication is needed to compute sojourn time at dequeue. The marking decision is local to the polling thread. This is what Section 4.3 means when it says AQM runs "exactly where the integrator lives": the integrator is the group queue, the group queue is an mbuf ring in hugepage memory, and the marking loop touches that ring on every poll cadence with no additional indirection.
In a multi-card concentrator, each SFP+ card sits in its own IOMMU group: each card binds to VFIO independently, and the IOMMU enforces that one card's DMA cannot reach another card's memory regions. This topology provides natural fault isolation at the card boundary: a PCIe error or runaway DMA event from one RRH is contained within its card's group and cannot corrupt the packet memory of an adjacent airtime domain. This is a hardware guarantee, not a software policy.
The kernel-bypass data plane is not a complexity cost — it is the mechanism that justifies the RRH's simplicity. Because the concentrator runs a deterministic, observable pipeline that applies AQM, tracks sojourn time, and manages all descriptor posting without OS intervention, the RRH never needs to make a queuing or scheduling decision. It remains a pure DMA client, exactly as the silicon cost argument in Section 4.4 requires.
Incumbent distributed APs have no equivalent. Because each AP operates autonomously, it must run its own Linux network stack, its own qdisc, and its own firmware scheduler. The CPU carrying that stack is the dominant gate cost per RRH (Section 4.4, silicon cost table). A centralized DPDK pipeline eliminates that requirement across every RRH simultaneously — not by optimizing the AP implementation, but by removing the architectural condition that forces the CPU to exist there in the first place.
That said, DPDK solves a specific problem: it gives the concentrator a deterministic, observable, zero-copy execution path in which queue state, ECN marking, and packet steering remain under unified software control. It does not solve the radio-side interface. Per-packet MCS selection, EDCA parameter control, and TX-outcome metadata from the Wi-Fi silicon remain the next required interface boundary — the point at which concentrator intelligence must reach into the RRH to close the control loop. DPDK is the precondition; radio-side per-packet programmability is what completes it.
Section 16.4 described the minimal ECN marking step — reading queue state and applying a CE mark in the fast path. That sketch is sufficient to illustrate where marking occurs, but it elides the control structure that makes L4S coexistence with legacy traffic work: the dual-queue coupled AQM defined in RFC 9332.
This section defines the baseline DualPI2 control law as it would be realized inside the DPDK polling loop. Fi-Wi preserves this dual-queue topology, coupling mechanism, and PI-based control structure, but Section 17 replaces the underlying congestion signal with Airtime Debt (Di), grounding the controller in predicted wireless service time rather than raw queue occupancy.
Each airtime domain maintains two logically independent mbuf rings in the concentrator's hugepage pool: an L4S queue for scalable congestion-control flows (senders marking with ECT(1)), and a Classic queue for legacy RFC 3168 flows and unmarked traffic. Classification happens at ingress on the fast path, before the packet is enqueued, and costs a single bitfield check on the IP ECN field:
// Ingress classification — per-packet, inline in the rx burst loop
uint8_t ecn = (pkt_ip->type_of_service & 0x03);
bool is_l4s = (ecn == 0x01 || ecn == 0x03); // ECT(1) or CE — scalable sender
fiwi_meta(pkt)->queue_class = is_l4s ? QUEUE_L4S : QUEUE_CLASSIC;
enqueue_to_domain(pkt, domain_id, fiwi_meta(pkt)->queue_class);
Both queues drain toward the same transmit burst for that airtime domain. The scheduler services the L4S queue with a strict low-latency budget and the Classic queue at a rate that saturates the domain's aggregate share, matching the DualPI2 service model from RFC 9332.
The key property of DualPI2 is that the two queues are not independent. The Classic queue's drop probability pc — computed by a PI controller from a congestion signal representing pressure at the shared bottleneck — also governs the L4S queue's ECN marking probability via a coupling factor k (default 2 in the Linux sch_dualpi2 reference implementation).
// Outer control loop — runs on a slow timer cadence (~16 ms), same lcore,
// non-preemptive. Not per-packet.
double signal_classic = ewma_update(&domain->classic_signal,
                                    ring_depth(QUEUE_CLASSIC));
// PI controller: proportional term on the error plus an accumulated
// integral term, clamped to a valid probability.
double err = signal_classic - TARGET_CLASSIC;
domain->pi_integral += K_I * err;
double p_c = clamp(K_P * err + domain->pi_integral, 0.0, 1.0);
double p_l = COUPLING_K * p_c;                   // Coupled L4S marking probability
// Applied per-packet in the L4S dequeue path:
double p_l_step = (sojourn_L4S_ns > THRESHOLD_L4S_NS) ? 1.0 : p_l;
if (rte_rand() < (uint64_t)(p_l_step * (double)UINT64_MAX))
    rte_ipv4_l4s_mark(pkt);                      // Set ECN CE in-place, no copy
In a conventional queue-based implementation, signal_classic would be an EWMA of Classic queue depth. In Fi-Wi, that queue-derived signal is replaced as the PI controller input by Airtime Debt (Di), a forward estimate of wireless service time. The DualPI2 control law, coupling mechanism, and dual-queue topology remain unchanged; only the input signal changes.
Queue depth is a lagging indicator in Wi-Fi because contention, retries, and variable PHY rates consume airtime without necessarily appearing in buffer occupancy. Airtime Debt provides a forward-looking signal that better matches the true wireless bottleneck while preserving the DualPI2 coexistence structure required for L4S and Classic traffic to share the medium.
Each airtime domain carries its own DualPI2 state alongside the fiwi_rrh_state struct (Section 17.5). Because each lcore owns a fixed set of domains exclusively (Section 16.8), this state is never shared across cores — no locks, no atomics, no cache-line bouncing on the fast path.
The telemetry path (Section 17.8) delivers ground-truth airtime measurements back to the lcore via a lockless ring carrying fiwi_update objects. The struct is defined here because it originates in the DPDK fast-path layer and is consumed by it; Section 17.8 populates it from Netlink/vendor telemetry events:
/**
* fiwi_update — telemetry record posted by the Netlink callback,
* consumed by the DPDK lcore during its scheduling loop.
* Allocated from fiwi_update_pool (rte_mempool); returned after use.
*/
struct fiwi_update {
uint8_t type; /* AIRTIME_RECONCILE (only type currently defined) */
uint32_t rrh_id; /* RRH index, validated < FIWI_MAX_RRHS before enqueue */
uint64_t actual_us; /* Hardware-path-to-status interval (ground truth) */
uint64_t expected_us; /* Forward estimate: T_phy + T_agg at enqueue time */
uint32_t retry_us; /* Observed retry airtime from telemetry metadata */
};
The update ring is created with RING_F_MP_HTS_ENQ because the Netlink callback runs on a non-EAL thread; the lcore-side dequeue uses RING_F_SC_DEQ (single consumer).
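A sketch of the corresponding ring setup at startup; the FIWI_MAX_RRHS value of 24 is an assumption matching the lcore map in the next section, and the ring name and size are illustrative:

#include <stdio.h>
#include <rte_ring.h>

#define FIWI_MAX_RRHS 24   /* assumed from the RRH/lcore map below */

struct rte_ring *rrh_update_rings[FIWI_MAX_RRHS];

static int
fiwi_create_update_rings(int socket_id)
{
    char name[RTE_RING_NAMESIZE];
    for (unsigned i = 0; i < FIWI_MAX_RRHS; i++) {
        snprintf(name, sizeof(name), "fiwi_upd_%u", i);
        /* MP_HTS enqueue tolerates the non-EAL Netlink thread; SC dequeue
         * matches the single owning lcore. */
        rrh_update_rings[i] = rte_ring_create(name, 1024, socket_id,
                RING_F_MP_HTS_ENQ | RING_F_SC_DEQ);
        if (rrh_update_rings[i] == NULL)
            return -1;
    }
    return 0;
}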
The Umber concentrator runs on a workstation-class host with a Threadripper PRO processor and multiple PCIe-connected RRHs. This section describes how DPDK lcore assignments map onto that hardware topology to preserve cache locality, single-writer semantics, and deterministic fast-path execution.
Each lcore owns both the DualPI2 control state (Section 16.7) and the Airtime Debt estimator (Section 17) for its assigned RRHs. This ensures that congestion estimation, scheduling, and ECN marking operate within a single execution context.
| RRH Range | Assigned lcore | Airtime Domains |
|---|---|---|
| 0–3 | lcore 2 | domains 0–3 |
| 4–7 | lcore 4 | domains 4–7 |
| 8–11 | lcore 6 | domains 8–11 |
| 12–15 | lcore 8 | domains 12–15 |
| 16–19 | lcore 10 | domains 16–19 |
| 20–23 | lcore 12 | domains 20–23 |
Each RRH lcore applies its per-domain DualPI2 loop as described in Section 16.7, with Airtime Debt (Di) serving as the PI controller input in place of queue depth. This presents a single, airtime-grounded congestion signal per domain to the L4S control loop.
Downlink traffic is classified at ingress and directed to the appropriate airtime domain. The owning lcore performs scheduling, ECN marking, and transmission. Uplink traffic follows the reverse path toward the WAN interface.
Because each lcore exclusively owns its RRHs and associated Airtime Debt state, congestion estimation, scheduling, and ECN marking operate without cross-core coordination. This preserves deterministic fast-path behavior.
Fi-Wi does not infer congestion from queue depth alone. The bottleneck is the wireless medium, and the relevant state variable is the time required to successfully transmit packets over that medium. The system replaces the queue sojourn-time inputs of traditional PI2 controllers with Airtime Debt (Di), converting a stochastic medium into a controlled service process.
In traditional L4S systems, ECN marking is derived from queue sojourn time, which assumes a stationary service rate. These assumptions fail in Wi-Fi because service time varies per client based on PHY rates, contention, and retries. Fi-Wi replaces backward-looking buffer metrics with a forward model of wireless service time. The Concentrator maintains this model continuously and makes scheduling decisions on predicted service outcomes, not observed queue growth. This approach provides the AQM with a signal that has a more stationary distribution than raw queue depth over a variable-rate medium, improving marking coherence and L4S stability.
For each RRH i, the Concentrator maintains a real-time Airtime Debt:

Di = Ai + Ci + Ri

where Ai is the total scheduled airtime (queued plus in-flight), Ci is the estimated contention delay, and Ri is the estimated retry penalty, all in microseconds.
The "Ground Truth" for airtime consumption is measured as the interval
from
descriptor posting into the hardware transmit path to
TX Status (hardware completion signal via
driver/vendor-specific telemetry events such as mt76 TX
status reports). This interval captures the full service duration,
including the full wait for TXOP eligibility (AIFS + backoff),
aggregation delay, and all hardware-level retransmission attempts.
For any packet, the Predicted Sojourn Time (Si) is a forward estimate of delivery time: Si = Di + Tservice, the debt already ahead of the packet plus the packet's own predicted service time.
The Tservice calculation is decomposed into Tagg (aggregation hold time) + Tphy (modulation time at current MCS) + Tretry (statistical retry overhead). This estimate is packet- and client-specific; it is not a constant service quantum.
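A minimal sketch of such an estimator follows, assuming a per-STA state struct with a percent-valued PER average; the field names, units, and retry model are illustrative assumptions:

#include <stdint.h>

struct sta_state {
    uint32_t phy_rate_mbps;   /* drain rate at current MCS (assumed field) */
    uint32_t moving_avg_per;  /* recent PER, percent (assumed field) */
    uint32_t t_agg_hold_us;   /* expected aggregation hold (assumed field) */
};

static inline uint64_t
estimate_tservice_us(const struct sta_state *sta, uint32_t frame_bytes)
{
    if (sta->phy_rate_mbps == 0)
        return UINT64_MAX;                        /* link unusable: max debt */
    /* bits / Mbps yields microseconds directly. */
    uint64_t t_phy = ((uint64_t)frame_bytes * 8) / sta->phy_rate_mbps;
    /* Expected retries inflate airtime: extra = t_phy * p / (1 - p). */
    uint32_t per = sta->moving_avg_per < 99 ? sta->moving_avg_per : 99;
    uint64_t t_retry = (t_phy * per) / (100 - per);
    return sta->t_agg_hold_us + t_phy + t_retry;
}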
The Concentrator tracks RRH state in hugepage-backed memory. The DPDK lcore is the sole writer of fiwi_rrh_state; telemetry updates are applied via per-RRH lockless ring buffers to preserve single-writer semantics and microsecond-level determinism.
struct __rte_cache_aligned fiwi_rrh_state {
uint32_t rrh_id;
uint64_t D_i; /* Total airtime debt (A+C+R) */
/* Component Estimates (microseconds) */
uint64_t A_i; /* Total scheduled airtime (queued + in-flight) */
uint32_t C_i; /* Estimated contention delay */
uint32_t R_i; /* Estimated retry penalty */
/* Feedback & Synchronization */
uint64_t last_update_us; /* Timestamp of last lcore application */
uint64_t last_tx_status_us; /* TSC of last hardware completion */
uint32_t moving_avg_per; /* Recent PER (Section 15.4) */
};
Di is recomputed in the DPDK fast path after each update to Ai, Ci, or Ri. The loop increments Ai when packets are assigned to an RRH and decrements it upon TX completion using telemetry feedback.
Airtime Debt replaces physical queue depth as the authoritative input for the Dual-Queue AQM, providing a single, authoritative congestion signal across all RRHs without relying on a shared physical buffer.
An ECN CE mark is applied in the fast path whenever the predicted sojourn exceeds the L4S threshold (Si > Tlow). This bypasses traditional sojourn measurements to signal congestion at the true wireless bottleneck. Airtime Debt (Di) replaces queue depth as the input to the PI controller defined in Section 16.7. This preserves the Dual-Queue AQM structure while grounding the control signal in predicted wireless service time rather than buffer occupancy.
While Di provides fast-path control, the system monitors Airtime Utilization (Uair = ΔTX_DURATION / Δt) as a slow-path observability metric. This metric is used to identify external interference patterns and long-term capacity shifts in the airtime domain, calibrating the confidence weights applied to the Ci and Ri estimators.
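A sketch of the slow-path computation, assuming TX_DURATION telemetry is accumulated per observation window (the accumulation itself is not shown):

/* U_air = ΔTX_DURATION / Δt over one observation window. */
static inline double
airtime_utilization(uint64_t tx_duration_us, uint64_t window_us)
{
    return (window_us != 0) ? (double)tx_duration_us / (double)window_us : 0.0;
}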
The following logic processes TX_STATUS events from the mt76 driver. Completion data is retrieved from a pre-allocated mempool and posted to a per-RRH lockless ring to reconcile state without lcore contention.
/* Telemetry Path (Netlink Callback) */
static int fiwi_handle_mt76_telemetry(struct nl_msg *msg, void *arg) {
    struct nlattr *attrs[MT76_ATTR_MAX + 1];
    nla_parse(attrs, MT76_ATTR_MAX, genlmsg_attrdata(nlmsg_data(nlmsg_hdr(msg)), 0),
              genlmsg_attrlen(nlmsg_data(nlmsg_hdr(msg)), 0), NULL);
    if (!attrs[MT76_ATTR_TX_DURATION] || !attrs[MT76_ATTR_RRH_ID])
        return NL_SKIP;
    uint32_t rrh_id = nla_get_u32(attrs[MT76_ATTR_RRH_ID]);
    if (rrh_id >= FIWI_MAX_RRHS) return NL_SKIP;
    struct fiwi_update *update;
    if (rte_mempool_get(fiwi_update_pool, (void**)&update) < 0) return NL_SKIP;
    update->type = AIRTIME_RECONCILE;
    update->rrh_id = rrh_id;
    update->actual_us = nla_get_u64(attrs[MT76_ATTR_TX_DURATION]);
    /* Retry duration is optional in the telemetry event; default to zero. */
    update->retry_us = attrs[MT76_ATTR_RETRY_DURATION] ?
                       nla_get_u32(attrs[MT76_ATTR_RETRY_DURATION]) : 0;
    update->expected_us = estimate_service_time(msg);
    /* On a full ring, return the buffer to the pool rather than leak it. */
    if (rte_ring_enqueue(rrh_update_rings[rrh_id], update) < 0)
        rte_mempool_put(fiwi_update_pool, update);
    return NL_OK;
}
The DPDK lcore closes the control loop by draining the update ring. It decrements the backlog and calibrates penalties to ensure the Airtime Debt remains an accurate representation of physical medium pressure.
/* DPDK lcore: apply telemetry updates */
static inline void
fiwi_apply_updates(struct fiwi_rrh_state *rrh, struct rte_ring *ring)
{
struct fiwi_update *upd;
while (rte_ring_dequeue(ring, (void**)&upd) == 0) {
/* 1. Discharge processed backlog */
rrh->A_i = (rrh->A_i > upd->actual_us) ? (rrh->A_i - upd->actual_us) : 0;
/* 2. Update contention estimate (drift from expected modulation time) */
uint32_t drift = (upd->actual_us > (upd->expected_us + upd->retry_us)) ?
(upd->actual_us - upd->expected_us - upd->retry_us) : 0;
rrh->C_i = (rrh->C_i * 7 + drift) >> 3;
/* 3. Update retry penalty */
rrh->R_i = (rrh->R_i * 7 + upd->retry_us) >> 3;
/* 4. Recompute total Airtime Debt (D_i) */
rrh->D_i = rrh->A_i + rrh->C_i + rrh->R_i;
rrh->last_tx_status_us = rte_get_tsc_cycles();
rte_mempool_put(fiwi_update_pool, upd);
}
}
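Putting the pieces together, a plausible shape for the owning lcore's main loop is sketched below; lcore_ctx, rrh_state, and fiwi_schedule_and_mark are illustrative names standing in for this document's per-lcore context, state array, and DualPI2 pass:

struct lcore_ctx { unsigned first_rrh, last_rrh; };

static int
fiwi_lcore_main(void *arg)
{
    struct lcore_ctx *ctx = arg;
    for (;;) {
        /* Drain telemetry first so D_i reflects the latest completions. */
        for (unsigned r = ctx->first_rrh; r <= ctx->last_rrh; r++)
            fiwi_apply_updates(&rrh_state[r], rrh_update_rings[r]);
        fiwi_schedule_and_mark(ctx);  /* DualPI2 pass with D_i as input */
    }
}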
Figure 17-1: The Fi-Wi recursive control loop for stabilizing stochastic wireless service.
Figure 17-1 synthesizes the technical components of the Airtime Debt model into a continuous functional loop. The architecture separates the Speculative Forward Path (Fast Path) from the Calibrated Feedback Path (Telemetry Path).
- Prediction: for each packet, the scheduler computes Tservice. This is not a global constant; it is a client-specific sum of aggregation hold time (Tagg), PHY modulation time (Tphy), and predicted retry overhead (Tretry) based on that STA's specific RF context.
- Speculative marking: Tservice is added to the RRH's Ai (Backlog). If the resulting Predicted Sojourn Time (Si) exceeds Tlow, an ECN CE mark is applied immediately in the DPDK fast path. This provides the "Virtual Backpressure" that stabilizes L4S senders.
- Calibration: telemetry feedback updates Ci (Contention) and Ri (Retries). This ensures that subsequent predictions for the same STA or RRH domain are corrected for changing medium pressure, effectively regularizing the stochastic nature of the 802.11 medium.
The core idea of Umber’s Fi-Wi architecture is to make a building full of Wi-Fi radios behave like a large number of predictable, low-latency, cellularized bottlenecks (often cell-per-room) that integrate cleanly with L4S, and to avoid Wi-Fi collapse in the regime that matters most for users: tail latency.
We do that by centralizing queueing, AQM marking, and scheduling in the concentrator; grounding congestion signals in Airtime Debt rather than queue depth; and dynamically grouping RRHs into bounded airtime domains.
Compared to a building filled with independent APs, Fi-Wi provides bounded contention domains, a single observable bottleneck per domain, coherent L4S marking, and uplink reception diversity for all client generations.
This appendix explains the precise behavior of the 802.11 CSMA/CA backoff algorithm, why the freeze/resume mechanics create strong nonlinearities under load, and how this drives the collapse behavior discussed in Sections 2 and 6. We also include reference diagrams, accurate pseudocode, and probability scaling that shows why birthday-paradox collisions appear long before PHY saturation.
The 802.11 MAC is built around two core mechanisms: carrier sensing (physical CCA plus the virtual NAV) and randomized backoff.
When a station has a frame to send, it chooses a random integer B ← Uniform[0, CW], where CW is the contention window. The counter decrements only when the medium is idle, the NAV is zero, and both conditions hold for an entire SlotTime. If any of these conditions break during a SlotTime boundary, backoff does not decrement.
Time → ───────────────────────────────────────────────────────────────────────→
Channel: Busy TXOP Idle slot Idle slot Busy TXOP Idle ...
────────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐
│ │ slot OK │ │ slot OK │ │collision │
└──┘ └───┘ └──────────────┘
Backoff B: [frozen] B:=B-1 B:=B-2 [frozen] B:=B-3
This "idle-slot-only" decrement rule is the source of nonlinear timing behavior.
The backoff counter freezes immediately under either condition: physical carrier sense reporting a busy medium (CCA/energy detect), or virtual carrier sense (a nonzero NAV).
NAV counts down in microseconds, not slot units, so a NAV may span dozens or hundreds of SlotTimes, creating long frozen periods.
Frame overheard with Duration=480µs
NAV := 480 µs ─────────────────────────────────────────────▶ 0 µs
Backoff:
Frozen until NAV==0
Then: AIFS idle interval → first idle SlotTime → resume B countdown
The following pseudocode describes the real 802.11 backoff and retry machine:
# Variables
B = random integer in [0, CW]
CW = CWmin initially, doubled on failures
NAV = virtual carrier sense (µs timer)
Slot = 9 microseconds (typical)
AIFS = access category-specific inter-frame space
while True:
wait_until( medium_idle() and NAV == 0 )
wait(AIFS) # must see idle for entire AIFS
# Backoff countdown
while B > 0:
if medium_idle() and NAV == 0:
wait(Slot)
if medium_idle() and NAV == 0:
B -= 1 # decrement only if entire slot was idle
else:
# Freeze B until another idle AIFS appears
wait_until( medium_idle() and NAV == 0 )
wait(AIFS)
# Backoff fully expired, attempt TX
transmit()
if ack_received():
CW = CWmin
B = random(0, CW)
else:
CW = min(2 * CW, CWmax)
B = random(0, CW)
The critical detail: multiple stations freeze and resume their counters in lock-step after every long TXOP or NAV, making collisions statistically inevitable as station count grows.
Each station independently picks a backoff slot in [0, CW].
The probability that no two stations choose the same slot is:
P(no collision) = (CW+1)! / [(CW+1 - n)! · (CW+1)^n]
where n = number of active contenders. Therefore:
| Stations (n) | 4 | 6 | 8 | 10 | 12 | 16 |
|---|---|---|---|---|---|---|
| P(collision), CWmin = 15 | ~12% | 30% | 48% | 65% | 78% | >90% |
This is the MAC-level reason collapse begins long before PHY capacity is reached.
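For readers who want to evaluate the closed-form expression, the short program below computes the uniform-draw collision probability for CWmin = 15. It is a sketch of the pure slot-coincidence model only; exact figures depend on the contention model used, and the freeze/resume synchronization described next pushes operational collision rates higher still.

/* Evaluate P(no collision) = (CW+1)!/((CW+1-n)!·(CW+1)^n) by multiplying
 * per-station terms, then report P(collision). */
#include <stdio.h>

static double p_collision(int cw, int n)
{
    int slots = cw + 1;
    double p_clear = 1.0;
    for (int i = 0; i < n; i++)
        p_clear *= (double)(slots - i) / slots;  /* i-th STA avoids prior picks */
    return 1.0 - p_clear;
}

int main(void)
{
    const int ns[] = {4, 6, 8, 10, 12, 16};
    for (unsigned i = 0; i < sizeof ns / sizeof ns[0]; i++)
        printf("n=%2d  P(collision)=%4.0f%%\n",
               ns[i], 100.0 * p_collision(15, ns[i]));
    return 0;
}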
Once collisions become frequent, CW doubling, retries, and lengthening TXOPs compound one another, and tail latency degrades in stages:
Healthy:   T50 ≈ 200–500 µs, T95 < 0.8 ms, T99 < 1.2 ms
Degraded:  T95 = 1–2 ms, T99 = 2–3 ms
Collapsed: T95 > 2 ms AND T99 ≥ 3 ms (dominant channel monopolization)
A single 3 ms TXOP already violates the bottleneck-delay budget required by L4S (≈250–300 µs). With multiple stations taking such TXOPs, service gaps can reach 10–50 ms for unlucky flows.
The following diagram illustrates how multiple stations become phase-aligned:
Time → ────────────────────────────────────────────────────────────────→
TXOP1 by STA-A:    ────────────────
NAV for others:    ──────────────── (all B frozen)
After NAV expires: all stations wait AIFS → begin countdown
  Slot 1: B_A=2, B_B=4, B_C=2
  Slot 2: B_A=1, B_B=3, B_C=1
  Slot 3: B_A=0, B_B=2, B_C=0 → simultaneous transmit → collision
This synchronization is why the birthday paradox applies so strongly in Wi-Fi.
Fi-Wi removes the “every station fends for itself” randomness by centralizing transmit decisions at the concentrator, bounding contention domains through dynamic RF grouping, and using trigger-based uplink scheduling where clients support it.
This appendix describes how Fi-Wi can use Channel State Information (CSI) from each RRH, together with learning models (e.g. LSTM or TCN), to improve grouping, scheduling, redundancy, and control beyond what is possible with queue-based feedback alone.
Modern 802.11 chipsets can export CSI per subcarrier or per resource unit: complex-valued estimates of the channel between an RRH and a station (STA). In a Fi-Wi deployment, each RRH periodically reports this channel state along with the MAC outcomes (PER, retries) and radio configuration summarized in the feature list below.
Thanks to centralized time synchronization and packet memory, the concentrator can align CSI reports with queue events, scheduling decisions, and transmission outcomes on a single timebase.
This gives Fi-Wi a rich per-domain, per-STA time series of channel state and service behavior.
Using this data, Fi-Wi can learn models to help answer questions such as how effective capacity will evolve, which domains are at risk of collapse, and when regrouping would help.
These predictions can feed directly into grouping, scheduling, redundancy, and beacon/channel control decisions.
One reasonable approach is to use a sequence model such as an LSTM or Temporal Convolutional Network (TCN) per airtime domain:
Input features (per timestep):
- queue depth q_k
- marking probability p_k
- throughput, PER, retries
- per-RRH CSI summary (e.g. dominant eigenvalues/eigenvectors)
- beacon power settings, channel, bandwidth
Outputs:
- predicted effective capacity C_eff,k+1
- predicted collapse risk score
- recommended group reconfiguration / beacon adjustments (optional)
A higher-level policy layer then uses these predictions to adjust RF grouping, channel and width assignments, and scheduling weights before degradation becomes user-visible.
The key point is that Fi-Wi has access to the joint state across all RRHs—queues, CSI, MAC outcomes, and beacon configuration—so learning can be done on a true building-scale view rather than a per-AP snippet.
While the PI² controller (Section 5.2) provides a robust baseline using linear control theory, the wireless medium is inherently non-linear. A small drop in SNR can cause a discrete, non-linear step-down in MCS, cutting capacity by half in microseconds. A linear controller often reacts too slowly to these step-changes.
Because the Concentrator terminates both the MAC (Inner Loop) and L4S (Outer Loop), it possesses a complete, global view of the system state. This allows Fi-Wi to implement a Non-Linear Marking Signal derived from a rich real-time feature vector:
Feature Vector x(t) = [
    MCS_t,         // Current Modulation (capacity potential)
    PHY_Rate_t,    // Raw drain rate
    RTT_outer,     // End-to-end latency (Sojourn + Flight)
    Q_depth_t,     // Current backlog
    d_arrival/dt   // Arrival rate gradient (ARM Policer)
]
Optimization Objective: Efficiency vs. Latency
The system uses this vector to solve the fundamental Wi-Fi trade-off between Aggregation Efficiency and Serialized Latency. This creates a Non-Linear Marking Signal that optimizes Throughput per Microsecond of Latency, rather than simply targeting a fixed queue depth.
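As a sketch of how such a signal could be computed per domain, the fragment below boosts the coupled DualPI2 probability when the feature vector reports a discrete MCS step-down or a rising arrival gradient. All names and the boost heuristic are illustrative assumptions, not the production control law:

struct domain_features {
    uint8_t mcs_now, mcs_prev;  /* current vs previous MCS index */
    double  p_l;                /* coupled DualPI2 marking probability */
    double  arrival_gradient;   /* d(arrival)/dt, normalized */
};

static inline double
marking_probability(const struct domain_features *f)
{
    double p = f->p_l;
    /* Each MCS step roughly halves capacity: react immediately and
     * proportionally to the size of the step-down. */
    if (f->mcs_now < f->mcs_prev)
        p *= (double)(1u << (f->mcs_prev - f->mcs_now));
    /* Demand rising into a capacity drop warrants earlier marking. */
    if (f->arrival_gradient > 0.0)
        p *= 1.0 + f->arrival_gradient;
    return p < 1.0 ? p : 1.0;
}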
Early architectural models of C-RAN often assumed a "Store-and-Forward" approach, where full packets must be buffered at the edge to meet timing. Fi-Wi eliminates this inefficiency by leveraging the natural physics of the 802.11 air interface. We utilize a Scatter-Gather DMA engine with Preamble Hiding to enable a "Thin RRH" design with minimal local SRAM.
The critical timing constraint in Wi-Fi is the transition from "Decision to Transmit" to "Energy on Air." However, the 802.11 PHY does not transmit user data immediately. Every transmission begins with a PHY Preamble (PLCP) and MAC Headers.
The Insight: The transmission of the Preamble and Headers takes roughly 20–40 µs (depending on PHY generation). The round-trip time to fetch payload data over 100m of PCIe-over-Fiber is roughly 2–5 µs.
Consequently, the fetch latency is completely "hidden" behind the transmission of the headers. The payload data arrives at the RRH's small FIFO well before the PHY is ready to modulate it.
Instead of a large packet buffer, the Fi-Wi RRH implements a Scatter-Gather DMA engine that composes frames on the fly from two distinct memory regions: local RRH RAM, which holds the PHY preamble and MAC header templates, and remote Concentrator DRAM, which holds the payload.
A common objection to C-RAN is the SIFS deadline (16 µs) required for retries. If a transmission fails, the station must retransmit immediately.
With Scatter-Gather, the RRH does not need to buffer the packet for retries. If a NACK occurs, the MAC simply resets the Scatter-Gather engine. It re-transmits the Preamble (from Local RAM) while re-issuing the DMA fetch (from Remote RAM). Because the fiber latency (5 µs) is significantly shorter than the SIFS + Preamble duration, the data again arrives in time.
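To make the split concrete, one possible descriptor layout is sketched below. The field names and packing are illustrative assumptions; the only anchored fact is the 16-byte descriptor size cited in the packet-walkthrough appendix, and this is not a hardware specification:

#include <stdint.h>

struct fiwi_sg_desc {
    uint64_t payload_iova;  /* IOMMU address of payload in Concentrator DRAM */
    uint16_t payload_len;   /* bytes to fetch over PCIe-over-fiber */
    uint16_t header_tmpl;   /* preamble/MAC header template index in local RAM */
    uint16_t sta_id;        /* destination station (selects MCS context) */
    uint16_t flags;         /* e.g. retry policy, ack policy */
};
_Static_assert(sizeof(struct fiwi_sg_desc) == 16, "descriptor must stay 16 bytes");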
Modern Wi-Fi standards — particularly 802.11ax (Wi-Fi 6/6E) and 802.11be (Wi-Fi 7) — introduce features that appear to address some of the same problems as Fi-Wi: uplink scheduling, spatial reuse, and multi-AP coordination. This appendix clarifies how these features relate to Fi-Wi's architecture, where they're complementary, and why they don't eliminate the need for Fi-Wi's centralized data-plane approach.
Key takeaway: 802.11ax/be features like trigger frames and multi-AP coordination are valuable enhancements that Fi-Wi can leverage when client support is available, but they operate at a different architectural level (per-AP MAC features vs. building-scale data-plane unification) and cannot replace Fi-Wi's core innovations: centralized queues, shared state, L4S marking coordination, and dynamic RF grouping across the entire building.
802.11ax introduced trigger frames (TF) to enable centralized uplink scheduling. Instead of clients contending for the channel using stochastic EDCA backoff, the AP sends a trigger frame that grants specific clients permission to transmit on specific OFDMA resource units (RUs) or spatial streams at a specific time.
Trigger frames provide contention-free, AP-scheduled uplink access: the AP chooses which clients transmit, on which RUs or spatial streams, and when.
How trigger frames align with Fi-Wi:
Trigger frames match Fi-Wi's philosophy of centralized scheduling rather than distributed contention. In a Fi-Wi deployment where RRHs support 802.11ax and clients support uplink OFDMA/MU-MIMO, the concentrator can issue trigger-based uplink grants through its RRHs, aligning uplink scheduling with its global view of queues across the building.
Reality check — client support in 2025:
While 802.11ax was ratified in 2019, uplink OFDMA support remains inconsistent. Crucially, trigger frames only control 802.11ax/be clients; legacy devices (iPhone 11, older IoT) are invisible to this schedule. These legacy clients cannot parse the trigger, so they continue to contend via random EDCA, acting as unmanaged interference sources. In contrast, Fi-Wi's reception diversity (Section 8.1) enhances uplink reliability for all clients, regardless of generation, by combining signals from multiple RRHs.
A natural question: "If 802.11ax APs can use trigger frames for uplink scheduling, why do we need Fi-Wi's centralized architecture?"
Answer: Trigger frames address only a small subset of the problems Fi-Wi solves, and even for uplink scheduling, they provide per-AP control, not building-scale coordination.
What trigger frames do NOT provide: building-scale coordination across APs, visibility into neighboring queues, unified downlink queuing and L4S marking, or any control over legacy clients that cannot parse the trigger.
802.11ax OFDMA subdivides a channel into resource units (RUs). In Fi-Wi, an airtime domain is a logical entity representing a shared RF resource. OFDMA RUs provide finer-grained subdivision of that airtime resource.
Conceptually, OFDMA RUs subdivide airtime within a single domain, while Fi-Wi's airtime domains partition RF resources across the building; the two mechanisms compose.
This does not change the fact that all RRHs in that airtime domain share a single group queue and marking point. It simply allows the service process to be more efficient.
802.11ax BSS coloring allows STAs to distinguish between intra-BSS frames (same color) and inter-BSS frames (different color), enabling more aggressive spatial reuse.
Relationship to Fi-Wi RF grouping: Fi-Wi's dynamic RF grouping (Section 6) serves a similar but more sophisticated purpose. Fi-Wi uses richer information (CSI, retry statistics, airtime) to decide grouping, not just RSSI thresholds. In a Fi-Wi deployment, the concentrator can assign BSS colors to RRHs strategically: RRHs in the same airtime domain get the same color, while isolated domains get different colors.
802.11be (Wi-Fi 7) introduces multi-AP coordination features that appear to move in Fi-Wi's direction, such as coordinated spatial reuse and AP-to-AP scheduling exchanges.
How these relate to Fi-Wi: These features acknowledge the problem of autonomous APs but approach it incrementally. 802.11be uses distributed AP-to-AP messaging, which limits scale and speed. Fi-Wi centralizes the data plane, enabling deeper coordination than distributed messaging can achieve.
A key advantage of Fi-Wi's architecture is that it degrades gracefully with mixed client populations and doesn't require forklift client upgrades.
Client capability tiers in a 2025 deployment range from legacy 802.11n/ac devices and older IoT, through Wi-Fi 6/6E clients with inconsistent uplink OFDMA support, to early Wi-Fi 7 devices.
Deployment strategy: use trigger-based scheduling where clients support it, and rely on Fi-Wi's reception diversity and centralized downlink scheduling for everything else; no forklift client upgrade is required.
802.11ax and 802.11be introduce valuable features — trigger frames, OFDMA, BSS coloring, multi-AP coordination — that align with Fi-Wi's centralized control philosophy and can enhance Fi-Wi deployments when clients support them. However, they remain per-AP MAC features, gated on client capability, and cannot unify queues, marking, or RF grouping across a building.
In short: 802.11ax/be features make Fi-Wi better, but Fi-Wi solves problems these standards cannot address within the constraints of the distributed-AP model. Fi-Wi is not "better APs" — it's a different architecture that happens to integrate well with modern Wi-Fi standards as they evolve.
Unlike software, ASICs cannot easily “refactor away” unused features. Removing blocks typically requires re-verifying entire subsystems, while adding blocks often requires verifying only the new logic. This asymmetry encourages accumulation.
Over many product generations, this leads to RTL codebases that only grow. Legacy modulation modes, preambles, power-save FSMs, calibration paths, and debug hooks persist long after their practical value has disappeared.
This accumulated complexity has tangible costs: larger die area, higher power, longer verification cycles, and slower time to market.
Fi-Wi’s architecture separates the system into thin, deterministic RRHs and a centralized software concentrator.
This separation dictates where complexity must live. RRHs implement only what must be fast and deterministic: RF front end, PHY processing, minimal MAC TX/RX, DMA, PTP synchronization, and PCIe-over-fiber transport. All high-level behavior (queueing, L4S policy, aggregation strategy) lives in the concentrator.
For a modern Wi-Fi chip at an advanced node, even a modest reduction in unnecessary logic can translate into significant savings: smaller die, lower power, simpler verification, and faster time to market.
The guiding principle for Fi-Wi RRH design is:
Complexity belongs in the concentrator; only latency-critical functions belong in RRH silicon.
Concretely, this means: no autonomous AP queueing/scheduling logic, no legacy PHY/MAC support beyond what Fi-Wi needs, and no embedded firmware CPU managing per-station behavior at the edge.
To truly understand Fi-Wi, we must follow a single packet through the system at the microsecond scale. This narrative illustrates how the Workstation Concentrator (Section 13) and the Scatter-Gather RRH (Appendix C) collaborate to trick the physics of latency.
T = 0 µs (Arrival): The video packet arrives at the Concentrator's NIC. The CPU timestamps it immediately.
T = 2 µs (The Decision): The Concentrator's software scheduler inspects the packet.
T = 10 µs (The Setup): The scheduler posts a DMA Descriptor to RRH-A via PCIe. Note: the payload data (1500 bytes) stays in the Concentrator; only a 16-byte pointer moves to the edge.
T = 50 µs (The Trigger): RRH-A's LBT logic sees the airtime is clear and begins the transmission sequence. This is where the magic happens: the PHY starts modulating the preamble from local RAM while the Scatter-Gather engine issues a PCIe Read Request for the payload.
T = 52 µs (The Fetch): The Read Request hits the Concentrator's PCIe controller. Because of the 92-lane non-blocking fabric (Section 13), there is zero switching delay.
T = 55 µs (The Return): The payload data flies back down the fiber.
T = 58 µs (The Handover): The payload data arrives at RRH-A's FIFO. The PHY is just finishing the last symbol of the Preamble.
T = 59 µs (Seamless Serialization): The PHY seamlessly switches from transmitting the Preamble to transmitting the payload. To the air, it looks like one continuous stream. The 200-meter fiber latency effectively vanished because it was hidden behind the mandatory PHY training sequence.
T = 200 µs: Alice sends a TCP ACK.
T = 204 µs (The Multi-Stat): Both RRH-A and RRH-B hear the ACK.
T = 210 µs (The Race Up): Both RRHs push the packet + CSI metadata to the Concentrator.
T = 215 µs (The Deduplication): The Concentrator sees two copies of Sequence #104. It discards the weak one from RRH-B but keeps the CSI data to update the "Sensing Model" (detecting that someone is standing near RRH-B, blocking the line of sight).
If this were a traditional AP: the payload would have to be buffered at the edge before transmission, the ACK would be heard by a single radio, and a blocked path would surface as retries and drops rather than seamless diversity.
RRH Failure: If RRH-A fails during the prefetch (e.g., power loss), the concentrator detects the link loss immediately via PCIe link state. Because the packet payload never left Concentrator DRAM, the scheduler simply re-posts the descriptor to RRH-B. No packet is lost, and TCP does not see a drop.
Congestion: The scatter-gather pipeline depth allows the Concentrator to queue up the next descriptor while the current one is transmitting. This allows back-to-back TXOPs (SIFS spacing) without idle gaps on the air, even with the fiber latency.
Coordinated Transmission: The Concentrator can schedule RRH-A and RRH-B to transmit concurrently to spatially separated clients. It analyzes the CSI matrix to determine if spatial isolation is sufficient (>25 dB cross-coupling attenuation). If yes, both RRHs transmit simultaneously using standard 802.11 frames. If interference is detected, the Concentrator schedules sequential TXOPs. This dynamic decision happens per-packet based on real-time CSI.
From the packet's view, Fi-Wi provides uplink diversity, per-flow fair queuing, accurate ECN marking, and speculative DMA that hides PCIe latency. The packet experiences the network as a transparent, zero-wait pipe.
Fi-Wi separates timing (RRH hardware) from intelligence (Concentrator software), bridged by the speculative DMA prefetch pipeline. This allows the hardware to meet strict microsecond deadlines while the software retains the flexibility to run complex scheduling, L4S, and spatial multiplexing logic.
The upfront cost of installing fiber is often the primary friction point for C-RAN adoption ("The Fiber Tax"). However, this framing ignores the physics of modern signaling and the macroeconomics of construction. Fi-Wi's reliance on fiber is not a tax; it is a strategic asset conversion.
We are hitting a hard physical limit with copper cabling. At modern data center speeds (100Gb/s), signal loss in copper is so high it is characterized in dB per inch.
In low-voltage construction, the cost of cabling is dominated by labor (often 70-80%), not material.
Unlike HDMI or Copper Ethernet—which are purpose-built cables engineered for a single generation—fiber is a raw transport medium. It is a "pipe for light" that supports Ethernet, DWDM, and PCIe-over-Fiber simultaneously.
While cable standards have cycled (Cat5e → Cat6 → Cat6A), they remain tethered to the legacy RJ45 connector. This physical interface is rapidly becoming obsolete. Fi-Wi recognizes that the connection is what matters, not the physical port. In this architecture, the 802.11 wireless interface becomes the new connector. By installing fiber once as a permanent asset and treating Wi-Fi as the universal 'plug' inside the room, the building infrastructure is 'one and done'. This finally breaks the cycle of physical obsolescence.
Fi-Wi's centralized architecture provides observability that is difficult or impractical to achieve in distributed AP systems. This appendix presents the Observability Matrix—a systematic comparison of what telemetry is directly observable, partially observable, or hidden across different measurement approaches. This complete visibility is the prerequisite for effective machine learning (Section 15) and deterministic L4S control.
Traditional Wi-Fi deployments rely on tools that provide only partial visibility into system state. Operators attempt to infer problems from symptoms (latency spikes, ECN marks, throughput degradation) without directly observing root causes (queue growth, retry timing, MCS selection under interference). This inference distance—the number of steps between observable effects and hidden causes—makes control systems less stable and limits the effectiveness of machine learning.
The table below compares observability across six measurement approaches. The legend indicates whether each metric is directly observable (✅), partially observable or inferable (⚠️), or hidden (❌) for a given approach.

| Telemetry / Metric | ESP32-C5 RF sensor | RPi 5 monitor mode | RPi 5 L4S node | tcpdump packet capture | iperf2 L4S | Fi-Wi Concentrator |
|---|---|---|---|---|---|---|
| Energy detect / CCA | ||||||
| Channel busy time | ||||||
| NAV / medium reservation | ||||||
| CSI / channel matrix | ||||||
| MCS / GI / NSS | ||||||
| PER / retry counts | ||||||
| RSSI / SNR | ||||||
| Queue depth | ||||||
| Sojourn time | ||||||
| ECN marks | ||||||
| One-way delay (OWD) | ||||||
| Responsiveness | ||||||
| Throughput / goodput | ||||||
| Deterministic playback |
Queue Depth and Sojourn Time (highlighted rows):
These metrics are essential for L4S congestion control and machine learning. Traditional tools (tcpdump, Wi-Fi packet capture) cannot directly observe queue state because it exists inside firmware or kernel layers. While synchronized ingress and egress packet captures could theoretically infer queue depth through timing correlation, this approach requires nanosecond-precise time synchronization across physically separated capture points, perfect packet correlation despite potential losses, and still cannot observe firmware-internal retry queues, aggregation buffer states, or PHY scheduling decisions. External sniffers see the explosion (the packet hitting the air), but they cannot see the fuse burning (the packet sitting in the driver queue). Only centralized queueing architectures expose these values with direct microsecond-resolution timestamps.
MCS / GI / NSS (PHY Configuration):
Monitor-mode packet capture can partially infer MCS from radiotap headers, but this only shows what was transmitted—not the decision process, CSI data, or PER history that informed the choice. The Fi-Wi Concentrator has direct access to the complete decision state.
Deterministic Playback (bottom row):
This capability enables machine learning. Deterministic playback means the Concentrator can reproduce its own decision sequence from a log file: packet arrivals, queue transitions, scheduling decisions, MCS selections, and RRH transmission commands. While actual RF outcomes depend on station behavior and channel conditions that may vary, the Concentrator can replay its control decisions under the logged RF environment to evaluate alternative strategies offline and verify whether different MCS/scheduling choices would have improved performance. This is only possible when all Concentrator-controlled components operate under a single clock with complete state visibility. Distributed systems cannot reconstruct this causal chain from partial packet traces because they lack visibility into queue state, retry logic, and the decision-making process itself.
Section 15 describes how Fi-Wi uses machine learning to optimize MCS transition rates. The observability matrix demonstrates the practical advantages that Fi-Wi's centralized architecture provides for ML training. The Concentrator's event log becomes a high-quality training dataset where every state transition is labeled with measured outcomes under consistent instrumentation. While autonomous AP systems could attempt ML-based rate adaptation using the partial observability available to them, Fi-Wi's richer telemetry—particularly queue visibility, global CSI, and deterministic replay—enables significantly more effective learning and optimization.
Coordinated AP systems can share summaries (throughput, ECN marks, interference reports) but cannot share hidden internal state (queue depth, firmware retry logic, aggregation decisions). This creates inference distance—the controller sees effects but not causes. Fi-Wi eliminates inference distance by removing autonomous decision-making from the edge. Queues, scheduling, and PHY selection are centralized under a single clock, producing an observable state graph where causes are explicit, replayable, and directly controllable. This architectural difference translates to measurably better ML training data quality.
The Fi-Wi architecture treats channel width as a dynamic control parameter managed by the Concentrator. While 802.11be (Wi-Fi 7) emphasizes 320 MHz peak PHY rates, Fi-Wi's orchestration engine strategically selects 40 MHz channel widths in high-density environments to ensure Service Time Stationarity and the stability of the L4S control loop.
In shared-spectrum MDUs (Multi-Dwelling Units), the theoretical gain of wider channels is often negated by contention-domain collapse. In a CSMA/CA environment, a transmission opportunity (TXOP) requires the entire bonded channel to be idle. In a 6-AP overlapping scenario with 50% aggregate airtime occupancy, the probability of finding all sub-bands simultaneously idle drops exponentially with bandwidth.
Under a simplified independent-sub-band occupancy assumption, a basic model suggests P(160 MHz idle) ≈ (P(40 MHz idle))⁴, resulting in 4–16× fewer transmission opportunities. In practice, partial correlation between sub-bands moderates the exponent but does not eliminate the super-linear decline in idle probability. This leads to fragmented TXOPs, heavy-tailed service times, and collapsing effective airtime utilization.
From an M/G/1 queueing perspective, the performance of the L4S control loop depends on the stability of the service rate (μ). L4S stability requires frequent service opportunities and low variance in service time to prevent the decoupling of the sender's congestion window from the actual queue state.
Narrower channels reduce the probability that partial-band interference (e.g., unmanaged IoT bursts) forces a full MCS downgrade across the entire bonded width. This allows the Concentrator to maintain stable link adaptation and a predictable drain rate, avoiding the chaotic rate-shifting common in 160 MHz deployments.
Fi-Wi is not anti-wideband; channel width is an orchestrated variable. The system expands width opportunistically when contention is low to leverage PHY gains and contracts it to 40 MHz when deterministic latency is required. This prioritizes spatial reuse and airtime isolation over maximum burst rate—the fundamental technical unlock for Fi-Wi’s cell-per-room model.
Fi-Wi optimizes Capacity Density under a Latency SLO, rather than peak PHY on a single link. In dense OBSS environments, wide channels reduce spatial reuse; narrower channels increase the number of bounded contention domains. Consequently, aggregate goodput per area increases even if per-link PHY decreases.
ρ_LL [Mbps / 1,000 sq ft] = (Σ Goodput_i) / Area | subject to p95 OWD ≤ 20ms
Where Goodput_i is the application-layer payload throughput delivered while maintaining the p95 one-way delay (OWD) constraint. The 20ms threshold reflects the target for interactive L4S applications.
Example Calculation (1,000 sq ft section of a 10,000 sq ft floor):
Assumptions: 50% aggregate offered load per BSS, default EDCA parameters, and no explicit inter-AP coordination in the autonomous case.
To align with a Gigabit-class WAN service, the wireless architecture must match the aggregate wireline supply to orchestrated spatial demand. In a dense MDU, Contention Delay is 10–100× larger than serialization time. A single 160 MHz AP attempting to serve a Gigabit load creates a "fast but flaky" link that collapses under co-channel interference, delivering only a fraction of the ISP's provided capacity to real-time applications.
Fi-Wi resolves this by using 40 MHz orchestration to spread the Gigabit load across N coordinated spatial domains. This ensures that the building-wide wireless fabric can actually saturate a 1 Gbps WAN link with deterministic, multi-user goodput, rather than relying on single-device peak bursts that starve other users and destabilize shared airtime.
L4S signals congestion at Layer 3 (IP ECN), but wideband Wi-Fi operates via massive Layer 2 A-MPDU aggregation to maintain PHY efficiency. This creates a fundamental control-loop mismatch: L4S expects per-packet, sub-millisecond marking feedback, while aggregation batches many packets into a single multi-millisecond TXOP, coarsening both the marking signal and the service process.
The Fi-Wi architecture addresses these challenges through its DualQ implementation (Section 5.2), which maintains separate queues for L4S and Classic traffic and performs per-packet sojourn time measurements at the Concentrator before entering the A-MPDU aggregation pipeline.
Scenario: 2x2 MIMO, 6+ overlapping BSSIDs, shared unlicensed spectrum (5/6 GHz), 50% aggregate offered load, autonomous EDCA parameters. See Appendix J for full simulation parameters.
| Metric | 160 MHz (Autonomous CSMA) | 40 MHz (Fi-Wi Orchestrated) |
|---|---|---|
| Peak PHY Rate (2x2, MCS 11) | ~1.2 Gbps | ~300-400 Mbps |
| Effective Airtime Utilization | <10% (Fragmented TXOPs) | 30–50% (Planned reuse / Bounded domain) |
| Service Time Variance (σ²) | High (Heavy-tailed) | Low (Near-stationary) |
| Queue Service Interval (median) | Tens to >100 ms | 5–15 ms (Stationary) |
| DualQ ECN Feedback Coherence | Sparse / Burst-marked | Continuous / Stable marking |
| Goodput Density (ρ_LL) (Mbps per 1,000 sq ft) | ~12 Mbps (Overlapping contention domains) | ~128 Mbps (8 RRHs, orthogonal 40 MHz channels) |
Economic Conclusion: Under realistic dense MDU conditions, Fi-Wi's orchestrated 40 MHz architecture delivers ~10× higher usable goodput density compared to autonomous wide-channel deployments. This is the fundamental advantage of Fi-Wi: capacity scales with RRH density and spatial reuse, not channel width alone.
See Appendix J for detailed contention modeling and simulation methodology.
This appendix details the Monte Carlo simulation and analytical models used to derive the Low-Latency Goodput Density (ρ_LL) metrics. The framework evaluates Fi-Wi's spatial capacity gains under realistic Multi-Dwelling Unit (MDU) contention scenarios.
The simulation contrasts traditional wide-area coverage with Fi-Wi's localized orchestration.
Path loss follows the log-distance model PL(d) = PL(d₀) + 10·n·log₁₀(d/d₀) + Xσ, with path-loss exponent n = 2.8.
The simulation models 20 active stations (STAs) distributed across the 8-unit floor (average 2.5 STAs per unit).
Service Time Variance (σ²) is calculated by observing the delay between TX_START and ACK_END across 10⁶ simulated TXOPs.
Autonomous 160 MHz case: P(TX) = [1 − p_occ]⁴, where p_occ is the aggregate occupancy from overlapping neighbors. Fi-Wi 40 MHz case: P(TX) = 1 − p_local_occ, restricted to immediate room-level neighbors.
The Goodput Density is derived by filtering raw throughput through the 20ms p95 OWD constraint.
// Derivation for ρ_LL Calculation
for each packet i:
delay_i = contention_delay + serialization_delay + retry_overhead
if delay_i <= 20ms:
accepted_payload += size_i
else:
dropped_from_goodput_metric++
ρ_LL = (accepted_payload) / (total_time * area)
The simulation produces the following goodput derivation for a 1,000 sq ft section:
Autonomous 160 MHz: 120 Mbps effective goodput × 10% within the OWD constraint = 12 Mbps ρ_LL.
Fi-Wi 40 MHz: 1,280 Mbps aggregate (8 RRHs) × 99.8% within the OWD constraint, normalized per 1,000 sq ft ≈ 128 Mbps ρ_LL.
| Traffic Type | % of Load | Constraint |
|---|---|---|
| Interactive (L4S/Gaming) | 20% | Strict SLO subject |
| Streaming (4K Video) | 50% | Freeze sensitive |
| Bulk (Background) | 30% | Throughput focused |