Umber Networks Proprietary Architecture

Umber Fi-Wi Architecture: Cellularized Wi-Fi, L4S, and RF Coordination

Timestamp-synchronized control loops, dynamic RF grouping, and multi-RRH operation
Umber Networks Fi-Wi Technical Architecture Overview (Version 1.1, December 2025)

Zebras look like horses, but they are not the same... Zebras, despite man's best efforts, cannot be tamed. The Wi-Fi we have engineered today remains fundamentally a collection of autonomous, uncoordinated things—zebras that simply cannot be harnessed.

Fi-Wi is architected from the ground up to be controllable, coordinated, and directed — the horse we need for in-building communications and sensing. As latency demands tighten and building densities increase, Fi-Wi isn't just a better future; it's the future we can build today.

0. Technical Disclaimer

The material presented in this document describes the Fi-Wi architecture and associated engineering concepts. It is provided "as is" for discussion and exploratory design purposes only. Nothing in this document constitutes a formal specification, performance guarantee, regulatory assertion, or commitment to implement any feature described.

Several sections use simplified or idealized assumptions to illustrate architectural differences between Wi-Fi, Multi-Link Operation (MLO), Low Latency, Low Loss, Scalable Throughput (L4S), and Fi-Wi queueing and scheduling behavior. These examples are intended to clarify concepts rather than fully model the non-linear and stochastic dynamics present in operational wireless systems.

Real system behavior depends on hardware characteristics, RF topology, firmware behavior, congestion patterns, environmental conditions, and interactions with legacy Wi-Fi devices. Actual performance may differ from the representative models and examples described here.

Important Note on Capabilities: This document describes an architecture using Commercial Off-The-Shelf (COTS) Wi-Fi chipsets. The system provides dynamic point selection, intelligent frequency reuse, and centralized MAC scheduling. It does not provide RF phase control, distributed MIMO, or coordinated simultaneous transmission—capabilities that would require custom ASIC development. All described features are achievable with commodity Wi-Fi hardware and comply with unlicensed spectrum regulations.

0.1 L4S Foundation and References

Low Latency, Low Loss, Scalable Throughput (L4S) is a suite of IETF standards that extend the Internet's congestion control mechanisms through Explicit Congestion Notification (ECN) to support very low queuing delays. L4S is published as a suite of IETF RFCs (RFC 9330–9332) and has multiple production implementations.

Fi-Wi is architected specifically to provide the deterministic underlying transport required to satisfy the strict queuing mandates defined in these standards.

Core L4S Specifications

Transport & Production Status

L4S replaces capacity-seeking behavior (Reno/Cubic) with pacing-based rate control and is currently deployed in production environments.

Further Reading



1. Motivation and Problem Statement

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." — Antoine de Saint-Exupéry

"Everything should be made as simple as possible, but not simpler." — Albert Einstein

With 23.3 billion Wi-Fi devices in use worldwide and 5.5 billion people (and growing) depending on internet connectivity, Wi-Fi has become the primary way we access the internet. So much so that many people think Wi-Fi is the internet. It's how a home healthcare worker video-calls to check on a patient, or a cancer patient connects to their support group. It's how a parent works remotely while their child attends school online, and how lifelong learners access the information they need to grow. It's how a grandmother monitors her heart condition through a telehealth app. It's how a family member finds their next job, or how a neighbor orders a meal.

Running quietly in the background are autonomous systems we've come to depend on: security cameras that alert us to threats, medical monitors that track vital signs, smart home systems that manage climate and safety, IoT sensors that detect water leaks or carbon monoxide. These systems don't wait for us to notice problems—they operate continuously, silently, keeping people safe.

We've moved far beyond entertainment and convenience. Wi-Fi now carries the infrastructure of daily survival. When it breaks down under density or congestion, it's not just buffering that fails. It's jobs, healthcare access, human connection, and the life-safety systems we trust to work when we're not watching. The $4.9 trillion Wi-Fi contributes to the global economy isn't an abstract number. It's the cumulative value of billions of human activities and critical systems that simply stop working when the network fails.

Why Traditional Wi-Fi Cannot Support L4S

The infrastructure supporting all of this is failing at scale, and it must be addressed for everyone. The industry is moving toward L4S and ECN-based control to eliminate bufferbloat, but traditional Wi-Fi makes this impossible. Legacy congestion-control loops fail by design once a single flow saturates the bottleneck queue, and even modern ECN-based systems such as L4S cannot converge when Wi-Fi hides queue depth, induces collision storms, injects firmware-created delays that look like queues, and constantly shifts transmission (PHY) rates through its rate-control and aggregation machinery. Mesh networks and additional APs make the user experience worse by injecting more uncoordinated radios into an already chaotic RF environment. And because the AP industry understands these limits, it is no surprise that even major vendors publicly state that L4S cannot operate correctly over the products they sell.

Adding more Ethernet-attached APs makes it worse by creating more overlapping contention domains. Hidden queues in SoCs, rate-control firmware, and aggregation pipelines obscure the true bottleneck. In control-theory terms: the bottleneck queue cannot expose its state, the PHY rate is not stationary, and the closed loop cannot stabilize. This is why user experience fails in many apartments and homes, in hotels, MDUs, stadiums, and high-density buildings long before “capacity” is reached.

QoS cannot rescue this architecture. Because the bottleneck queue inside a Wi-Fi AP has no information about actual flow urgency or priority, no QoS mechanism can operate meaningfully. The only real solution is to avoid congestion altogether — which is exactly what L4S researchers have designed for and exactly what Fi-Wi supports.

Why Copper Infrastructure Has Reached Its Limits

While the protocol fails in the air, the physical infrastructure fails in the walls. The industry's traditional answer, running copper Ethernet to APs, simply extends the lifetime of an architecture that has reached its limits. Copper requires periodic rip-and-replace cycles: Cat5 becomes Cat6, then Cat7, then Cat8. A home builder has no idea what communications wiring to install. The RJ45 connector, with its fragile plastic tab, is outdated and at end of life. And at 25G, 40G, or 100G, physics takes over: copper attenuation is measured in dB per inch. Data centers have abandoned structured cabling (long-run copper) for core transport, restricting copper to short-reach intra-rack DACs. Fi-Wi applies this same logic to the building: fiber for the long haul (halls/walls), radio for the short hop.

How Fi-Wi Breaks Both Cycles

Fi-Wi breaks the cycle. Install fiber once — and never revisit behind walls or ceilings again. The glass is permanent; only the optics evolve. Fiber is already the universal medium for 100G/400G data centers and DWDM long-haul transport, and now, with Fi-Wi, it carries PCIe throughout a building. Remote Radio Heads simply convert between fiber and 802.11, eliminating embedded routing, rate-control SoCs, switching silicon, and the security-patch treadmill they require. When Wi-Fi standards evolve, you replace the small radio module(s) — that's all.

What is C-RAN?
Fi-Wi adapts the Centralized/Cloud Radio Access Network (C-RAN) architecture from 4G/5G cellular systems. In C-RAN, intelligence (baseband processing) is centralized while radio heads are distributed. Fi-Wi applies this proven approach to Wi-Fi, enabling building-scale coordination impossible with autonomous access points.

Fi-Wi turns fiber combined with 802.11 into the permanent, predictable, control-theory-friendly transport that the L4S control loop requires, and treats 802.11 radio heads as the small, disposable, last-meters, connector-free interface where the in-building network behaves deterministically. And because fiber increases the long-term value of a building, the investment is not just technically durable — it is financially durable.

The Opportunity Is Here

There is no law of physics that says Wi-Fi cannot work at scale. The collapse we're seeing in apartments, hotels, and high-density buildings isn't inevitable. The researchers have shown engineers how to proceed. We know how to build stable control loops. We know how to coordinate radios. We know how to deploy permanent infrastructure.

The conditions for solving this are here, now. Engineering talent exists across our industry. The market has already validated the foundation: China's FTTR deployments have installed fiber to millions of rooms, proving that permanent infrastructure at this scale is not just feasible—it's already happening at volume. What's missing is capital directed at the right architecture. Investors are essential to this challenge. Their capital will enable the engineering to serve the market. And, once proven, market signals will sustain the development, directing human resources toward building what humanity needs for continued advancement.

Fi-Wi is Umber's answer, but the underlying challenge belongs to all of us. The 5.5 billion people depending on this infrastructure deserve better than a system designed for convenience that we've repurposed for survival. This is solvable engineering—the talent is ready, the manufacturing exists, and the market is waiting. It's time we came together and fixed this.

About Umber Networks
Umber Networks was founded by Bob McMahon, a networking engineer with 35 years of experience building internet infrastructure. Bob created and maintains Iperf2, the industry-standard network performance measurement tool with over 3 million downloads worldwide. His career spans foundational work on FDDI for the International Space Station (1989), development of the Cisco Catalyst RSM routing module deployed worldwide, and wireless chipset testing using statistical process controls at Broadcom. Fi-Wi represents the culmination of decades solving congestion control, wireless scaling, and real-time transport challenges at the protocol and silicon level.

2. The Wi-Fi Crisis: Why Evolution Failed and Control Was Lost

The failure of modern Wi-Fi to support low-latency applications (L4S) is not a failure of bandwidth; it is a failure of control. With 23.3 billion Wi-Fi devices deployed globally, the protocol has hit an asymptotic limit where adding complexity yields diminishing returns.

As density rises, autonomous contention scales super-linearly—effectively operating as the inverse of Metcalfe's Law. The result is a rising noise floor and media access collisions that render unlicensed spectrum unusable for the deterministic performance required by next-generation applications.

2.1 The Evolutionary Trap: Why Incremental Improvements Failed

Evolutionary engineering is powerful; it gave us twenty-five years of Wi-Fi speed improvements. But every evolutionary curve eventually hits an asymptote—a point where adding more complexity yields diminishing returns. We have reached that point.

"The IEEE 802.11 working group behaves like a composer writing a symphony that effectively cannot be played. They continually add instruments—4096-QAM, Puncturing, MLO—without considering that the musician (the silicon) has only microseconds to react."

The decision matrix for a Wi-Fi chip has exploded combinatorially. We can trace this through the Modulation and Coding Scheme (MCS) table, whose state space is quantified in Section 2.3.2.

The Physical Trap: When the firmware engineer fails to optimize the radio, can we simply redesign the chip? No, because of RTL (Register Transfer Level) Accretion. In software, engineers "refactor" unwieldy code. In hardware, refactoring is economically forbidden. A complex SoC takes 18–24 months to validate; removing "dead" logic risks breaking obscure corner cases. Consequently, vendors only add; they never subtract. 802.11be logic wraps around 802.11ax logic, which wraps around 802.11ac logic—twenty-five years of accumulated technical debt consuming area and leakage power.

The Market Signal: The ultimate proof that the standard has reached gridlock is the behavior of market leaders like Samsung and Apple. They no longer rush to support every new feature—they aggressively whitelist features and blacklist others because complexity drains battery and destabilizes connections. When the two largest consumers of wireless silicon effectively stop buying the complexity argument, the evolutionary roadmap is broken.

2.2 The Density Paradox: More Capacity, Less Performance

The fundamental instability of 802.11 stems from the Birthday Paradox applied to media access. In an autonomous system, as the number of contending stations (n) increases linearly, the probability of collision increases combinatorially:

Collision pairs = n(n−1)/2
For n = 100 devices: 4,950 potential collision pairs
With per-pair collision probability p, P(at least one collision) = 1 − (1 − p)^(n(n−1)/2) → 1 as n → ∞

Simulation data confirms that even with moderate client density, collision probability quickly exceeds 50%, forcing the network into a state of "Drift" where latency becomes unbounded. Under these conditions, the network is no longer constrained by PHY capacity, but by the probability of successful media access.
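A few lines of Python make the scaling concrete. This is an illustrative model only: it assumes a fixed, independent collision probability per pair, which real 802.11 backoff dynamics do not guarantee.

```python
# Illustrative model of pairwise-collision scaling.
# Assumption: a fixed independent per-pair collision probability p
# (real 802.11 contention is more complex and state-dependent).

def collision_pairs(n: int) -> int:
    """Number of contending station pairs: n(n-1)/2."""
    return n * (n - 1) // 2

def p_any_collision(n: int, p_pair: float) -> float:
    """Probability that at least one pair collides, assuming independence."""
    return 1.0 - (1.0 - p_pair) ** collision_pairs(n)

if __name__ == "__main__":
    for n in (10, 50, 100):
        # Pair count grows quadratically, so aggregate collision
        # probability rises sharply even for tiny per-pair odds.
        print(n, collision_pairs(n), round(p_any_collision(n, 1e-4), 3))
```

Even with a per-pair probability of 10⁻⁴, the quadratic growth in pair count drives the aggregate collision probability toward certainty as density rises.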

This is Metcalfe's Law in reverse: instead of each new node increasing the value of the network, each new node increases the chance of interference and reduces usable capacity.

2.3 The Three Technical Failure Modes

The collapse of the operator model is driven by three distinct architectural failures inherent to the 802.11 standard.

2.3.1 Protocol Tax: The Hidden Node Penalty

Standard Wi-Fi relies on Carrier Sense Multiple Access (CSMA), which assumes that all stations can hear each other. In real-world MDU (Multi-Dwelling Unit) environments, this assumption fails catastrophically.

Field measurements using ESP32-based sensors reveal that hidden node contention consumes 30-50% of available airtime in typical MDU deployments—airtime paid for in spectrum acquisition costs but lost to protocol overhead invisible to traditional monitoring. This represents a massive protocol tax where significant airtime is consumed by retries and backoff slots rather than payload delivery.

2.3.2 The MCS Matrix: Un-Engineerable Complexity

The most critical failure for a network operator is the loss of state control. Modern 802.11ax supports 12 MCS indices × 4 bandwidth options × 8 spatial stream configurations × 3 guard intervals = 1,152 valid PHY states. Autonomous rate selection must navigate this space at sub-millisecond timescales under non-stationary noise.
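The arithmetic is easy to verify by enumerating the Cartesian product of the dimensions cited above. The specific option values below (bandwidths, guard intervals) are representative; real chipsets prune some combinations.

```python
# Enumerating the 802.11ax PHY state space cited in the text.
# Option values are representative; exact sets vary by chipset
# and regulatory domain.
from itertools import product

mcs_indices     = range(12)           # MCS 0-11
bandwidths_mhz  = (20, 40, 80, 160)   # 4 bandwidth options
spatial_streams = range(1, 9)         # up to 8 spatial streams
guard_intervals = (0.8, 1.6, 3.2)     # 3 guard intervals, in µs

phy_states = list(product(mcs_indices, bandwidths_mhz,
                          spatial_streams, guard_intervals))
print(len(phy_states))  # 1152 valid PHY state combinations
```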

This creates a Non-Stationary System.

Because Wi-Fi is non-stationary, autonomous rate selection under contention has no bounded outcome. The IEEE 802.11 standard has allowed the MCS table to explode into hundreds of valid permutations—a chaotic state space that firmware must navigate in microseconds with incomplete information.

2.3.3 The Spatial Contention Cascade

As load increases, the spatial precision of the network degrades. Mathematical modeling shows that the condition number (κ)—a measure of how well-conditioned the MIMO channel matrix is—degrades from 6 dB (excellent spatial separation) to >12 dB (severe interference) under load. This collapse means that 4×4 MIMO effectively degrades to 2×2 or worse, turning additional spatial streams into self-interference rather than capacity.
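As an illustration of the metric, the following sketch computes κ in dB from the singular values of a MIMO channel matrix. The matrices are invented examples, not measured channels.

```python
# Sketch: condition number kappa of a MIMO channel matrix, in dB,
# computed from its singular values. Example matrices are
# illustrative, not measured channels.
import numpy as np

def condition_number_db(H: np.ndarray) -> float:
    """kappa = sigma_max / sigma_min, expressed in dB (20*log10)."""
    s = np.linalg.svd(H, compute_uv=False)
    return 20.0 * np.log10(s[0] / s[-1])

# Well-conditioned channel: near-orthogonal spatial paths.
H_good = np.array([[1.0, 0.1], [0.1, 1.0]])
# Ill-conditioned channel: nearly collinear paths (spatial interference).
H_bad = np.array([[1.0, 0.9], [0.9, 1.0]])

print(condition_number_db(H_good))  # low kappa: streams separable
print(condition_number_db(H_bad))   # high kappa: degraded to single stream
```

A well-conditioned 2×2 channel here sits under 2 dB, while the nearly collinear one exceeds 25 dB, illustrating the >12 dB "severe interference" regime described above.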

This degradation collapses the theoretical gains of MU-MIMO, transforming high-order spatial streams into interference rather than usable capacity. The "Efficiency Paradox" emerges: Wi-Fi evolution has focused on shrinking Payload Duration (faster PHY rates like 4096-QAM) while MAC Overhead (LBT, Backoff, Preamble) remains constant. To amortize the overhead, chips must build massive Aggregates (A-MPDUs). This destroys latency. We have engineered a Ferrari engine (the PHY) inside a garbage truck (the MAC).

2.4 The Operator's Dilemma

For network operators—whether cable MSOs, telcos, or fiber providers—this architectural chaos presents a fundamental business risk: You own the customer experience, but not the air interface.

2.5 Why Conventional Solutions Don't Scale

Traditional attempts to solve Wi-Fi density problems fail because they address symptoms rather than the underlying architectural failure:

The Trillion-Dollar Context: The mobile industry spent $600 billion building 5G to get scheduled, deterministic performance outdoors. They understand that unlicensed spectrum + autonomous contention = chaos. The genius of 5G is its architecture; its Achilles heel is its cost. In recent auctions, 20 MHz of licensed mid-band spectrum sold for over $17 billion for U.S. rights alone.

Fi-Wi applies the cellular C-RAN architecture indoors—but on unlicensed spectrum that costs nothing. This is the arbitrage opportunity.

2.6 The Client Side: L4S and the End of Uplink Contention

The architectural reset is not limited to the infrastructure; it fundamentally alters the behavior of the Station (STA). In legacy Wi-Fi, the STA is an autonomous agent that fights for upstream airtime using EDCA (Enhanced Distributed Channel Access). It maintains its own local WMM queues and blindly transmits whenever it wins a contention window, often oblivious to the fact that the AP's receive buffer is already full.

The L4S Inversion: With L4S, the "Quality of Service" decision moves from the Wi-Fi card's firmware to the application's congestion control algorithm. We replace the rigid, static categories of WMM with the dynamic, adaptive responsiveness of TCP Prague and other L4S-compliant congestion controls.

Eliminating the "Uplink Queue": This effectively virtualizes the queue. Instead of a deep buffer sitting on the Wi-Fi chip waiting to be transmitted, the packets are held in user-space memory on the client device, waiting for the "go" signal (or rather, the absence of a "stop" signal). The traffic never enters the contention domain until there is guaranteed capacity to service it. The STA no longer needs complex internal QoS schedulers because it is no longer trying to force more data than the pipe can hold.

Technical Insight: The "Driver Queue" Trap

In legacy systems, flow control happens at the driver level. When the Wi-Fi card's hardware buffer fills up (the TX Ring), it signals the Operating System to "Stop the Queue." The OS then buffers packets in software (qdisc) until the hardware signals "Go."

This is catastrophic for latency. It creates a hidden reservoir of old data sitting in the kernel, waiting for the hardware to clear. By the time the hardware is ready, the packets in the OS queue are already stale.

L4S eliminates this layer of buffering entirely. Because TCP Prague adjusts the send rate to match the actual airtime capacity (signaled via ECN), the application never sends enough data to fill the hardware ring buffer. The driver never has to assert flow control, the OS queue remains empty, and every packet that hits the driver is fresh, ensuring immediate transmission.

2.7 The Strategic Reset: Splitting the Graph

Solving this requires a "Subtractive Architecture." Instead of adding more features to the radio, we must remove them. The architectural breakthrough of Fi-Wi is decoupling the MCS State Graph described in Section 2.3.2 into its constituent parts: slow, centralized congestion and scheduling decisions in the Concentrator, and fast, deterministic execution of the chosen state transitions in the RRH.

This architectural shift—from distributed chaos to centralized control—mirrors the evolution from analog transmission systems (noise-prone, operator-invisible) to digital QAM (deterministic, monitorable). Fi-Wi completes this transformation for the last 10 meters, moving the network from a model of probabilistic negotiation to one of deterministic execution.

Section 13 describes the Concentrator's scheduling algorithm that implements this graph traversal, while Appendix C details the RRH's scatter-gather DMA mechanism that executes the chosen state transitions at microsecond timescales.

Technical Insight: The QoS Fallacy

Traditional QoS mechanisms in Wi-Fi—WMM access categories, priority queues, and traffic shaping—reflect a fundamental architectural flaw: treating contention as inevitable and attempting to optimize it through priority classes. This approach attempts to infer urgency by classifying packets, then granting probabilistic access to the medium—essentially rolling dice with weighted odds.

L4S changes the premise entirely. Flows declare their delay sensitivity using ECN codepoints, and the network marks packets so that sources throttle their own send rates. Across many flows, this controls the aggregate arrival rate at the forwarding plane based on real-time queue feedback rather than static classes.

In a Fi-Wi architecture, where all wireless transmissions are centrally scheduled with unified state, traffic no longer competes through contention. The Concentrator controls arrival rates to each Remote Radio Head, ensuring packets are transmitted at the precise moment they are needed. This deterministic scheduling replaces the probabilistic contention that WMM attempts to optimize. Consequently, the complex web of traditional QoS queues is rendered obsolete; we replace "Priority" (deciding who waits) with "Isolation" (ensuring no one waits).
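As a toy illustration of this "Isolation" model, the sketch below hands each RRH in an airtime domain a fixed, non-overlapping transmit window in the concentrator's master clock, rather than letting radios contend. The `Grant` and `schedule_round` names are hypothetical, not a real Fi-Wi API; the ≈250 µs TXOP length is taken from the system diagram in Section 3.

```python
# Minimal sketch of deterministic TXOP assignment (hypothetical API,
# not the actual Fi-Wi scheduler). The concentrator grants each RRH
# in an airtime domain a fixed, non-overlapping transmit window.
from dataclasses import dataclass

TXOP_US = 250  # ~250 us TXOP length, per the system diagram

@dataclass
class Grant:
    rrh_id: int
    start_us: int   # start time in the concentrator's master clock
    length_us: int

def schedule_round(rrh_ids: list[int], t0_us: int,
                   txop_us: int = TXOP_US) -> list[Grant]:
    """One scheduling round: back-to-back, non-overlapping TXOP grants."""
    return [Grant(rrh, t0_us + i * txop_us, txop_us)
            for i, rrh in enumerate(rrh_ids)]

grants = schedule_round([1, 2, 3], t0_us=1_000_000)
for g in grants:
    print(g)  # no two grants overlap within the airtime domain
```

Because grants are computed from a single clock with full knowledge of every queue, "who transmits when" is a deterministic output rather than the outcome of contention.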

2.8 Interactive Visualization: The MCS Collapse Under Load

The following interactive simulation demonstrates the architectural differences between Fi-Wi, autonomous APs, and mesh networks under varying load conditions. It visualizes the MCS State Graph discussed in Section 2.7, showing how autonomous systems fail to navigate this state space under density.

Each "room" represents a device with a 4 × 12 grid of MCS states (4 spatial streams × 12 MCS indices). The ghost node (dashed) shows the ideal state based on channel quality, while the active node shows the actual state selected by the rate control algorithm.


Technical Details: Understanding the Visualization

MCS Grid: Each 4×12 grid shows all possible MCS states. Top rows = MU-MIMO (multi-user), bottom rows = standard 2×2 MIMO. Columns = MCS index (0-11, higher = faster but needs better SNR).

Eigenvalues (λ₁, λ₂): Strength of spatial modes in the MIMO channel. As density increases in autonomous mode, λ₂ collapses → spatial interference.

Condition Number (κ): Ratio λ₁/λ₂ in dB. Low (~6 dB) = good. High (>12 dB) = MU-MIMO degraded to single-stream. This directly demonstrates the "Spatial Contention Cascade" from Section 2.3.3.

Collision Probability: Computed using the Birthday Paradox formula: n(n−1)/2 collision pairs. When this exceeds 50%, the network enters a "Drift" state with unbounded latency.

Why This Matters for Network Operators

This visualization proves the loss of control described in Section 2.4. In autonomous mode, operators cannot engineer performance because the system navigates a 1,000+ state MCS graph with no global coordination.

In Fi-Wi mode, the Concentrator's global state visibility allows it to schedule every TXOP, assign RF groups, and select PHY states with full knowledge of every queue, client, and RRH.

The result: predictable, engineerable performance that scales with density instead of collapsing. The difference becomes visceral when you watch autonomous mode turn red under the same load that Fi-Wi handles in green.


3. System Picture

System Diagram: Fi-Wi Concentrator, Central Packet Memory, and Multiple RRHs

                        ┌────────────────────────────────────────────┐
                        │              Fi-Wi Concentrator            │
                        │────────────────────────────────────────────│
   L4S/ECN-aware        │                                            │
   traffic from LAN/    │   ┌────────────────────────────────────┐   │
   WAN (IP/802.3)  ─────┼─▶│    Central Packet Memory & Queues  │   │
                        │   │  • Per-flow / per-tenant queues    │   │
                        │   │  • Per-airtime-domain queues       │   │
                        │   │  • Enqueue timestamps (µs)         │   │
                        │   └───────────────┬────────────────────┘   │
                        │                   │                        │
                        │   ┌───────────────▼────────────────────┐   │
                        │   │   L4S/AQM & Scheduler              │   │
                        │   │  • Sojourn-time based ECN marking  │   │
                        │   │  • TXOP length control (≈250 µs)   │   │
                        │   │  • RF grouping & spatial streams   │   │
                        │   └───────────────┬────────────────────┘   │
                        │                   │ PCIe over fiber        │
                        └───────────────────┼────────────────────────┘
                                            │
        ┌───────────────────────────────────┼───────────────────────────────────┐
        │                                   │                                   │
        │                                   │                                   │
┌───────▼─────────┐                ┌────────▼─────────┐                ┌────────▼─────────┐
│   RRH #1        │                │   RRH #2         │                │   RRH #3         │
│ (Thin MAC/PHY)  │                │ (Thin MAC/PHY)   │                │ (Thin MAC/PHY)   │
│  • RF front end │                │  • RF front end  │                │  • RF front end  │
│  • DFE + FFT    │                │  • DFE + FFT     │                │  • DFE + FFT     │
│  • Minimal MAC  │                │  • Minimal MAC   │                │  • Minimal MAC   │
│  • DMA engine   │                │  • DMA engine    │                │  • DMA engine    │
│  • PTP sync     │                │  • PTP sync      │                │  • PTP sync      │
└───────┬─────────┘                └────────┬─────────┘                └────────┬─────────┘
        │                                   │                                   │
        │                                   │                                   │
        │                 PCIe-over-fiber links (no deep queues in RRHs)        │
        │                                   │                                   │
        │                                   │                                   │
┌───────▼─────────┐                ┌────────▼────────┐                 ┌────────▼─────────┐
│   RRH #4        │     ...        │   RRH #N        │                 │   Wi-Fi STAs     │
│ (Thin MAC/PHY)  │                │ (Thin MAC/PHY)  │     (Rooms, AP-like cells, clients)│
│  • RF front end │                │  • RF front end │                 │  • Phones        │
│  • DFE + FFT    │                │  • DFE + FFT    │                 │  • Laptops       │
│  • Minimal MAC  │                │  • Minimal MAC  │                 │  • IoT devices   │
│  • DMA engine   │                │  • DMA engine   │                 │                  │
│  • PTP sync     │                │  • PTP sync     │                 │                  │
└─────────────────┘                └─────────────────┘                 └──────────────────┘
  

Key properties: Central packet memory and queues live entirely in the concentrator, where L4S-aware AQM and scheduling operate on true bottleneck queues. RRHs are kept as simple hardware endpoints (RF + minimal MAC + DMA + PTP), with no deep local buffering or autonomous AP logic. This enables stable L4S behavior, explicit TXOP control, and software-defined evolution of queueing and RF policies.

3.1 Classical Stack vs. Fi-Wi (The C-RAN Shift)

To understand Fi-Wi, we must first unlearn the definition of an "Access Point."

Reality Check 1: The RRH is a Micro-Bridge, Not an Access Point
The industry treats the AP as a "Router on the Ceiling." Fi-Wi replaces this with a Tunneling Bridge. The Shift: The RRH does not "process" the network; it "extends" it. It is a transparent pipe that bridges the airgap to the fiber, leaving all decision-making to the central brain.
Reality Check 2: Coordination vs. Control
Traditional "Centralized Controllers" (like Cisco/Aruba) provide Coordination. They tell APs which channels to use or which clients to kick, but the AP still decides exactly when to transmit every packet. The "Control Loop" is still distributed.

Fi-Wi provides Control. The Concentrator does not "suggest" a schedule; it executes it. It tells the RRH: "Transmit these specific bytes at exactly microsecond T." There is no disagreement, no race condition, and no distributed chaos.

In a typical controller-managed enterprise Wi-Fi deployment, a centralized controller (e.g., Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi Controller) coordinates AP configuration: channel assignment, transmit power, client steering recommendations, and SSID management. However, each AP remains autonomous at the data plane.

These systems are loosely-coupled: the controller manages the control plane (configuration, policy) but the data plane — queuing, MAC scheduling, aggregation, and packet forwarding — remains distributed and autonomous across individual APs.

In Umber Fi-Wi (C-RAN for Wi-Fi), we split the AP and cellularize the RF domain, down to room-level. The concentrator sees all flows, all queues, and all RRHs. The RRHs handle 802.11 MAC/PHY but are tightly time-synchronized and behave as DMA-driven PHY/MAC endpoints rather than autonomous APs. A set of RRHs and their shared queues form a cellularized Wi-Fi domain within the building, often at “cell per room” granularity.

Fi-Wi centralizes both control plane AND data plane with shared state across all RRHs. The concentrator doesn't just configure RRHs; it directly manages their queues, schedules their TXOPs, and maintains unified timestamp-synchronized state across the entire cellularized RF domain.

3.2 Dual-Loop Control Model

Conceptually, Fi-Wi decouples the system into two nested feedback loops, separated by timescale:

Outer loop (End-to-End Latency): [ L4S Sender ] ──(ms)──> [ Group Queue ] ──> [ Feedback (ECN) ]
Inner loop (MAC Efficiency):     [ Aggregation Buffer ] ──(µs)──> [ Airtime / PHY ]

The Outer Loop manages congestion and end-to-end latency (Internet speed). The Inner Loop manages MAC efficiency and radio timing (Airtime).

The Problem with Legacy Wi-Fi: Traditional APs couple these loops unpredictably, creating "sawtooth" latency patterns that confuse TCP.

The Fi-Wi Solution: By centralizing both loops in the Concentrator, Fi-Wi enforces a strict Time-Scale Separation. The Inner Loop runs so fast (3–5 kHz) that it appears as "constant service" to the slower Outer Loop (10–20 Hz), allowing L4S to stabilize perfectly.
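A toy model shows why the ratio of loop rates matters. With a 250 µs inner tick (4 kHz, within the 3–5 kHz range above) and a 100 ms outer sample (10 Hz), the outer loop observes roughly 400 inner cycles per sample, so fast service-time variation averages out. The queue dynamics below are deliberately simplified.

```python
# Toy model of time-scale separation: bursty inner-loop (~4 kHz)
# queue dynamics are invisible to an outer loop sampling at 10 Hz.
# Arrival/drain numbers are illustrative only.
from statistics import pstdev

INNER_TICK_US = 250        # inner loop: ~4 kHz MAC/airtime service
OUTER_TICK_US = 100_000    # outer loop: 10 Hz L4S/ECN feedback

ratio = OUTER_TICK_US // INNER_TICK_US  # inner cycles per outer sample

queue, samples = 0, []
for t in range(0, 1_000_000, INNER_TICK_US):   # simulate 1 second
    queue += 3 if (t // INNER_TICK_US) % 2 == 0 else 1  # bursty arrivals
    queue = max(0, queue - 2)                           # constant drain
    if t % OUTER_TICK_US == 0:
        samples.append(queue)

# The inner-loop oscillation never reaches the outer loop:
# every outer-loop sample sees the same queue depth.
print(ratio, pstdev(samples))
```

The inner queue oscillates on every tick, yet the outer loop's samples are constant: to the slow loop, the fast loop looks like steady service, which is exactly the condition L4S needs to stabilize.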

(See Section 5: Control Architecture for the rigorous control-theoretic analysis and stability criteria.)


4. Key Fi-Wi Mechanisms

4.1 Time Synchronization

Fi-Wi operates across two distinct time domains simultaneously. The first is the concentrator's internal master clock, disciplined via PTP/802.1AS over the PCIe fronthaul (detailed in Section 4.7). The second is the 802.11 TSF (Timing Synchronization Function) domain that 802.11 clients use to coordinate with the MAC layer. In a traditional AP these two clocks are decoupled — the AP runs one TSF and one clock. In Fi-Wi, with 24 RRHs each presenting a TSF-aware BSS, managing the relationship between them is a foundational architectural responsibility of the concentrator.

4.1.1 The Fronthaul Clock: PTP/802.1AS

The concentrator synchronizes its master clock to all attached RRHs on the order of microseconds (and substantially tighter when using PCIe-native timing mechanisms such as PTM — see Section 4.7 for the full hardware chain). This master clock gives every packet a timestamp in a single, building-wide reference frame.

This clock lives entirely inside the Fi-Wi domain. Clients never see it directly. It is the coordinate system in which shim header timestamps (Section 4.2), AQM marking decisions (Section 4.3), and the ML training corpus (Section 15) are all expressed. Because all packet timestamps, service events, and queue measurements are expressed in this single master time domain, Fi-Wi can compute precise per-packet sojourn times independent of the TSF domain, enabling stable ECN marking and L4S control across the system.

4.1.2 The 802.11 TSF Domain

The 802.11 TSF is a 64-bit microsecond counter maintained per BSS; clients set their local TSF from beacons. They use it to wake from power save at the right moment, to interpret TBTT (Target Beacon Transmission Time), and to coordinate TXOP timing. The TSF is the only clock the 802.11 standard exposes at the MAC layer.

In a traditional single-AP deployment this is trivial: one AP, one TSF, one beacon stream. In Fi-Wi it is not. Consider a client in a room served by two RRHs in the same airtime domain. That client will receive beacons from both RRHs. If those beacons carry inconsistent TSF values, even small inconsistencies can lead to misaligned power-save wakeups, ambiguous TBTT interpretation, and in some implementations degraded performance or reassociation. The coherence of the TSF domain across all RRHs in a BSS is not optional; it is a hard correctness requirement.

Fi-Wi satisfies this requirement by construction: the concentrator generates all beacon frames. No RRH constructs its own beacon. The concentrator writes the TSF value into every beacon before dispatching it to the appropriate RRH for transmission. Because all TSF values originate from the same source and are derived from the same master clock, they are consistent by design rather than by coordination protocol. Within a given BSS, TSF values are identical across all participating RRHs; multiple TSF domains arise only when multiple BSS instances are present.

4.1.3 The Concentrator as Time Origin

The concentrator maintains 25 simultaneous time references: its own PTP-disciplined master clock and one 802.11 TSF per RRH. Each TSF has its own epoch (established at BSS creation) and its own drift correction term, derived from periodic synchronization updates over the fronthaul (PTP/802.1AS or PCIe PTM), which bound long-term drift. The concentrator knows the exact affine mapping between the master clock and every client-visible TSF domain at all times:

TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t)

Any event — a packet enqueue, an ECN mark, a TXOP start, a beacon transmission — can be expressed in any of the 25 frames without loss of precision. This is the time-domain analog of a coordinate transformation: the concentrator is the origin from which all other reference frames are derived, and any event timestamp can be mapped between frames via a known, invertible affine transform, updated continuously via the fronthaul synchronization loop.

Figure 4.1-1: The Concentrator as Time Origin

Concentrator master clock (PTP-disciplined)
  │
  ├─ Master frame: all shim timestamps, sojourn times, AQM marks, ML labels
  │
  ├─ TSF_1:  epoch_1, drift_1(t)  →  beacon stream for RRH 1  ┐
  ├─ TSF_2:  epoch_2, drift_2(t)  →  beacon stream for RRH 2  │ identical within
  ├─ TSF_3:  epoch_3, drift_3(t)  →  beacon stream for RRH 3  │ a given BSS
  │   ...                                                       ┘
  └─ TSF_24: epoch_24, drift_24(t) → beacon stream for RRH 24

Any event E has coordinates in all 25 frames simultaneously.
Mapping between any two frames: affine transform, known at the concentrator,
updated continuously via the fronthaul sync loop.
    

The concentrator as the origin of 25 simultaneous time reference frames (for a 24-RRH deployment). Client-visible TSF domains are derived from the master clock via known affine transforms. Within a BSS, TSF values are identical across participating RRHs.
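The affine mapping above can be sketched in a few lines. The epoch and drift numbers below are hypothetical, and drift is modeled as a constant slope between fronthaul updates — a simplification of the continuously updated correction term:

```python
from dataclasses import dataclass

@dataclass
class TsfFrame:
    """Affine mapping TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t),
    with drift modeled as a constant slope (parts per million)."""
    epoch_us: float    # TSF epoch, in master-clock microseconds
    drift_ppm: float   # current drift-correction slope

    def to_tsf(self, t_master_us: float) -> float:
        dt = t_master_us - self.epoch_us
        return dt + dt * self.drift_ppm * 1e-6     # drift_correction_i(t)

    def to_master(self, tsf_us: float) -> float:
        # Exact inverse of the affine transform above.
        return self.epoch_us + tsf_us / (1 + self.drift_ppm * 1e-6)

# Two RRH frames in different BSSes (hypothetical epochs and drifts)
tsf_1 = TsfFrame(epoch_us=1_000_000.0, drift_ppm=2.0)
tsf_2 = TsfFrame(epoch_us=5_000_000.0, drift_ppm=-1.5)

t = 42_000_000.0                         # one event on the master timeline
e1, e2 = tsf_1.to_tsf(t), tsf_2.to_tsf(t)

# Invertibility: the same event maps between frames without loss of precision.
assert abs(tsf_1.to_master(e1) - t) < 1e-6
assert abs(tsf_2.to_master(e2) - t) < 1e-6
```

Because each transform is invertible, any timestamp recorded in one frame (a shim-header enqueue time, an ECN mark, a TXOP start) can be re-expressed in any other frame.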

Why Distributed APs Cannot Do This

In a controller-managed AP deployment, each AP runs its own TSF independently. The controller can nudge APs toward a common time reference via 802.11v Time Advertisement or out-of-band NTP, but it does not generate beacon frames — each AP does. This means TSF values across APs can diverge by the inter-AP sync error (typically tens to hundreds of microseconds with Ethernet-based PTP, more without it).

A client roaming between two such APs may see a TSF discontinuity at handoff. Power-save state, TBTT alignment, and any MAC-layer timing assumption the client holds must be renegotiated. In Fi-Wi, roaming between RRHs within the same concentrator domain is a TSF-transparent event: the client's TSF counter simply continues, because the new RRH's beacon carries the same TSF value the old one would have carried at that moment. The client does not know a handoff occurred at the MAC layer.

This unified time model also enables the concentrator to schedule transmissions across RRHs against a single global timeline, rather than relying on independent per-RRH contention processes. TSF continuity across RRH handoffs is a direct consequence of centralized beacon generation, and it is what makes Fi-Wi's active redundancy claims in Section 8 operationally credible: per-packet steering between RRHs is transparent to clients because the client's MAC-layer time reference never changes. This unified time model enables not only precise measurement, but coordinated control of transmission behavior across RRHs, as described in Section 4.1.4.

4.1.4 Time-Driven EDCA Orchestration

The unified time model described above is not only a measurement framework; it is the foundation for Fi-Wi's centralized MAC scheduling. In conventional 802.11 deployments, EDCA (Enhanced Distributed Channel Access) operates as a stochastic contention mechanism: each AP independently selects random backoff values within its CWmin/CWmax range, and medium access emerges probabilistically.

In Fi-Wi, EDCA is not treated as a distributed random process. It is treated as a centrally orchestrated actuation layer, driven by the concentrator's master time reference.

Because the concentrator maintains the master time reference, the per-RRH TSF mappings, and full visibility into every transmit queue, it can shape medium access behavior across RRHs by dynamically controlling EDCA parameters on a per-radio basis. The key parameters are CWmin and CWmax (the contention-window bounds), AIFS (the arbitration inter-frame space), and the TXOP limit.

By assigning narrowly bounded contention windows and staggered AIFS values across RRHs, the concentrator can bias contention outcomes such that one RRH is overwhelmingly likely to win access at a given moment. Rotating these parameters over time creates a soft time-division multiplexing (TDM) effect using standard EDCA semantics.
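A small sketch (parameter values are illustrative, not Fi-Wi tuning, and real chipsets quantize contention differently) shows how a narrow contention window plus a shorter AIFS can make the favored RRH's win effectively certain within a round:

```python
import random

random.seed(7)

def access_delay(aifs_slots: int, cw: int) -> int:
    """Slots until a radio would seize the medium: AIFS plus a uniform
    random backoff drawn from [0, cw]."""
    return aifs_slots + random.randint(0, cw)

def favored_win_rate(trials: int = 10_000) -> float:
    """Fraction of contention rounds in which the favored RRH transmits
    strictly before every deferred RRH."""
    wins = 0
    for _ in range(trials):
        favored = access_delay(aifs_slots=1, cw=1)    # delay in {1, 2} slots
        deferred = [access_delay(aifs_slots=3, cw=15)  # delay >= 3 slots
                    for _ in range(3)]
        wins += favored < min(deferred)
    return wins / trials

print(f"favored RRH wins {favored_win_rate():.0%} of rounds")
```

With these bounds the favored RRH's worst-case delay (2 slots) is shorter than the deferred RRHs' best case (3 slots), so contention semantics are preserved yet the outcome is fully biased — the essence of the soft-TDM effect.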

This transformation is only possible because all RRHs share a common time reference. The concentrator can schedule EDCA parameter updates relative to the master clock and ensure that all RRHs apply them in a coordinated manner. Without this shared time base, independent EDCA processes would quickly decorrelate and revert to stochastic contention.

Conceptually, the concentrator executes a scheduling loop:

for each scheduling interval:
  observe queue state across RRHs        // centralized visibility
  select next RRH (or RF group) to serve // queue-aware decision
  assign EDCA parameters (CWmin, CWmax, AIFS, TXOP)
  enforce timing relative to master clock // coordinated application

The result is not strict TDMA — 802.11 contention semantics are preserved and the system remains compliant with standard client behavior — but the distribution of outcomes is shaped by the concentrator. Over short time horizons, access becomes highly predictable and service intervals can be bounded.

Because TSF values are consistent across RRHs, these scheduling decisions are MAC-transparent to clients. From the client's perspective, the network behaves as a single, coherent AP with stable timing characteristics, even as transmissions are steered across multiple physical radios.
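The rotation itself can be sketched as a round-robin assignment of a "favored" parameter set across RRHs each scheduling interval; the parameter values here are placeholders, not real tuning:

```python
from itertools import cycle

# Illustrative EDCA parameter sets; real values would come from the
# concentrator's scheduling policy, not these constants.
FAVORED  = {"cwmin": 1,  "cwmax": 3,  "aifs": 1}
DEFERRED = {"cwmin": 15, "cwmax": 63, "aifs": 7}

def edca_rotation(rrhs, intervals):
    """Yield, per scheduling interval, the EDCA assignment that hands the
    favored set to one RRH and defers the rest (soft-TDM rotation)."""
    favored_iter = cycle(rrhs)
    for k in range(intervals):
        winner = next(favored_iter)
        yield k, {r: (FAVORED if r == winner else DEFERRED) for r in rrhs}

rrhs = ["RRH1", "RRH2", "RRH3"]
favored_counts = {r: 0 for r in rrhs}
for _, assignment in edca_rotation(rrhs, intervals=12):
    for r, params in assignment.items():
        favored_counts[r] += params is FAVORED
print(favored_counts)   # each RRH favored equally across the 12 intervals
```

Applying each interval's assignment at a master-clock-scheduled instant is what keeps the rotation coherent; independent per-AP clocks would let the intervals drift apart.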

Why Distributed AP Systems Cannot Replicate This

Controller-based Wi-Fi systems can configure EDCA parameters on individual APs, but they cannot coordinate their application in time with sufficient precision. Each AP maintains its own clock, its own contention process, and its own transmit queues.

Without a shared time origin and centralized queue visibility, EDCA remains a probabilistic mechanism. Attempts to tune contention parameters across APs produce statistical bias at best, not deterministic scheduling. The lack of a unified time domain prevents coordinated rotation of access privileges across radios.

Fi-Wi's ability to treat EDCA as a controllable scheduling primitive is a direct consequence of the concentrator's role as both the time origin and the sole owner of transmit queues.

This time-driven EDCA orchestration is the mechanism by which Fi-Wi converts the inherently stochastic 802.11 MAC into a predictable, centrally scheduled system — completing the chain from time synchronization through queue observability to stable L4S control.

4.2 Fi-Wi Shim Header

Between 802.3/IP and the fronthaul link we add a small internal metadata header. Conceptual form:

struct FiWiMeta {
  uint64_t seq;          // fronthaul sequence number
  uint64_t t_ingress_us; // time packet enqueued into group queue (central DRAM)
  uint32_t txop_id;      // TXOP this MSDU is in
  uint8_t  mpdu_idx;     // index within aggregate
  uint8_t  mpdu_cnt;     // total MSDUs in this TXOP
  uint8_t  ecn_flags;    // CE applied? which queue? reason bits
  uint32_t qlen_pkts;    // queue depth snapshot at TXOP start
};

This header is visible only inside the Fi-Wi domain. It lets us compute per-packet sojourn times (from t_ingress_us), attribute each MSDU to its TXOP and its position within the aggregate, and correlate ECN marking decisions with the queue depth captured at TXOP start.
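As a sketch, the conceptual struct can be mirrored with Python's `struct` module to show how a sojourn time falls out of the shim timestamp. The byte layout (including the explicit pad before qlen_pkts, matching typical C alignment) is illustrative, not a wire-format specification:

```python
import struct

# Packed little-endian layout mirroring the conceptual FiWiMeta struct:
# seq, t_ingress_us, txop_id, mpdu_idx, mpdu_cnt, ecn_flags, (pad), qlen_pkts
FIWI_META = struct.Struct("<QQIBBBxI")   # 28 bytes total

def pack_meta(seq, t_ingress_us, txop_id, mpdu_idx, mpdu_cnt, ecn_flags, qlen_pkts):
    return FIWI_META.pack(seq, t_ingress_us, txop_id,
                          mpdu_idx, mpdu_cnt, ecn_flags, qlen_pkts)

def sojourn_us(meta: bytes, t_now_us: int) -> int:
    """Per-packet sojourn time; both timestamps are in the master time domain."""
    _seq, t_ingress_us, *_rest = FIWI_META.unpack(meta)
    return t_now_us - t_ingress_us

blob = pack_meta(seq=1, t_ingress_us=1_000_000, txop_id=7,
                 mpdu_idx=0, mpdu_cnt=32, ecn_flags=0b001, qlen_pkts=48)
print(FIWI_META.size, sojourn_us(blob, t_now_us=1_000_850))   # 28 850
```

Because every t_ingress_us is stamped from the same master clock (Section 4.1), this subtraction is meaningful regardless of which RRH eventually transmits the packet.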

4.3 AQM / L4S Marking Placement

We choose the group queues in the concentrator—each corresponding to a cellularized airtime domain shared by one RRH or by multiple interfering RRHs—as the only places where deep queues are allowed and where we apply ECN.

Other queues (within RRH hardware, on the fiber/fronthaul link) are kept shallow via pacing and controlled descriptor posting. The group queues become the single bottlenecks in each cellularized airtime domain, which is exactly what L4S wants: a small number of stable, well-behaved bottlenecks with known behavior. The control policy is explicitly tuned to keep both average and tail queueing delay low.
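A minimal sketch of sojourn-time-based marking at a group queue, assuming a simple linear ramp between two hypothetical thresholds (the actual Fi-Wi marking policy and tuning values are not specified here):

```python
def ce_mark_probability(sojourn_us: float,
                        low_us: float = 500.0,
                        high_us: float = 2000.0) -> float:
    """Ramp marking on queueing delay: no CE marks below low_us, always
    mark above high_us, linear ramp in between. Thresholds are
    hypothetical, chosen only to illustrate the shape of the policy."""
    if sojourn_us <= low_us:
        return 0.0
    if sojourn_us >= high_us:
        return 1.0
    return (sojourn_us - low_us) / (high_us - low_us)

for delay_us in (200, 500, 1250, 2000, 5000):
    print(delay_us, ce_mark_probability(delay_us))
```

Because the group queue is the single deep bottleneck in its airtime domain, a delay-driven policy like this sees the true queueing delay directly — there is no hidden queue downstream to distort the signal.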

4.4 Centralized Packet Memory and DMA

DMA (Direct Memory Access): Why RRHs Can Be Simple

The Standard AP Architecture: Traditional Wi-Fi chips already use DMA to move packets from host memory to the radio without CPU involvement. But they require a local CPU to create descriptors, manage buffers, and run the network stack. Every AP is a complete computer running millions of lines of Linux.

The Fi-Wi Innovation: DMA Over Distance (not RDMA)

Fi-Wi extends the PCIe bus over fiber, allowing the RRH's DMA engine to read and write remote memory in the Concentrator. To the RRH silicon, memory 100 meters away appears "local"—accessible with the same PCIe transactions a traditional Wi-Fi chip uses to access DRAM 10 millimeters away on the motherboard.

Result: The local CPU, local DRAM, and entire Linux stack can be eliminated. The RRH becomes a pure "micro-bridge"—just DMA + MAC/PHY logic.

The Silicon Cost Difference:

Component              Traditional AP                     Fi-Wi RRH
─────────────────────────────────────────────────────────────────────────────────
MAC/PHY Silicon        ~15-20M gates                      ~15-20M gates
(802.11 Radio Logic)   MIMO, error correction, etc.       Same physics, same complexity
                       Complexity dictated by physics     No savings here

Host SoC / CPU         ~50-100M gates                     ~100K-500K gates
(The "Brains")         Multi-core ARM CPU                 Simple DMA state machine
                       DDR4 controller                    Descriptor buffer only
                       Peripherals, caches, etc.          100-1000x simpler

DRAM                   256MB - 1GB DDR4                   16-64KB SRAM
                       (Required for OS + buffers)        (Descriptor storage only)

Operating System       Linux (millions of LOC)            None
                       Requires security patches          Zero software attack surface

Total Silicon          ~70-120M gates                     ~15-20M gates

Direct Implications:

The Economic Model:

Traditional Architecture: 50 APs = 50 CPUs, 50 DRAM modules, 50 power supplies, 50 Linux installations, 50 security update cycles.

Fi-Wi Architecture: 1 powerful Concentrator (workstation-class) + 50 simple RRHs (DMA + radio only).

Total system cost is lower because you're paying for intelligence once, not 50 times.

Why Incumbents Cannot Do This:

Traditional AP vendors have already optimized their SoC designs—the CPU, DRAM controller, and peripherals are as efficient as they can be. But their architecture requires these components at every radio because each AP operates autonomously. Even if they wanted to simplify, the distributed control model forces complexity at the edge.

Fi-Wi's centralized architecture enables the per-radio simplification. This is a structural cost advantage, not a manufacturing efficiency. Replicating it would require incumbents to abandon their entire product line and business model—a classic Innovator's Dilemma.

Bottom Line: C-RAN works because silicon economics favor centralized intelligence. The gate count difference isn't cosmetic—it's the foundation of Fi-Wi's cost, power, and reliability advantages.

In Fi-Wi, packet memory is centralized in the concentrator:

Central DRAM (Fi-Wi Concentrator)
────────────────────────────────
  Group queue A → RRH1, RRH2    (shared RF cell)
  Group queue B → RRH3          (isolated cell)
  Group queue C → RRH4–RRH7     (shared RF cell)
  ...

Queues live centrally; RRHs are DMA clients draining those queues into airtime.

This design keeps every queue observable and controllable from a single point: RRHs hold no packet memory of their own, and all buffering decisions are made where the full RF-domain state lives.

4.5 RRH Edge Control via Beacon Power Shaping

Because the Fi-Wi concentrator maintains shared state for the entire RF domain, it can directly control the RF footprint of each RRH by adjusting per-RRH beacon transmit power. This alters the coverage footprint each client perceives, and with it the client's association and roaming decisions.

Beacon power is one of the most effective tools for dynamic RF cell shaping because it affects STA association and roaming decisions without modifying data-plane PHY rates. By lowering beacon power at certain RRHs and raising it at others, the concentrator can steer associations toward particular RRHs, shrink or grow individual cells, and rebalance load across the RF domain.

Traditional controller+AP systems attempt similar behavior but lack true shared state because each AP maintains its own queueing and PHY decisions. In Fi-Wi, beacon shaping is coordinated with group-queue state, TXOP scheduling, and RF group assignments across the cellularized domain.

This makes beacon power a first-class control variable in defining and stabilizing the boundaries of each cellularized RF domain.

4.6 Fronthaul Requirements and Feasibility

The Fi-Wi architecture requires deterministic, low-latency fronthaul links between the concentrator and RRHs. Because RRHs function as DMA engines accessing centralized packet memory (Section 4.4), Umber's implementation uses PCIe (PCI Express) over fiber rather than Ethernet. This section quantifies bandwidth, latency, and jitter requirements, and demonstrates that PCIe over fiber not only meets these requirements but provides superior performance compared to network-based alternatives.

4.6.1 Why PCIe Over Fiber?

The choice of PCIe over fiber instead of Ethernet is driven by the Fi-Wi architectural model:

RRHs as DMA engines: Each RRH directly reads packet descriptors from concentrator DRAM, fetches packet data, and writes received packets back to memory. This is native PCIe behavior—exactly how a network card or storage controller operates.

Latency advantage: PCIe avoids the network stack entirely. There is no IP/UDP encapsulation, no switch queueing, and no kernel socket processing at either end.

Determinism: PCIe provides guaranteed bandwidth allocation and predictable latency through credit-based flow control and dedicated point-to-point links (see Section 4.6.7).

Simplicity: The RRH sees the concentrator's memory space directly. No protocol translation, no socket APIs, no network configuration.

4.6.2 PCIe Bandwidth Requirements

Each RRH requires bandwidth for:

1. Downlink packet DMA (concentrator → RRH)

For an RRH serving one or more STAs with aggregate capacity Ceff:

BWDL = Ceff · (1 + OHdesc)                    (4.1)

where OHdesc accounts for DMA descriptors, metadata, and PCIe TLP (Transaction Layer Packet) overhead (typically 10-20%).

Example: For Ceff = 600 Mbps (typical 802.11ax 2×2 MIMO) with OHdesc = 0.15:

BWDL = 600 · 1.15 = 690 Mbps

2. Uplink packet DMA (RRH → concentrator)

Typically symmetric or slightly higher than downlink due to ACKs and control frames:

BWUL ≈ BWDL · 1.1 ≈ 760 Mbps                   (4.2)

3. CSI and status updates

Channel State Information and MAC statistics are written to concentrator memory via PCIe:

BWCSI = Nsta · Nsc · Ntx · Nrx · Bsample · fCSI    (4.3)

For Nsta=4, Nsc=234, Ntx=2, Nrx=2, Bsample=24 bits, fCSI=50 Hz:

BWCSI = 4.49 Mbps per RRH

4. Control and command traffic (concentrator → RRH)

Configuration updates, timing sync corrections, power/channel commands:

BWcontrol ≈ 1-5 Mbps per RRH                         (4.4)

Total bidirectional bandwidth per RRH:

BWtotal = BWDL + BWUL + BWCSI + BWcontrol           (4.5)
BWtotal ≈ 690 + 760 + 4.5 + 2 ≈ 1456 Mbps ≈ 1.5 Gbps
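Equations (4.1)–(4.5) can be checked numerically; the result differs slightly from the rounded figures above because BWUL is 759 Mbps before rounding to 760:

```python
def rrh_fronthaul_mbps(c_eff=600.0, oh_desc=0.15,
                       n_sta=4, n_sc=234, n_tx=2, n_rx=2,
                       b_sample=24, f_csi=50.0, control=2.0):
    """Per-RRH fronthaul bandwidth in Mbps, following Eqs. (4.1)-(4.5).
    Defaults are the worked-example values from the text."""
    bw_dl  = c_eff * (1 + oh_desc)                                 # (4.1)
    bw_ul  = bw_dl * 1.1                                           # (4.2)
    bw_csi = n_sta * n_sc * n_tx * n_rx * b_sample * f_csi / 1e6   # (4.3)
    return bw_dl + bw_ul + bw_csi + control                        # (4.5)

total = rrh_fronthaul_mbps()
print(f"{total:.1f} Mbps ≈ {total / 1000:.1f} Gbps")
```

The same function can be re-run for other PHY configurations (e.g., a 4×4 RRH with higher Ceff) to size the fronthaul link per deployment.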

4.6.3 PCIe Link Configuration

PCIe bandwidth is determined by generation and lane count:

PCIe Gen   Per-Lane Rate   x1 Link                  x4 Link                   x8 Link
Gen 3      ~8 GT/s         ~985 MB/s (7.88 Gbps)    ~3.94 GB/s (31.5 Gbps)    ~7.88 GB/s (63 Gbps)
Gen 4      ~16 GT/s        ~1.97 GB/s (15.75 Gbps)  ~7.88 GB/s (63 Gbps)      ~15.75 GB/s (126 Gbps)
Gen 5      ~32 GT/s        ~3.94 GB/s (31.5 Gbps)   ~15.75 GB/s (126 Gbps)    ~31.5 GB/s (252 Gbps)

Note: Effective bandwidth accounts for 128b/130b encoding (Gen 3+) and protocol overhead.

RRH link sizing: For 1.5 Gbps per RRH requirement:

A single PCIe Gen 3 x1 lane is sufficient per RRH with substantial headroom.

4.6.4 Concentrator PCIe Topology

The concentrator must aggregate multiple RRH connections. Consider a 50-RRH deployment:

Total aggregate bandwidth requirement:

BWaggregate = NRRH · BWtotal                      (4.6)
BWaggregate = 50 · 1.5 Gbps = 75 Gbps (peak)

With 40% average utilization (typical for building-wide traffic):

BWtypical = 75 · 0.40 = 30 Gbps
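Equation (4.6) and the utilization figure, checked numerically (50 RRHs and 40% utilization are the worked-example assumptions from the text):

```python
N_RRH = 50           # RRHs in the example deployment
PER_RRH_GBPS = 1.5   # per-RRH total from Section 4.6.2
AVG_UTIL = 0.40      # assumed building-wide average utilization

peak_gbps = N_RRH * PER_RRH_GBPS       # (4.6): aggregate peak requirement
typical_gbps = peak_gbps * AVG_UTIL    # time-averaged load
print(f"{peak_gbps:.0f} Gbps peak, {typical_gbps:.0f} Gbps typical")
```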

Architecture Options:

Option 1: PCIe switch fabric

Option 2: Multi-host server (Dual Socket)

Option 3: The Fi-Wi Choice — Workstation-Class Single-Socket
To achieve perfect determinism, Fi-Wi standardizes on High-End Desktop (HEDT) / Workstation silicon (e.g., AMD Threadripper Pro or Intel Xeon W-3400 series). This "Goldilocks" topology enables the Non-Blocking Architecture detailed in Section 13.

4.6.5 PCIe Over Fiber: Physical Layer

Standard PCIe uses copper traces on motherboards (limited to ~30cm at Gen 3/4 speeds). To reach RRHs distributed throughout a building, PCIe signals are carried over fiber using optical transceivers.

Technologies:

1. Active Optical Cables (AOC)

2. Optical PCIe adapter cards

3. PCIe fabric extenders

Recommended approach for Fi-Wi: Optical PCIe adapter cards with standard fiber infrastructure, providing flexibility and leveraging commodity fiber installation.

4.6.6 Latency Analysis

PCIe over fiber latency components:

Component Latency
PCIe TLP formation (concentrator) 0.2-0.5 µs
Optical transceiver (TX) 0.1-0.3 µs
Fiber propagation (100m) 0.5 µs
Optical transceiver (RX) 0.1-0.3 µs
PCIe TLP processing (RRH) 0.2-0.5 µs
PCIe switch (if used) 0.1-0.3 µs per hop
Total one-way 1.2-2.4 µs
Round-trip (DMA read) 2.4-4.8 µs
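Summing the table's component ranges (including one switch hop, which the table lists as optional) reproduces the one-way and round-trip totals:

```python
# One-way latency budget (min, max) in microseconds, from the table above.
one_way_us = {
    "TLP formation (concentrator)": (0.2, 0.5),
    "optical transceiver TX":       (0.1, 0.3),
    "fiber propagation, 100 m":     (0.5, 0.5),   # ~5 ns per metre in glass
    "optical transceiver RX":       (0.1, 0.3),
    "TLP processing (RRH)":         (0.2, 0.5),
    "PCIe switch, one hop":         (0.1, 0.3),
}

lo = sum(a for a, _ in one_way_us.values())
hi = sum(b for _, b in one_way_us.values())
print(f"one-way {lo:.1f}-{hi:.1f} µs, DMA round-trip {2 * lo:.1f}-{2 * hi:.1f} µs")
```

A DMA read costs a full round trip (request out, completion back), which is why the inner-loop budget is quoted against the 2.4–4.8 µs figure rather than the one-way number.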

Comparison to Ethernet:

Fronthaul Type Round-Trip Latency Determinism
PCIe over fiber 2.4-4.8 µs Excellent (credit-based)
10GbE (cut-through) 10-30 µs Good (with QoS)
10GbE (store-forward) 20-100 µs Fair (subject to congestion)

PCIe over fiber provides 5-10× lower latency than even optimized Ethernet, which is critical for the inner control loop (Appendix B) operating at 200-500 µs timescales.

4.6.7 Jitter and Determinism

PCIe's credit-based flow control eliminates congestion drops and provides deterministic latency:

Measured jitter: PCIe over fiber typically exhibits <50 ns jitter, well under the 200 ns budget for 1 µs time synchronization (Section 4.1).

This determinism is impossible to achieve with Ethernet without time-sensitive networking (TSN) extensions, which add complexity and cost.

4.6.8 Distance Limitations

PCIe over fiber distance depends on optical budget and signal integrity:

PCIe Gen Multi-Mode Fiber Single-Mode Fiber
Gen 3 (8 GT/s) 300 m 10 km
Gen 4 (16 GT/s) 100 m 2-10 km
Gen 5 (32 GT/s) 50-100 m 2 km

Fi-Wi requirement: Building-scale deployments require ≤100 m reach, easily achieved with Gen 3/4 over multi-mode fiber or any generation over single-mode fiber.

4.6.9 Cost Analysis

PCIe over fiber cost per RRH:

Component Cost (approx.)
RRH-side PCIe optical adapter $150-300
Fiber pair (50m installed) $50-100
Optical transceiver pair $50-100
PCIe switch port allocation $100-200
Total per RRH $350-700

Comparison to network alternatives:

Approach Cost per RRH Latency Determinism
PCIe over fiber $350-700 2-5 µs Excellent
10GbE + TSN $300-600 10-30 µs Good
Standard 10GbE $200-400 20-100 µs Fair

PCIe over fiber costs moderately more than standard Ethernet but delivers 5-10× better latency and superior determinism. For Fi-Wi's DMA-based architecture, this cost is justified by the performance and architectural simplicity gains.

For context: a typical enterprise AP costs $500-2000, and a cellular small cell costs $1000-5000. The fronthaul cost is comparable to or less than the radio cost difference, making it economically viable.

4.6.10 Alternative: Hybrid PCIe + Ethernet

For deployments where PCIe over fiber infrastructure is constrained, a hybrid approach is possible: carry only the latency-critical packet DMA path over PCIe, and move CSI reporting and control traffic onto standard Ethernet.

This reduces PCIe bandwidth requirements (only packet data, not CSI/control) and allows leveraging existing Ethernet infrastructure for non-latency-critical traffic.

However, the pure PCIe approach is architecturally cleaner and avoids the complexity of dual-protocol RRH implementation.

4.6.11 Comparison to Cellular Fronthaul Standards

For context, cellular systems use:

CPRI (Common Public Radio Interface): a low-layer split that transports digitized IQ samples between baseband and radio. Bandwidth scales with antenna count and sample width, far exceeding the over-the-air data rate, and latency/jitter requirements are strict.

eCPRI (Enhanced CPRI) / Fronthaul Gateway: a packet-based (typically Ethernet) transport using higher-layer functional splits, which reduces fronthaul bandwidth toward the user data rate at the cost of packet-network latency and jitter.

Fi-Wi (PCIe over fiber): a MAC/PHY split at the RRH, so the fronthaul carries packets rather than IQ samples: bandwidth near the over-the-air data rate, with microsecond-class PCIe latency.

Fi-Wi's functional split and PCIe transport provides a unique balance: lower bandwidth than CPRI, lower latency than eCPRI, and native integration with the DMA-based architecture.

4.6.12 Summary: PCIe Over Fiber Enables Fi-Wi Architecture

Requirement Target Achieved with PCIe Gen 3 x1
Bandwidth per RRH ~1.5 Gbps ✓ 7.88 Gbps (5× margin)
Aggregate (50 RRH) ~30 Gbps avg ✓ PCIe switch or multi-CPU
Round-trip latency <10 µs ✓ 2.4-4.8 µs
Jitter <200 ns ✓ <50 ns (credit-based)
Distance ≤100 m ✓ 300m MM / 10km SM
Determinism No drops, predictable ✓ Credit-based flow control
Cost per RRH <$700 ✓ $350-700

Why PCIe over fiber is the right choice for Fi-Wi:

  1. Native DMA model: RRHs are DMA engines—PCIe is the natural transport
  2. Lowest latency: 2-5 µs vs. 10-100 µs for Ethernet
  3. Perfect determinism: Credit-based flow control eliminates jitter and drops
  4. Architectural simplicity: No network stack, no protocol translation
  5. Proven technology: Used in HPC, storage (NVMe-oF), and telecom

The deterministic, sub-5-microsecond fronthaul is what enables Fi-Wi's centralized control, time synchronization, and single-bottleneck queueing architecture. Unlike Wi-Fi mesh, controller-based systems with over-the-air backhaul, or even Ethernet-based approaches, PCIe over fiber provides the predictable substrate needed for the control loops described in Appendices A and B to operate with the precision required for sub-millisecond tail latency control.

4.7 Precision Clock Synchronization over Fronthaul

The "cellularization" of Wi-Fi relies on a unified timebase. In the Fi-Wi architecture, time is not merely used for logging; it is a control variable. To achieve coordinated scheduling, accurate queue measurements, and seamless mobility, every RRH must share a precise understanding of "now" down to the microsecond level.

To achieve this, Fi-Wi establishes a strict Hierarchical Clock Tree over the PCIe fronthaul, leveraging the native determinism of the bus rather than the best-effort nature of packet switching.

4.7.1 The Concentrator as Grandmaster (GM)

The Fi-Wi Concentrator acts as the PTP Grandmaster (IEEE 1588v2 / 802.1AS) for the entire building. It houses the primary reference oscillator (typically a high-stability OCXO).

Diagram 4-2: The Fi-Wi Clock Tree Topology

          External Reference (Optional GPS/GNSS)
                       │
                       ▼
    ┌──────────────────────────────────────────────┐
    │            Fi-Wi Concentrator                │
    │     [ High-Stability Oscillator (OCXO) ]     │ ◄── Grandmaster (GM)
    │           (System Timebase t0)               │
    └──────────────────┬───────────────────────────┘
                       │ PCIe PTM / Hardware Sync
                       │ (Compensates for fiber flight time)
          ┌────────────┼─────────────┐
          ▼            ▼             ▼
    ┌───────────┐ ┌───────────┐ ┌───────────┐
    │   RRH 1   │ │   RRH 2   │ │   RRH 3   │      ◄── Slaves
    │ [LocalOsc]│ │ [LocalOsc]│ │ [LocalOsc]│
    │  Locked   │ │  Locked   │ │  Locked   │
    └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
          │             │             │
          ▼             ▼             ▼
     Frequency-Coordinated Operation
    

4.7.2 What Clock Synchronization Actually Enables

A defining advantage of the Fi-Wi architecture is the use of "Hard Synchronization" via PCIe, rather than "Soft Synchronization" via Ethernet. While Ethernet-based APs rely on IEEE 1588 PTP, they are subject to switch jitter and software stack latency. PCIe over fiber eliminates these variables.

Feature           Fi-Wi (PCIe over Fiber)                     Traditional APs (Ethernet)
──────────────────────────────────────────────────────────────────────────────────────────
Protocol          PCIe PTM (Precision Time Measurement);      IEEE 1588 PTP;
                  hardware-native, bus-level messages         packet-based, software/firmware stack

Sync Accuracy     20-50 nanoseconds                           100 ns – 10 µs
                  (bus cycle precision + fiber margin)        (highly dependent on network load)

Jitter Source     Minimal                                     High
                  (point-to-point hardware flow control)      (switch queuing & software interrupt latency)

CPU Overhead      Zero                                        Moderate to High
                  (handled entirely by PCIe PHY/Controller)   (CPU must interrupt to process sync packets)

Primary Benefits  Accurate L4S timestamps, TSF                Basic time sync for logging
                  synchronization, unified timeline           and management
                  for clients

Important Note: While frequency-locked clocks provide excellent timing consistency, they do not enable RF phase control or coordinated simultaneous transmission. COTS Wi-Fi chips have independent RF synthesizers with arbitrary phase offsets that cannot be controlled externally. The value of clock synchronization lies in accurate timestamping for L4S queue management and consistent TSF counters for seamless client mobility, not in RF phase alignment.

4.7.3 Operating Modes: GPS-Disciplined vs. Free-Wheeling

The Concentrator's clock behavior depends on the deployment environment and regulatory requirements. There are two distinct modes of operation:

Mode A: GPS-Disciplined (Absolute Synchronization)

In this mode, the Concentrator is connected to an external GNSS (GPS/Galileo) receiver. The internal oscillator is disciplined to align with UTC (Coordinated Universal Time). This connects the internal timing of the Fi-Wi system to external absolute time.

Mode B: Free-Wheeling (Relative Synchronization)

In deep indoor environments (basements, bunkers) where GPS is unavailable, or cost-sensitive deployments where 6 GHz AFC is not required, the Concentrator operates in Free-Wheeling mode.

The Engineering Reality: Timing Consistency vs. Absolute Time
For dynamic RRH selection and coordinated scheduling, what matters is consistent timing across RRHs, not absolute UTC accuracy. As long as all RRHs maintain synchronized TSF counters relative to the Concentrator, the system can provide seamless mobility and accurate queue measurements—even if the system's concept of "UTC" is drifting by seconds per year relative to atomic time.

Because all RRHs are frequency-locked to the same Concentrator oscillator, if the Concentrator drifts, the entire system drifts in unison. This uniform time base enables coordinated operation without requiring external time references for basic functionality.

4.7.4 When Absolute Time Becomes Mandatory

While Free-Wheeling mode is sufficient for core system operation, GPS-Disciplined (Absolute) mode becomes mandatory when the Fi-Wi system interacts with external systems that require UTC timestamps:

  1. 6 GHz AFC (Automated Frequency Coordination): To operate at Standard Power in the 6 GHz band (essential for outdoor or large-venue coverage), the FCC requires the system to check a central database for incumbent microwave links. The database operates on UTC. The Concentrator must sign its request with a precise, absolute timestamp and geolocation. A drifting clock will cause the AFC request to be rejected, forcing the system into Low Power Indoor (LPI) mode.
  2. Inter-Concentrator Handoffs (Multi-Building Roaming): In a campus environment with two distinct Concentrators (e.g., Building A and Building B), a client roaming between them may experience time jumps. If Concentrator A and B are free-wheeling independently, their timestamps may differ by seconds. This jump can break high-level security protocols (like Kerberos or 802.1X re-authentication) that reject "replay attacks" based on timestamp windows.
  3. Correlated Debugging: If a user reports a connectivity drop at 10:04 AM, but the Concentrator has drifted by 45 seconds, the system logs will be stamped 10:04:45. Correlating Fi-Wi logs with client-side logs (which are usually synced to NTP/Cellular time) becomes operationally difficult, complicating root-cause analysis.

4.7.5 RRH Clock Distribution Hardware

Standard enterprise APs utilize free-running crystal oscillators with ~20 ppm frequency error. This causes TSF counters to drift relative to each other, making seamless mobility difficult. To achieve the timing consistency required for Fi-Wi's coordinated operation, the RRH hardware architecture must be fundamentally different.
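A quick calculation shows why ~20 ppm free-running crystals make multi-AP TSF coherence hard; the beacon-interval figure assumes the common default of 102.4 ms:

```python
def tsf_divergence_us(relative_ppm: float, elapsed_s: float) -> float:
    """Worst-case TSF divergence between two free-running oscillators with
    the given relative frequency error: 1 ppm = 1 µs of drift per second."""
    return relative_ppm * elapsed_s

# Two standard APs, each within ±20 ppm, can differ by up to 40 ppm relative.
print(tsf_divergence_us(40, 1.0))      # up to 40 µs of divergence per second
print(tsf_divergence_us(40, 0.1024))   # ~4 µs across one 102.4 ms beacon interval
```

Microseconds of drift per beacon interval is enough to perturb power-save wakeups and TBTT alignment across a roam, which is why frequency-locking every RRH to the concentrator's timebase matters.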

The Fi-Wi Solution: The RRH hardware uses Mobile-Class Wi-Fi Silicon (which natively supports external clock inputs) driven by a Fronthaul-Recovered Precision Clock.

Diagram 4-3: RRH Precision Clock Distribution Chain

┌──────────────────────────────────────────────────────────────────────────────┐
│                        RRH CLOCK DISTRIBUTION ARCHITECTURE                   │
└──────────────────────────────────────────────────────────────────────────────┘

        [ PCIe Over Fiber ]
                 │
                 │ (1) PTM Timestamps (Implicit Clock)
                 ▼
   ┌─────────────────────────────┐
   │      RRH FPGA / Retimer     │
   │   (Clock Recovery Circuit)  │
   └─────────────┬───────────────┘
                 │
                 │ (2) "Dirty" Recovered Clock (High Jitter)
                 ▼
   ┌─────────────────────────────┐           ┌─────────────────────────────┐
   │    JITTER ATTENUATOR IC     │           │    WI-FI 7 SOC (Client)     │
   │    (e.g., Si5395 / LMK05)   │           │                             │
   │                             │           │                             │
   │   ┌─────────────────────┐   │           │    ┌───────────────────┐    │
   │   │  Digital Servo Loop │   │ (3) Clean │    │   Internal PLL    │    │
   │   │      (DSPLL)        │───┼───────────┼───►│ (RF Synthesizer)  │    │
   │   └─────────────────────┘   │ 40 MHz    │    └─────────┬─────────┘    │
   │                             │ Reference │              │              │
   └─────────────────────────────┘           └──────────────┼──────────────┘
                                                            │
                                                            ▼
                                                   [ 5 GHz / 6 GHz ]
                                                   [ RF Carrier    ]
                                                   (Independent phase per RRH)
    

Signal Flow: The RRH recovers a noisy clock from the PCIe fronthaul. A digital Jitter Attenuator cleans the signal using an internal DSP servo loop. This provides the ultra-low phase noise reference required for 4096-QAM while maintaining frequency lock to the Concentrator's timebase. Note: The Wi-Fi chip's internal PLL establishes its own RF carrier phase, which is independent across RRHs.

The clock distribution chain operates as follows:

  1. Concentrator (Grandmaster): Distributes the master timebase via PTM packets over the PCIe-over-fiber link.
  2. RRH FPGA / Retimer: Recovers the implicit clock from the PCIe bitstream or explicit PTM timestamps.
  3. Network Synchronizer (Jitter Attenuator):
    • Component: e.g., Silicon Labs Si5395 or TI LMK05318.
    • Function: Feeds the "dirty" recovered clock digitally into this dedicated IC.
    • Cleaning: The IC uses an internal, narrow-bandwidth DSP servo loop to filter out PCIe transport jitter, synthesizing a pristine 40 MHz reference.
  4. Wi-Fi SoC (Client SKU): The cleaned signal is fed directly into the chip's Ext_Ref / XO_IN pin. The chip's internal PLLs lock to this external frequency reference, ensuring consistent TSF counter operation across all RRHs.
Architectural Decision: Digital Holdover vs. Voltage Control
Fi-Wi uses a Digital Network Synchronizer rather than a traditional VCTCXO servo loop. In a VCTCXO design, any noise on the analog control voltage line translates directly into phase noise, which degrades 4096-QAM EVM. By using digital jitter attenuation, the control loop remains in the digital domain until final synthesis, ensuring ultra-low phase noise while providing superior holdover stability if the fiber link flickers.
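The effect of digital jitter attenuation can be illustrated with a first-order model: the servo loop is a low-pass filter on phase, so transport jitter above the loop bandwidth is suppressed while the long-term frequency of the Concentrator's timebase is tracked exactly. This is a conceptual sketch only; the loop bandwidth, sample rate, and noise figures are illustrative assumptions, and real parts (Si5395/LMK05318) use higher-order loops.

```python
import math
import random

def jitter_attenuate(phase_ns, loop_bw_hz, sample_rate_hz):
    """First-order model of a narrow-bandwidth DSPLL servo loop.

    Acts as a low-pass filter on the recovered phase: jitter above the
    loop bandwidth is attenuated, the underlying frequency is tracked.
    """
    alpha = 2 * math.pi * loop_bw_hz / sample_rate_hz  # loop gain per sample
    acc, out = 0.0, []
    for p in phase_ns:
        acc += alpha * (p - acc)        # servo: steer output toward input phase
        out.append(acc)
    return out

def rms(xs):
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5

random.seed(0)
# "Dirty" recovered clock: correct mean frequency, heavy PCIe transport jitter.
dirty = [random.gauss(0.0, 2.0) for _ in range(20000)]   # phase error in ns
clean = jitter_attenuate(dirty, loop_bw_hz=100.0, sample_rate_hz=1_000_000)

assert rms(clean) < 0.1 * rms(dirty)    # >20 dB of jitter attenuation
```

The narrow loop bandwidth (100 Hz against a 1 MHz update rate here) is what keeps the control loop in the digital domain until final synthesis.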

4.7.6 Why Mobile Wi-Fi SKUs?

Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP SKUs. This choice is driven by specific architectural needs:

4.7.7 What Clock Synchronization Does NOT Enable

It is important to understand the limitations of frequency-locked clocks with COTS Wi-Fi hardware:

Key Insight: The frequency-locked clock discipline ensures that TSF counters increment synchronously across all RRHs. This enables consistent timing for seamless mobility and accurate queue measurements—but does not enable RF phase control or coordinated simultaneous transmission. Those capabilities would require custom ASIC development with externally-controllable RF synthesizers, which is beyond the scope of COTS Wi-Fi chipsets.

5. Control Architecture: The Dual-Integrator System

A rigorous control-theoretic analysis of Wi-Fi reveals a fundamental challenge: there are not one, but two distinct integrators in the transmit path. In traditional autonomous APs, these integrators are coupled in undefined ways, leading to instability (bufferbloat) and poor interaction with TCP congestion control. Fi-Wi explicitly separates these integrators, applies distinct control laws to each, and enforces a strict Time-Scale Separation to guarantee system stability.

5.1 The Two Integrators

To achieve stability, we must model and control two distinct accumulation processes:

  1. The Outer Integrator (Group Queue): Located in the Concentrator. This accumulates packets based on the mismatch between arriving traffic (internet speed) and the wireless link capacity. It operates on the RTT timescale (milliseconds).
  2. The Inner Integrator (Aggregation Buffer): Located logically between the Concentrator and RRH. This accumulates packets to build 802.11 A-MPDU aggregates for PHY efficiency. It operates on the TXOP timescale (hundreds of microseconds).

5.2 The Outer Loop: L4S and Group Queue Dynamics

The primary bottleneck managed by the AQM (Active Queue Management) is the Group Queue. This loop drives the end-to-end congestion control (L4S/TCP).

5.2.1 Queue Dynamics

The queue depth Q(t) evolves based on the mismatch between the arrival rate λ(t) and the effective service rate μ(t):

dQ/dt = λ(t - τ_fwd) - μ(t)

5.2.2 The PI² Control Law

Fi-Wi uses a PI² controller to calculate a marking probability \( p(t) \), targeting a shallow queue reference \( Q_{ref} \) (typically 200 µs). This provides a coherent signal to L4S senders:

p(t) = K_alpha * (Q(t) - Q_ref) + K_beta * ∫ (Q(t) - Q_ref) dt
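The two equations above can be combined into a toy closed-loop simulation: the queue integrates the rate mismatch, and the PI law drives the marking probability until the queue settles at Q_ref. The gains, step size, and the linear sender response (rate proportional to 1 − p) are illustrative stand-ins, not tuned values from a real deployment.

```python
def pi2_marking_sim(lam_bps=1.2e9, mu_bps=1.0e9, q_ref_us=200.0,
                    dt_us=100.0, steps=500, k_alpha=0.002, k_beta=5e-7):
    """Euler integration of dQ/dt = lambda(t) - mu(t) with PI marking.

    Queue depth is tracked as drain time in microseconds so it compares
    directly to Q_ref (200 us).
    """
    q_us, integ, p = 0.0, 0.0, 0.0
    for _ in range(steps):
        arrival_bps = lam_bps * (1.0 - p)          # toy L4S sender backs off with p
        q_us = max(q_us + (arrival_bps - mu_bps) / mu_bps * dt_us, 0.0)
        err = q_us - q_ref_us
        integ += err * dt_us                       # integral term of the PI law
        p = min(max(k_alpha * err + k_beta * integ, 0.0), 1.0)
    return q_us, p

q_final, p_final = pi2_marking_sim()
# For this 20% overload the loop settles at the 200 us reference with a
# steady marking rate near p* = 1 - mu/lambda ~= 0.17.
assert abs(q_final - 200.0) < 25.0
```

The integral term is what lets the controller hold the queue exactly at the shallow reference while the proportional term damps transients.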
Concept Shift: AQM vs. Active Rate Management (ARM)

Traditional congestion control relies on Active Queue Management (AQM): a queue must physically build up before the network detects congestion and signals the sender to slow down. The goal is to manage the queue size.

L4S enables a new paradigm called Active Rate Management (ARM).

Reference: Koen De Schepper, "Understanding Latency 4.0", December 2025.

5.3 The Inner Loop: MAC Aggregation and TXOPs

The Inner Loop manages the trade-off between PHY efficiency (large aggregates) and latency (small aggregates). In traditional APs, this integrator is effectively unbounded to maximize benchmark scores, creating a "sawtooth" latency pattern that confuses TCP.

Fi-Wi bounds this integrator via two mechanisms:

5.4 System Integration: Time-Scale Separation

For the nested loops to remain stable, the Inner Loop must look like "constant service" to the Outer Loop. This requires the Inner Loop bandwidth (ωmac) to be significantly higher than the Outer Loop bandwidth (ωtcp):

ω_mac >> ω_tcp   (typically > 20:1 ratio)

5.4.1 Frequency Domain Constraint

By forcing the MAC to operate at a frequency of 3–5 kHz (via 250 µs TXOPs), the aggregation noise is pushed high enough that it is naturally filtered out by the TCP loop (which operates at 10–20 Hz).
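The separation rule reduces to simple arithmetic. A quick check under the document's representative numbers (the 50 ms RTT is an assumed typical value):

```python
# Time-scale separation check: MAC service frequency vs. TCP loop frequency.
txop_s = 250e-6            # bounded TXOP -> MAC "service tick"
rtt_s = 50e-3              # assumed typical RTT for the outer L4S/TCP loop

omega_mac_hz = 1.0 / txop_s   # 4 kHz, inside the 3-5 kHz band cited above
omega_tcp_hz = 1.0 / rtt_s    # 20 Hz

ratio = omega_mac_hz / omega_tcp_hz
assert ratio >= 20            # satisfies the > 20:1 separation rule
print(f"MAC {omega_mac_hz:.0f} Hz vs TCP {omega_tcp_hz:.0f} Hz, ratio {ratio:.0f}:1")
```

At a 200:1 ratio, aggregation noise sits two decades above the TCP loop bandwidth and is filtered out by the sender's own averaging.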

5.4.2 A-MPDU Aggregation Coherence and ECN Marking Precision

The 250 µs TXOP constraint serves a dual purpose: it maintains time-scale separation and ensures L4S receives coherent ECN feedback. Traditional Wi-Fi's massive A-MPDU aggregation creates a fundamental mismatch between Layer 2 efficiency and Layer 3 control precision.

The Aggregation-Feedback Mismatch

In wide-channel deployments (160 MHz), APs build large A-MPDU aggregates containing dozens of IP packets to amortize MAC overhead. This creates three control-loop pathologies:

Fi-Wi's Coherence Strategy

Fi-Wi resolves this through coordinated design:

  1. 40 MHz Channel Width: Narrower channels require smaller aggregates, naturally increasing MAC service frequency. More frequent transmissions with smaller payloads ensure sojourn time measurement occurs at packet granularity.
  2. Concentrator-Level ECN Marking: The Concentrator performs sojourn time measurement and ECN marking before handing packets to RRHs for PHY transmission, preserving microsecond-level queueing visibility.
  3. Bounded TXOP Duration: The 250 µs maximum ensures MAC service frequency remains >10× higher than L4S control frequency (~1 RTT), enabling senders to interpret ECN marks as smooth probability signals rather than discrete bursts.

This approach maintains the benefits of A-MPDU efficiency while preserving the feedback coherence L4S requires. The result: DualQ can sustain its ~1ms target drain time without artificial inflation from aggregate assembly delays. For detailed analysis, see Appendix I.7.

5.4.3 Design Parameters for Stability

Fi-Wi uses these parameters to ensure the system remains critically damped:

Loop    Parameter          Target Value     Rationale
Outer   Queue Reference    200 µs           Maintains ultra-low queuing delay.
Outer   Update Interval    5 ms (~1 RTT)    Matches typical control loop frequency.
Inner   Target TXOP        250 µs           Ensures ω_mac >> ω_tcp.
Inner   Max Aggregate      32 MSDUs         Limits tail latency contribution.
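The interaction between the TXOP bound and the aggregate cap can be sketched numerically. The PHY rates and the 40 µs overhead figure below are illustrative assumptions, not chipset-accurate airtime math:

```python
def max_msdus_per_txop(phy_rate_bps, txop_s=250e-6, msdu_bytes=1500,
                       overhead_s=40e-6):
    """How many 1500-byte MSDUs fit inside the bounded TXOP at a given
    PHY rate. overhead_s lumps preamble, SIFS and block-ack time into a
    single illustrative figure."""
    payload_s = max(txop_s - overhead_s, 0.0)
    return int(payload_s / (msdu_bytes * 8 / phy_rate_bps))

# At modest rates the 250 us time bound is the binding limit; at high
# rates the 32-MSDU cap takes over. Either way the aggregate stays small.
for rate in (600e6, 2e9):
    effective = min(max_msdus_per_txop(rate), 32)
    print(f"{rate/1e6:.0f} Mb/s -> {effective} MSDUs per TXOP")
```

Whichever limit binds first, the inner integrator stays bounded, which is exactly the property the stability argument requires.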

6. Airtime Domains and Dynamic Queue Grouping

In Fi-Wi, the core rule is: there is one deep queue per independent airtime resource. The physical queue lives in concentrator memory, but it represents the airtime of one RRH or a dynamic group of RRHs whose RF signals are coupled strongly enough to behave like a single cell.

If two RRHs can interfere, they cannot transmit simultaneously and therefore must share a single logical queue. If RRHs are RF-isolated, each receives its own queue. This preserves the “one bottleneck per control loop” structure required by L4S.

6.1 Why airtime determines queue structure

Service at each queue corresponds to over-the-air transmission. Any RRHs that share RF space must share a service process and therefore share a queue. RRHs that do not interfere have independent airtime and get independent queues.

Concentrator Queues (central DRAM, cellularized domains)
────────────────────────────────────────────────────────
Queue A (airtime domain A)
├── RRH1
└── RRH2
Queue B (airtime domain B)
└── RRH3
Queue C (airtime domain C)
├── RRH4
├── RRH5
├── RRH6
└── RRH7
Queue D (airtime domain D)
└── RRH8
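The "one queue per independent airtime resource" rule maps naturally onto a connected-components computation over the measured interference graph. A minimal union-find sketch (the pair list is illustrative, chosen to reproduce the grouping shown above; real grouping also weighs CSI, retry statistics, and beacon reports):

```python
def airtime_domains(rrhs, interference_pairs):
    """One shared queue per connected component of the interference graph."""
    parent = {r: r for r in rrhs}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for a, b in interference_pairs:         # interfering RRHs must share airtime
        parent[find(a)] = find(b)

    domains = {}
    for r in rrhs:
        domains.setdefault(find(r), []).append(r)
    return sorted(sorted(v) for v in domains.values())

groups = airtime_domains(
    rrhs=[1, 2, 3, 4, 5, 6, 7, 8],
    interference_pairs=[(1, 2), (4, 5), (5, 6), (6, 7)],
)
assert groups == [[1, 2], [3], [4, 5, 6, 7], [8]]
```

Because the edge set is measured rather than configured, groups merge or split automatically as interference appears or disappears.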

6.2 Forming airtime groups dynamically

Crucially, these RF groups and their queues are not static. The concentrator forms and maintains airtime domains dynamically using:

Beyond simple interference, Fi-Wi’s groupings also consider the spatial structure of the channels:

Over time, the Fi-Wi system continuously adjusts:

Groups may merge if interference appears or split if RRHs become effectively isolated (e.g., after a channel change or power adjustment, including beacon power shaping). The AQM and ECN marking logic always runs at the current group queue, so L4S always sees a single, well-defined bottleneck per cellularized domain.

Because all RRHs expose real-time CSI, queue metrics, retry statistics, airtime usage, and beacon reports into the concentrator’s shared state, Fi-Wi can form RF groups that are tuned not just for coverage but for:

6.3 Room-Level RRH Density (FTTR-Class Deployment)

Fi-Wi is not designed around a small number of big AP cells per floor. The architecture assumes something much closer to Fiber-to-the-Room (FTTR): one cell per room, with fiber or equivalent deterministic fronthaul feeding small RRHs in each room.

In higher-end deployments, each room can contain multiple RRHs (e.g., 2–4 per room) to support:

Room-level Fi-Wi layout (conceptual)

             [Fi-Wi Concentrator]
                      │  Fiber / fronthaul
       ┌──────────┬───┴──────┬──────────┐
       │          │          │          │
     Room 1     Room 2     Room 3     Room 4
       │          │          │          │
    RRH1..4    RRH5..8    RRH9..12   RRH13..16
    (2–4/rm)   (2–4/rm)   (2–4/rm)   (2–4/rm)

This density dramatically improves RF control. With RRHs separated by just a few meters, the concentrator sees:

Within a single room (example: 4 RRHs)

Ceiling plan (top view)
───────────────────────

   RRH-A       RRH-B
     ●-----------●
     |           |
     |           |
     ●-----------●
   RRH-C       RRH-D

All four RRHs feed central queues with shared state and CSI.

Traditional AP-based architectures cannot achieve this cleanly because they lack shared state and maintain separate, isolated queues and PHY/MAC processes in each AP. Even with a central controller, they are limited to heuristic steering and static power/channel tweaks.

Fi-Wi, by contrast:

A cell-per-room architecture makes Fi-Wi fundamentally different from controller-based Wi-Fi: it behaves more like cellular small cells with centralized coordination than like a set of autonomous APs.


7. Queue Architecture for Fi-Wi

Fi-Wi centralizes packet memory, queueing, AQM, and TXOP scheduling inside the concentrator. Because the concentrator is the true bottleneck for all wireless transmissions, Fi-Wi can use a clean, minimal queue structure that behaves predictably under load and exposes stable delay semantics to L4S congestion controllers. This stands in contrast to traditional APs, where dozens of hidden queues (per-station, per-TID, firmware rings, retry/BA windows, PS-poll buffers, rate-control queues) produce variable and unobservable queueing delay.

This section describes Fi-Wi’s queue architecture, why WMM priority becomes largely unnecessary, and how centralized TXOP scheduling eliminates the stochastic contention that drives Wi-Fi collapse in legacy systems. The goal is simple: a minimal number of queues, explicit queue semantics, and predictable latency for all traffic classes.

7.1 Why queue architecture matters

Because all packets live in the concentrator’s memory until the moment they are transmitted over the air, Fi-Wi can explicitly control:

This allows Fi-Wi to do what distributed APs cannot: construct a consistent, visible bottleneck queue that L4S congestion controllers can lock onto with stable behavior.

7.2 The theoretical case: L4S makes most priority obsolete

If queue delay is capped around 500 µs, legacy WMM categories provide little additional value. For example, consider a voice stream:

Voice codec:             80 bytes every 20 ms  (32 kbps)
Transmit time at 1 Gbps: ~0.64 µs
L4S queue target:        500 µs
Voice latency budget:    ~150,000 µs

Queue share: 500 / 150,000 = 0.3%

If L4S keeps queueing delay under ~500 µs, then all traffic — including voice — stays far inside its latency budget. WMM's role in combating bufferbloat disappears when bufferbloat itself is removed.
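The budget arithmetic above can be checked directly, using the same representative numbers:

```python
# Worked numbers from the voice example above.
pkt_bytes = 80
link_bps = 1e9
tx_time_us = pkt_bytes * 8 / link_bps * 1e6       # ~0.64 us on the wire

queue_target_us = 500.0                           # L4S queue delay cap
voice_budget_us = 150_000.0                       # ~150 ms one-way budget
share = queue_target_us / voice_budget_us

assert abs(tx_time_us - 0.64) < 0.01
assert round(share * 100, 1) == 0.3               # queueing uses ~0.3% of budget
```

With queueing consuming a fraction of a percent of the latency budget, prioritizing voice over other traffic buys essentially nothing.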

7.3 Practical complications

Three real-world issues motivate a cautious design:

• UDP does not respond to ECN

Voice and video often use UDP. They:

Fi-Wi can mitigate this using per-flow fair queuing inside the L4S queue, keeping UDP in check without needing a separate WMM hierarchy.

• Airtime vs. queue time

Total latency = Queue delay + Contention delay + TX delay + Retry delay
                    ^^^^^^^^^^^^
             L4S controls this

WMM historically manipulates AIFS, CW, and TXOP to reduce contention delay. Fi-Wi eliminates contention entirely using centralized TXOP scheduling, so WMM’s airtime hacks lose relevance.

• Failure modes and defense-in-depth

Even L4S can fail under:

Hence, Fi-Wi benefits from a small amount of priority separation, at least in early deployments.

7.4 Minimal 3-queue structure

The theoretically sufficient minimal queue architecture for Fi-Wi is three queues:

Figure 7-1: Minimal 3-Queue Fi-Wi Architecture

                    ┌──────────────────────────────────────────┐
                    │               Concentrator               │
                    │ (Central Packet Memory • AQM • TXOP)     │
                    └──────────────────────────────────────────┘
                                   ▲
                                   │
                     ┌─────────────┼───────────────┐
                     │             │               │
               ┌─────┴──────┐ ┌────┴───────┐ ┌─────┴──────┐
               │ Q_mgmt     │ │ Q_L4S      │ │ Q_classic  │
               │ (Strict    │ │ (ECT(1),   │ │ (ECT(0),   │
               │  priority) │ │  dual-Q)   │ │  classic)  │
               └─────┬──────┘ └────┬───────┘ └─────┬──────┘
                     │             │               │
                     └─────────────┼───────────────┘
                                   │
                          TXOP Scheduler
                  (Build AMPDU • Select RRH • 200–250µs)
                                   │
         ┌─────────────────────────┼──────────────────────────┐
         │                         │                          │
     ┌───▼───┐               ┌─────▼─────┐              ┌─────▼─────┐
     │  RRH1 │               │   RRH2    │              │   RRH3    │
     │ (PHY) │               │  (PHY)    │              │  (PHY)    │
     └───────┘               └───────────┘              └───────────┘

The minimal Fi-Wi queue architecture contains a strict-priority management queue plus dual-queue L4S (L4S + Classic). All buffering lives in the concentrator; RRHs keep no deep queues. L4S senders see a clean single-bottleneck model, and all 802.11 management frames bypass AQM entirely for correctness.

In this design, WMM is unnecessary at the wireless bottleneck. All data traffic benefits from the same controlled queue delay, and fairness is enforced by per-flow scheduling rather than EDCA.
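The dequeue order of the 3-queue structure can be sketched as follows. The weighted round-robin between L4S and Classic is a simple stand-in for the DualQ coupling, and the weight is an illustrative value:

```python
from collections import deque

class FiWiQueues3:
    """Minimal sketch of the 3-queue dequeue order: management frames get
    strict priority; the L4S queue is then preferred over Classic."""

    def __init__(self, l4s_weight=4):
        self.q_mgmt, self.q_l4s, self.q_classic = deque(), deque(), deque()
        self.l4s_weight, self._credit = l4s_weight, l4s_weight

    def dequeue(self):
        if self.q_mgmt:                       # strict priority, bypasses AQM
            return self.q_mgmt.popleft()
        if self.q_l4s and (self._credit > 0 or not self.q_classic):
            self._credit -= 1
            return self.q_l4s.popleft()
        if self.q_classic:
            self._credit = self.l4s_weight    # replenish after a Classic turn
            return self.q_classic.popleft()
        return None

q = FiWiQueues3()
q.q_l4s.extend(["v1", "v2"]); q.q_classic.append("bulk"); q.q_mgmt.append("beacon")
assert [q.dequeue() for _ in range(4)] == ["beacon", "v1", "v2", "bulk"]
```

Classic traffic is never starved (it gets a turn whenever L4S exhausts its credit), but L4S packets never wait behind a deep Classic backlog.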

7.5 Pragmatic 5-queue structure

A more conservative deployment uses five queues per airtime domain:

  1. Qmgmt — Management & control (strict priority)
  2. QL4S-hi — High-priority L4S (voice, control)
  3. Qclassic-hi — High-priority classic (legacy VoIP)
  4. QL4S-be — L4S best-effort (bulk QUIC/TCP)
  5. Qclassic-be — Classic best-effort (legacy devices)

Figure 7-2: Pragmatic 5-Queue Fi-Wi Architecture (Defense-in-Depth)

                 ┌───────────────────────────────────────────┐
                 │                Concentrator               │
                 │    (Central Packet Memory • AQM • TXOP)   │
                 └───────────────────────────────────────────┘
                                      ▲
                                      │
                ── Five Logical Queues Per Airtime Domain ──

 ┌────────────┐ ┌────────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐
 │ Q_mgmt     │ │ Q_L4S-hi   │ │ Q_classic-hi │ │ Q_L4S-be   │ │ Q_classic-be │
 │ (priority) │ │ (Voice)    │ │ (Legacy VoIP)│ │ (Bulk TCP/ │ │ (Legacy      │
 │            │ │            │ │              │ │  QUIC)     │ │  bulk)       │
 └──────┬─────┘ └─────┬──────┘ └──────┬───────┘ └─────┬──────┘ └──────┬───────┘
        │             │               │               │               │
        └──────────────┴───────────────┼───────────────┴──────────────┘
                                      │
                                TXOP Scheduler
                 (Build AMPDU • Select RRH • Delay Targets)
                                      │
            ┌─────────────────────────┼─────────────────────────┐
            │                         │                         │
        ┌───▼───┐                ┌────▼────┐               ┌────▼────┐
        │ RRH1  │                │  RRH2   │               │  RRH3   │
        │ (PHY) │                │ (PHY)   │               │ (PHY)   │
        └───────┘                └─────────┘               └─────────┘

The 5-queue design provides a two-tier priority system across L4S and Classic traffic. This conservative architecture offers compatibility with legacy UDP voice/video, while still keeping Fi-Wi’s centralized L4S semantics intact. Over time, deployments can collapse from 5 queues to 3 as performance data validates the simpler model.
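At admission time, the five-queue classification reduces to a small decision function. A sketch, where the `high_priority` predicate is a placeholder for whatever deployment policy (DSCP, flow type) tags voice and control traffic:

```python
def classify(ecn, high_priority, is_mgmt=False):
    """Admission-time mapping into the five queues of Section 7.5.

    ecn is the IP ECN codepoint; 'ect1' marks L4S-capable flows.
    """
    if is_mgmt:
        return "Q_mgmt"
    l4s = ecn == "ect1"
    if high_priority:
        return "Q_L4S-hi" if l4s else "Q_classic-hi"
    return "Q_L4S-be" if l4s else "Q_classic-be"

assert classify("ect1", True) == "Q_L4S-hi"
assert classify("ect0", True) == "Q_classic-hi"
assert classify("ect1", False) == "Q_L4S-be"
assert classify("not-ect", False) == "Q_classic-be"
assert classify("ect0", False, is_mgmt=True) == "Q_mgmt"
```

Collapsing to the 3-queue model is then a one-line change: drop the `high_priority` branch and let everything share the two best-effort queues.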

7.6 Numerical examples

Consider 10 simultaneous HD video calls (~20 Mbps total) plus a saturating background TCP flow:

Legacy WMM:

Fi-Wi with L4S + fair queuing:

This is roughly 1000× lower queueing latency than legacy WMM systems, and it applies to all traffic, not only traffic in a “priority” AC.

7.7 Deployment strategy

Fi-Wi can phase its queue structure over time:

Metrics to monitor include:

7.8 WMM support in Fi-Wi

WMM exists to correct three historical problems in distributed Wi-Fi:

Fi-Wi removes the root causes of these behaviors:

Because of this, full WMM support at the air bottleneck is not necessary. However, Fi-Wi does support WMM semantics for:

Fi-Wi handles WMM as an admission-time mapping:

This preserves compatibility while avoiding the complexity and unpredictability of EDCA-based priority systems. Over time, Fi-Wi deployments can rely on pure L4S semantics and collapse WMM to a compatibility shim, not a required scheduling mechanism.
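One possible shape for the compatibility shim, sketched under the assumption of the 5-queue structure from Section 7.5. The AC names are the standard 802.11 EDCA categories; the specific AC-to-queue mapping shown is a deployment policy example, not a table defined by this document:

```python
# Admission-time collapse of WMM access categories into Fi-Wi queues.
WMM_TO_FIWI = {
    "AC_VO": "Q_L4S-hi",      # voice keeps a priority tier
    "AC_VI": "Q_L4S-hi",
    "AC_BE": "Q_L4S-be",
    "AC_BK": "Q_classic-be",  # background shares the classic best-effort queue
}

def admit(ac, ect1):
    queue = WMM_TO_FIWI[ac]
    if not ect1:              # non-L4S flows fall back to the classic tier
        queue = queue.replace("L4S", "classic")
    return queue

assert admit("AC_VO", ect1=True) == "Q_L4S-hi"
assert admit("AC_VO", ect1=False) == "Q_classic-hi"
assert admit("AC_BK", ect1=False) == "Q_classic-be"
```

The key point is that the mapping happens once at admission; no EDCA parameters are consulted at the air bottleneck itself.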

7.9 Summary

Fi-Wi’s centralized queue architecture enables:

Traditional Wi-Fi uses WMM to work around bufferbloat and contention. Fi-Wi removes those problems entirely through tight queue control, shared state, and central scheduling. Priority becomes a policy choice — not a crutch for an unstable MAC.

In Fi-Wi, the Carve-Out ensures the voice packet (L4S) bypasses the accumulated Classic bulk data completely. The file download continues to saturate the link, but the latency of the L4S flow is decoupled from the load of the Classic flow.

8. RRH-Level Active Redundancy

Fi-Wi’s centralized shared state across RRHs makes it natural to treat multiple radios as an active redundant set for the same STA or room. This is analogous in spirit to 802.11be’s Multi-Link Operation (MLO), where a single multi-link device (MLD) can use multiple links for reliability and capacity. In Fi-Wi, the concentrator is the coordination point leveraging shared state, and the RRHs are the distributed radios providing multiple RF paths.

8.1 Uplink: Duplicate Reception & Diversity

In many deployments, a client STA will be audible at more than one RRH (overlapping coverage). On the uplink, Fi-Wi exploits this spatial diversity to improve reliability without requiring changes to the client.

  1. Multi-Point Reception: Multiple RRHs may receive the same MPDU from a transmitting STA, potentially at different SNR/MCS levels.
  2. Forwarding: Each RRH decodes the frame locally. If the Frame Check Sequence (FCS) passes, the RRH timestamps the frame (using the shared global timebase), attaches metadata (RSSI, SNR, Channel State Information), and forwards it to the Concentrator via the PCIe-over-Fiber link.
  3. Post-Detection Selection: Effectively, the Concentrator acts as a Post-Detection Selection Diversity combiner:

This approach leverages the spatial diversity of distributed RRHs to mitigate shadowing and multipath fading. Because the selection logic operates on valid MAC frames (after FCS verification) rather than raw I/Q samples, this architecture maintains compatibility with standard COTS Wi-Fi silicon at the Radio Head.

Uplink redundancy

              STA
               │  (same frame)
             ╱   ╲
         RRH1       RRH2
           │          │
           └──► Fi-Wi Concentrator ◄──┘
                (dedup + select)
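The dedup-and-select step can be sketched as a small combiner keyed on the transmitter and 802.11 sequence number. The tuple layout and SNR-based scoring are illustrative; a real implementation would also bound the reorder window in time:

```python
def select_uplink_copies(received):
    """Post-detection selection diversity: each tuple is one FCS-valid
    copy of an MPDU forwarded by some RRH. The Concentrator keeps the
    best-SNR copy per (STA, sequence number) and drops duplicates."""
    best = {}
    for sta, seq, rrh, snr_db in received:
        key = (sta, seq)
        if key not in best or snr_db > best[key][1]:
            best[key] = (rrh, snr_db)
    return best

copies = [
    ("sta1", 42, "RRH1", 18.0),   # weaker copy
    ("sta1", 42, "RRH2", 27.5),   # same MPDU, better SNR: selected
    ("sta1", 43, "RRH1", 21.0),
]
assert select_uplink_copies(copies) == {
    ("sta1", 42): ("RRH2", 27.5),
    ("sta1", 43): ("RRH1", 21.0),
}
```

Because selection happens on decoded frames rather than I/Q samples, the combiner needs only metadata that COTS silicon already exposes.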

8.2 Downlink: per-packet steering

On the downlink, the concentrator can treat multiple RRHs as candidate transmitters for a given STA or room:

This gives Fi-Wi:

Group Queue (airtime domain A)
──────────────────────────────
  │
  ├─► RRH1 TXOPs to STA
  └─► RRH2 TXOPs to STA (backup or parallel)

Concentrator chooses RRH per TXOP based on CSI + load + shared state.

8.2.1 Listen-Before-Talk (LBT) and RRH Eligibility for Downlink Scheduling

In a multi-RRH Fi-Wi deployment, each radio head operates on the same BSSID and channel but sits in a different physical location with its own RF conditions. While Fi-Wi centralizes all queueing and scheduling decisions, every RRH must still obey the fundamental 802.11 rule: listen-before-talk (LBT).

This is where Fi-Wi diverges sharply from classical multi-AP systems. In UniFi, Ruckus, Aruba, and all controller-based Wi-Fi architectures, each AP queue is blind to the RF medium state until it attempts to transmit. The AP commits a packet to the hardware queue, and if the medium is busy, the packet waits (Head-of-Line blocking) while the AP performs backoff.

Fi-Wi inverts this. RRHs continuously report their LBT Eligibility Status (Clear/Busy) to the Concentrator over the high-speed PCIe telemetry path, with update intervals of 100–500 µs, well matched to inter-TXOP scheduling decisions. While the Concentrator cannot react within a single 9 µs backoff slot, it operates on the Inter-TXOP timescale (200–500 µs¹).

Before posting a new DMA descriptor to an RRH, the Scheduler checks this eligibility:

This prevents Head-of-Line Blocking where a packet sits in a hardware queue on a jammed radio. When multiple RRHs report clear airtime, Fi-Wi selects among them based on link quality (CSI) and predicted airtime efficiency. Conversely, if all RRHs report medium-busy, no RRH is primed; the scheduler pauses the flow to prevent backpressure from accumulating in the RRH hardware, keeping the queue depth visible in the Concentrator where L4S can measure it.
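The eligibility check before descriptor posting can be sketched as a simple filter-then-rank step. The telemetry field names and the scalar efficiency score are illustrative; in practice the ranking would combine CSI, load, and predicted airtime:

```python
def pick_rrh(candidates):
    """Inter-TXOP steering: among RRHs reporting a clear medium, pick the
    best-scoring one. Returns None when every RRH is busy, so the packet
    stays in the Concentrator queue where L4S can still measure it."""
    clear = [c for c in candidates if c["lbt_clear"]]
    if not clear:
        return None                      # pause the flow; no HoL blocking
    return max(clear, key=lambda c: c["efficiency"])["rrh"]

telemetry = [
    {"rrh": "RRH-A", "lbt_clear": True,  "efficiency": 0.81},
    {"rrh": "RRH-B", "lbt_clear": False, "efficiency": 0.93},  # jammed
    {"rrh": "RRH-C", "lbt_clear": True,  "efficiency": 0.74},
]
assert pick_rrh(telemetry) == "RRH-A"
```

Note that RRH-B has the best link score but is never selected while busy; steering away from it before queueing is exactly what prevents Head-of-Line blocking.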

The result is a form of Centralized Selection based on LBT Eligibility. Multi-AP systems coordinate configuration (channels, power), but they cannot coordinate transmit starts because they lack the real-time feedback loop to steer packets away from busy radios before they are queued.

¹ Representative scheduling interval for mixed traffic workloads; actual TXOP durations range from tens of microseconds (small frames) to several milliseconds (large aggregates).

Figure 8-3: Per-RRH LBT eligibility feeding the centralized Fi-Wi scheduler.
                        (Shared RF / Airtime Domain)

       +----------------------+                 +----------------------+
       |      RRH-A           |                 |      RRH-B           |
       |  (Room / Zone A)     |                 |  (Room / Zone B)     |
       +----------------------+                 +----------------------+
       |  LBT: Clear          |                 |  LBT: Busy (ED high) |
       |  Eligible = YES      |                 |  Eligible = NO       |
       +----------+-----------+                 +-----------+----------+
                   |                                         |
                   |  Fiber fronthaul (low latency)          |
                   |                                         |
                   v                                         v

                     +-----------------------------------+
                     |  Fi-Wi Concentrator / Scheduler   |
                     +-----------------------------------+
                     |  Centralized queue for building   |
                     |  L4S feedback / congestion state  |
                     |                                   |
                     |  Decision: Post Descriptor to A   |
                     |  (RRH-B flagged jammed/ineligible |
                     |   to prevent HoL blocking)        |
                     +----------------+------------------+
                                      |
                                      | Downlink frames / aggregates
                                      v

                               +--------------+
                               |   Client(s)  |
                               +--------------+
  
Figure 8-4: Inter-TXOP Steering. The Scheduler uses LBT state to decide where to stage the next packet. Note: The RRH still performs local backoff; the Scheduler simply ensures data is staged at the RRH that currently reports clear channel conditions.
Time →
------------------------------------------------------------------------------------------------->

RRH-A (Room A):        [ Sense medium ]  [ Idle ]  [ Clear ]  [  Transmit TXOP  ]  [ Idle ... ]
                       |<-- DIFS --->|   |<---- contention window (few slots) ---->|

RRH-B (Room B):        [ Sense medium ]  [  ED high: medium busy  ]  [ Backoff ... ]
                       |<---- busy ---->|

RRH LBT → Scheduler:       A: "Clear"                  B: "Busy"

Scheduler View:        [ Receive LBT states from A, B ]
                       [ Mark A = eligible, B = ineligible ]
                       [ Dequeue next packets from central queue ]
                       [ Post descriptor to RRH-A only ]

Downlink Action:       RRH-A receives descriptor, enters backoff, wins, transmits.
                       RRH-B remains silent (no descriptor posted).

Effect:                • No packet trapped in RRH-B's buffer
                       • No exponential backoff storm
                       • Deterministic selection of the RRH with clear airtime
  

8.3 Analogy to Wi-Fi 7 MLO

802.11be MLO allows a multi-link device (AP/STA) to use multiple links (e.g., 2.4G, 5G, 6G bands or channels) under a single MAC entity. Features include:

Fi-Wi provides a similar effect at the building scale, but with important differences:

Because the RRHs are spatially distributed around rooms and hallways, Fi-Wi gains advantages that co-located antennas cannot provide:

These advantages come from intelligent packet routing and dynamic RRH selection, not from RF phase coordination or simultaneous beamforming across RRHs.

8.3.1 Fi-Wi vs Wi-Fi 7 MLO: Compliance and Control

Fi-Wi strictly adheres to local regulatory compliance. The Concentrator manages the queue and the schedule, but the RRH manages the compliance.

When the Scheduler assigns a TXOP to an RRH, it posts a descriptor. The RRH hardware then performs standard 802.11 EDCA:

  1. It senses the medium.
  2. It draws a random backoff counter.
  3. It counts down only when the medium is idle.
  4. It transmits when the counter reaches zero.

The Architectural Difference:

In MLO or Mesh: If an AP commits a packet to a radio and that radio hits congestion, the packet is trapped in the local buffer. The backoff might take 50ms. During this time, the AP's other radios (or other APs in the mesh) might be idle, but they cannot help because the packet is already "owned" by the busy MAC.

In Fi-Wi: The packet remains in the Concentrator's central memory until the last possible moment (see Appendix F). If the Concentrator sees an RRH entering deep backoff (via real-time telemetry) or reporting "Busy," it stops posting new descriptors to that RRH and steers subsequent traffic to a free RRH. The backoff engine remains local (compliance), but the queue feeding it is steered globally (performance).

This allows Fi-Wi to scale airtime domains across an entire building while preventing the multi-node contention collapse that plagues traditional Wi-Fi networks.

Figure 8-6: Per-airtime-domain queueing and scheduling in MLO versus Fi-Wi.
Wi-Fi 7 MLO: per-radio queues and MAC logic           Fi-Wi: one centralized queue per airtime-domain
================================================      ===============================================

   Airtime-domain                                    Airtime-domain
   --------------                                    --------------

   +-------------+   +-------------+                +-------------------------+
   |  Radio 1    |   |  Radio 2    |                |   Fi-Wi Concentrator    |
   | MAC engine  |   | MAC engine  |                |  (per airtime-domain)   |
   | Backoff     |   | Backoff     |                +-------------------------+
   | DMA queues  |   | DMA queues  |                |  Centralized queue      |
   +------+------+   +------+------+                |  AQM / L4S feedback     |
          |                 |                       |  Scheduler              |
          |                 |                       +-----------+-------------+
          v                 v                                   |
   Packet trapped          Packet trapped                       |
   in local queue          in local queue                       |
   during backoff          during backoff                       v

                                                     +--------+-------+    +--------+-------+
                                                     |   RRH A        |    |   RRH B        |
                                                     | RF front-end   |    | RF front-end   |
                                                     | LBT + backoff  |    | LBT + backoff  |
                                                     +--------+-------+    +--------+-------+
                                                              ^                    ^
                                                              |                    |
                                                   Scheduler posts descriptor only to
                                                   the RRH that is clear and eligible.

  

8.4 Preserving the "single bottleneck" L4S view

To keep L4S happy, Fi-Wi needs to preserve a single bottleneck queue per flow even while using multiple RRHs:

In other words:

9. Dynamic Point Selection and Intelligent Frequency Reuse

Traditional Wi-Fi deployments suffer from two fundamental problems in high-density environments: (1) clients are statically associated to a single AP based on initial connection, leading to suboptimal performance as they move, and (2) autonomous APs compete for airtime through CSMA/CA contention, creating interference. Fi-Wi inverts this paradigm through Dynamic Point Selection—continuously choosing the optimal RRH per packet—and Intelligent Frequency Reuse—leveraging spatial isolation to maximize capacity.

9.1 Dynamic Point Selection: The Core Capability

Unlike traditional Wi-Fi where clients are physically and logically tied to a single Access Point (AP), Fi-Wi treats the entire building as a single Virtual Cell. The Concentrator maintains real-time Channel State Information (CSI) from all RRHs and dynamically selects the optimal transmission point for each individual packet.

9.1.1 The Roaming Paradigm Shift: Negotiation vs. Execution

To understand the magnitude of this shift, we must compare the standard "Fast BSS Transition" (802.11r) with the Fi-Wi approach. In standard Wi-Fi, mobility is a negotiation. In Fi-Wi, it is an execution.

| Step         | Standard Wi-Fi (802.11r / Fast Roaming)                                        | Fi-Wi (Dynamic Point Selection)                        |
|--------------|--------------------------------------------------------------------------------|--------------------------------------------------------|
| 1. Trigger   | Client detects low RSSI and decides to scan.                                   | Concentrator detects better path via uplink SNR.       |
| 2. Action    | Client tunes radio off-channel to scan for beacons (latency spike: 50–100 ms). | Zero action. Client stays on channel.                  |
| 3. Handshake | Client sends Auth + Re-Assoc frames. AP validates keys.                        | None. No over-the-air frames.                          |
| 4. Switch    | AP 1 tears down keys; AP 2 installs keys.                                      | Concentrator updates the DL_RRH_ID pointer in memory.  |
| Total Time   | ~50 ms – 150 ms (best case)                                                    | < 1 ms (PCIe write)                                    |

While 802.11r is sufficient for buffered video (Netflix), a 50 ms roaming gap typically breaks real-time applications like Voice over Wi-Fi (VoWiFi) and VR/XR, causing audio dropouts or visual artifacts. Fi-Wi's sub-millisecond switching ensures true continuity.
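The "execution" model in Step 4 is literally a memory write. A minimal sketch — the class, field, and function names here are illustrative, not taken from any real driver:

```python
from dataclasses import dataclass

@dataclass
class ClientState:
    """Per-client record held in Concentrator DRAM (names illustrative)."""
    mac: str
    dl_rrh_id: int               # RRH the scheduler posts downlink descriptors to
    keys_installed: bool = True  # keys live at the Concentrator, not per-AP

def switch_point(client: ClientState, new_rrh_id: int) -> None:
    # The entire "roam": one in-memory pointer update. No off-channel scan,
    # no reassociation, no key teardown -- the very next descriptor simply
    # targets a different RRH.
    client.dl_rrh_id = new_rrh_id

alice = ClientState(mac="aa:bb:cc:dd:ee:ff", dl_rrh_id=1)
switch_point(alice, 2)
assert alice.dl_rrh_id == 2
```

Because the client's security association terminates at the Concentrator rather than at any individual radio, nothing over the air needs to change when the pointer does.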

9.1.2 How It Works

9.1.3 Example Scenario

Consider "Alice" on a VR headset walking down a hallway:

  1. Alice starts a session in Room 304 (near RRH-A: RSSI -40 dBm). The Concentrator routes packets via RRH-A.
  2. Alice walks toward the doorway. RRH-A degrades (-55 dBm) while the hallway unit, RRH-B, improves (-45 dBm).
  3. The Concentrator detects this crossing point in the CSI data.
  4. For the very next packet, the pointer switches to RRH-B.
  5. Result: Alice's VR stream continues without a single dropped frame or latency spike. She is unaware that the transmission point changed.
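The crossing-point detection in steps 2–4 can be sketched as a simple hysteresis comparison. This assumes an RSSI-only metric and a 3 dB margin, both illustrative — the real Concentrator would weigh full CSI:

```python
def select_rrh(current: str, rssi: dict, hysteresis_db: float = 3.0) -> str:
    """Pick the serving RRH from per-RRH RSSI readings (dBm).

    A hysteresis margin prevents ping-ponging at the crossing point:
    a candidate must beat the current RRH by `hysteresis_db` to win.
    """
    best = max(rssi, key=rssi.get)
    if best != current and rssi[best] >= rssi[current] + hysteresis_db:
        return best
    return current

# Alice in Room 304: RRH-A dominates, no change.
assert select_rrh("A", {"A": -40, "B": -60}) == "A"
# At the doorway: RRH-B is 10 dB better -> switch for the very next packet.
assert select_rrh("A", {"A": -55, "B": -45}) == "B"
# Near-equal readings: stay put (inside the hysteresis band).
assert select_rrh("B", {"A": -47, "B": -46}) == "B"
```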

9.3 Intelligent Frequency Reuse

In traditional Wi-Fi, neighboring APs on the same channel create co-channel interference. The standard solution is to assign different channels (e.g., AP-A uses Channel 36, AP-B uses Channel 48), but this wastes spectrum. Fi-Wi enables intelligent frequency reuse—using the same channel across multiple RRHs when spatial conditions allow.

When Frequency Reuse Works

Frequency reuse is viable when clients are in spatially separated locations with significant isolation (typically >25-30 dB attenuation due to walls, floors, or distance).

Example: Adjacent Rooms

The Fi-Wi Decision:

  1. Concentrator detects >30 dB spatial isolation via CSI measurements
  2. Configures both RRH-A and RRH-B to operate on Channel 36
  3. Each RRH performs independent CSMA/CA in its local environment
  4. Cross-interference is minimal due to spatial isolation
  5. Result: Effective channel capacity is doubled without requiring additional spectrum
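The decision above can be sketched as follows. The 30 dB threshold and channel numbers mirror the example in the text; the function names are illustrative:

```python
def can_reuse_channel(isolation_db: float, threshold_db: float = 30.0) -> bool:
    """Decide whether two RRHs may share a channel.

    `isolation_db` is the measured path loss between the two cells
    (walls, floors, distance), derived from CSI measurements.
    """
    return isolation_db > threshold_db

def assign_channels(pairs_isolation, base_channel=36, alt_channel=48):
    """Assign channels to RRH pairs: reuse when isolated, split otherwise."""
    plan = {}
    for (rrh_a, rrh_b), iso in pairs_isolation.items():
        if can_reuse_channel(iso):
            plan[rrh_a] = plan[rrh_b] = base_channel   # doubled effective capacity
        else:
            plan[rrh_a], plan[rrh_b] = base_channel, alt_channel
    return plan

plan = assign_channels({("RRH-A", "RRH-B"): 38.0, ("RRH-C", "RRH-D"): 18.0})
assert plan["RRH-A"] == plan["RRH-B"] == 36   # isolated rooms: reuse channel
assert plan["RRH-C"] != plan["RRH-D"]         # weak isolation: split channels
```

Because the Concentrator re-evaluates isolation continuously, the same logic runs whenever CSI changes, rather than once at deployment time.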

Dynamic Adaptation

The key advantage over static channel planning is real-time adaptation:

Why Autonomous APs Cannot Do This

| Requirement           | Fi-Wi (C-RAN)                                                                  | Autonomous APs                                                                                   |
|-----------------------|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| Global CSI visibility | Complete: Concentrator sees CSI from all RRHs to all clients in real time      | Fragmented: each AP only knows its own channel; must exchange info over backhaul (high latency)  |
| Decision latency      | Microseconds: Concentrator makes decisions in software at µs granularity       | Milliseconds to seconds: APs coordinate via slow management protocols                            |
| Adaptation speed      | Per-packet: can switch RRH or channel based on every CSI update                | Minutes: channel changes require beacon updates, client reassociation                            |
| Client disruption     | None: decisions are transparent to clients                                     | High: channel changes or AP reassignment cause connectivity interruptions                        |

9.4 Transparent Integration with L4S

The complexity of dynamic point selection and frequency reuse is hidden from the L4S congestion control loop. Traffic still lives in per-airtime-domain group queues. When the Concentrator enables frequency reuse or optimizes RRH selection, it simply affects the effective service rate μ(t) of the queue.

The PI² controller in the outer loop (see Section 5) sees the queue draining faster and naturally reduces ECN marking. This allows L4S senders (TCP Prague) to ramp up their congestion windows to fill the expanded capacity. The system automatically discovers and exploits available spatial capacity without requiring changes to congestion control algorithms or application awareness.
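The interaction can be illustrated with a PI²-style update in the spirit of the DualQ coupling in RFC 9332 — the gains, target, and delay values below are illustrative, not tuned: when frequency reuse raises μ(t), queue delay falls and the marking probability backs off.

```python
def pi2_step(p, qdelay, qdelay_prev, target=0.015, alpha=0.1, beta=1.0):
    """One update of a PI^2-style controller (constants illustrative).

    `p` is the base probability. L4S packets are ECN-marked with 2*p;
    classic packets see p**2 (the "squared" coupling).
    """
    p += alpha * (qdelay - target) + beta * (qdelay - qdelay_prev)
    p = min(max(p, 0.0), 0.5)
    return p, min(2 * p, 1.0), p ** 2  # base, L4S mark prob, classic prob

# Frequency reuse doubles the service rate: queue delay collapses from
# 30 ms toward 5 ms, and marking backs off on the next update.
p = 0.2
p, l4s_before, _ = pi2_step(p, qdelay=0.030, qdelay_prev=0.030)
p, l4s_after, _ = pi2_step(p, qdelay=0.005, qdelay_prev=0.030)
assert l4s_after < l4s_before  # TCP Prague sees fewer marks and ramps up
```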

9.5 Governing Station Media Access: The Control Hierarchy

A common critique of centralized wireless architectures is the "autonomous client problem": while the infrastructure can be coordinated, the stations (STAs) are independent entities that contend for the medium using their own logic.

Fi-Wi addresses this by enforcing a Control Hierarchy that governs client behavior from the physical layer up to the transport layer. Instead of passively hoping for "good client behavior," Fi-Wi uses four distinct mechanisms to throttle, steer, or schedule station media access.

Figure 9-3: The Four Tiers of Client Governance

Level 1: Deterministic (Hard)
   [ 802.11ax Trigger Frames ] ──▶ STA must wait for Schedule
                                    (Zero contention)

Level 2: Transport (Adaptive)
   [ L4S / ECN Marking ] ────────▶ OS Kernel throttles pacing
                                    (Reduces MAC load before enqueue)

Level 3: RF Physics (Steering)
   [ Beacon Power Shaping ] ─────▶ STA firmware seeks new cell
                                    (Moves demand to different domain)

Level 4: Statistical (Soft)
   [ WMM / AIFS Parameters ] ────▶ STA adjusts backoff aggression
                                    (Statistical deprioritization)
    

1. Deterministic Scheduling (802.11ax/be)

For modern clients (Wi-Fi 6/7), Fi-Wi removes autonomy entirely for uplink traffic. The Concentrator generates Trigger Frames via the RRH.

2. Transport-Layer Pacing (L4S)

For the growing ecosystem of L4S-capable clients (iOS, macOS, Linux, Windows), control is applied at the Operating System kernel.

3. RF Footprint Shaping (Beacon Power)

Fi-Wi manipulates the physical environment to restrict which RRHs a client perceives as viable, effectively "shoving" media access demand to specific airtime domains.

4. Statistical Parameter Biasing (WMM/AIFS)

As a defense-in-depth measure for legacy clients, Fi-Wi advertises tuned WMM EDCA parameters.

Summary: Fi-Wi does not rely on a single method to control clients. It uses Triggers for precision, L4S for flow-rate discipline, RF Shaping for load balancing, and WMM as a statistical safety net.
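A sketch of how the Concentrator might select the strongest applicable tier per client — the capability flags are hypothetical, standing in for information learned during association:

```python
def governance_tier(client: dict) -> str:
    """Map a client's capabilities to the strongest applicable control
    mechanism from Figure 9-3. Flag names are illustrative."""
    if client.get("he_trigger"):   # Wi-Fi 6/7: uplink fully scheduled
        return "L1: 802.11ax Trigger Frames (deterministic)"
    if client.get("l4s_ecn"):      # L4S-capable OS: kernel paces flows
        return "L2: L4S / ECN marking (adaptive)"
    if client.get("steerable"):    # responds to beacon power shaping
        return "L3: RF footprint shaping (steering)"
    return "L4: WMM / AIFS biasing (statistical)"  # legacy safety net

assert governance_tier({"he_trigger": True}).startswith("L1")
assert governance_tier({"l4s_ecn": True}).startswith("L2")
assert governance_tier({}).startswith("L4")
```

In practice the tiers are complementary rather than exclusive — a Wi-Fi 7 L4S client is governed by Triggers and ECN simultaneously — but the precedence above captures the defense-in-depth ordering.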

9.6 What Dynamic Point Selection Does NOT Enable

To maintain technical accuracy, it is important to clarify what Fi-Wi's dynamic point selection does not provide:

These capabilities would require either:

Fi-Wi's architecture deliberately focuses on capabilities achievable with COTS Wi-Fi chips, providing 2-3x capacity improvement through intelligent management rather than pursuing 4-6x gains that would require custom silicon development.

9.7 Performance Expectations

Based on the capabilities described above, Fi-Wi provides the following performance improvements over traditional autonomous AP deployments:

These gains are achieved through centralized intelligence and microsecond-latency fronthaul, not through RF phase control or coordinated transmission. The architecture remains fully compliant with unlicensed spectrum regulations and works with commodity Wi-Fi chipsets.

9.8 Summary

Fi-Wi transforms the problem of wireless density by treating it as a routing and scheduling problem rather than an RF coordination problem. By centralizing packet memory and MAC scheduling, Fi-Wi converts adjacent radios from interferers into dynamically selected access points, allowing the network to scale capacity through intelligent management rather than collapsing under interference.

The key insight is that most Wi-Fi performance problems stem from poor decisions (wrong AP, wrong channel, wrong timing) rather than fundamental RF limitations. Fi-Wi solves this by providing the Concentrator with complete visibility and control, enabling microsecond-granularity optimization that autonomous APs cannot match.


10. Fi-Wi value vs. Traditional Distributed APs

Modern enterprise Wi-Fi deployments use centralized controllers (Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi, Ruckus SmartZone, etc.) to manage multiple APs. These controllers coordinate the control plane: channel assignment, transmit power, client association hints, roaming policies, and security. However, these remain loosely-coupled systems where the data plane — queueing, MAC scheduling, aggregation, and packet memory — remains distributed inside individual APs.

A traditional AP is not just “running EDCA.” It is running EDCA after juggling dozens or hundreds of logical MAC queues and state machines:

With N stations, an AP can easily have on the order of N × (4–8) logical queues behind a single RF channel. Every AP in the same RF domain runs this large, isolated, queue-filled state machine independently. No AP has a global view; controllers see only coarse statistics.
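The queue arithmetic is worth making concrete (a toy calculation, not a measurement):

```python
def legacy_queue_count(stations: int, tids_per_sta: int = 8) -> int:
    """Logical MAC queues behind one AP's single RF channel
    (the N x (4-8) figure from the text, using 8 TIDs)."""
    return stations * tids_per_sta

# A 50-client AP with 8 TIDs per station:
assert legacy_queue_count(50) == 400
# Ten such APs sharing one RF domain: 4000 isolated, uncoordinated
# queues, each invisible to every other AP and to the controller.
assert 10 * legacy_queue_count(50) == 4000
```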

The result:

Fi-Wi is fundamentally different: it centralizes both control plane and data plane with shared state across all RRHs. The concentrator does not just configure RRHs; it directly manages their queues, schedules their TXOPs, maintains unified CSI and airtime state, and applies coordinated ECN marking for each airtime domain. This architectural difference — not just improved control-plane coordination — is what enables Fi-Wi’s latency, L4S, and spatial multiplexing advantages.

Diagram 10-1: Queue Explosion Inside a Traditional AP

┌──────────────────────────── Traditional Distributed AP ───────────────────────────┐
│                                                                                   │
│  Many MAC queues hidden inside each AP:                                           │
│                                                                                   │
│    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                              │
│    │ STA 1 TID   │  │ STA 2 TID   │  │ STA N TID   │   ... (N stations × 4–8 TIDs)│
│    │ Queues      │  │ Queues      │  │ Queues      │                              │
│    └─────┬───────┘  └─────┬───────┘  └─────┬───────┘                              │
│          │                │                │                                      │
│   ┌──────▼────────────────▼────────────────▼──────────┐                           │
│   │   Firmware Queues (Aggregation, Reorder, BAR/BA)  │                           │
│   └───────────┬───────────────────────────────────────┘                           │
│               │                                                                   │
│   ┌───────────▼──────────────┐                                                    │
│   │ Hardware MAC Ring Buffers│   (TX/RX DMA)                                      │
│   └───────────┬──────────────┘                                                    │
│               │                                                                   │
│   ┌───────────▼──────────────┐                                                    │
│   │ EDCA / CSMA-CA Contention│   (Per-AP, no coordination)                        │
│   └───────────┬──────────────┘                                                    │
│               │                                                                   │
│        Long, multi-ms TXOP bursts, inconsistent ECN, early collapse               │
│                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────┘
  

See also: Section 2.1 — Why L4S + Legacy Wi-Fi Struggle, Appendix A — 802.11 Backoff & Collapse Dynamics.

The following subsections detail specific benefits of Fi-Wi’s cellularized, tightly-coupled architecture compared to controller-managed, loosely-coupled AP systems.

10.1 Deterministic low latency

Traditional APs:

Each AP builds its own local queues. Under load, large aggregates, retries, and hidden buffering produce multi-millisecond queueing and service delays. Tail latency is largely uncontrolled, and varies across APs sharing the same channel.

Fi-Wi (cellularized Wi-Fi, cell-per-room):

10.2 Stable L4S behavior

Traditional APs:

L4S flows traverse multiple hidden queues: wired bottlenecks, AP-local queues, firmware queues, and EDCA contention. ECN marking (if it exists at all) is inconsistent and not tied to a single bottleneck. Collapse produces noisy, bursty marking or loss, and the L4S control loop becomes oscillatory or falls back toward classic congestion behavior, especially in the tails that matter to users.

Fi-Wi:

10.3 Aggregation without losing visibility

Traditional APs:

Aggregation improves PHY efficiency but hides individual packet timing from the congestion controller. The controller does not know which MSDUs were grouped into a TXOP, what the queue state was when the TXOP started, or how long each device has been waiting.

Fi-Wi:

This combination yields high PHY efficiency and transport-layer visibility into congestion, instead of having to choose one or the other.

10.4 Building-scale coordination

Controller-managed loosely-coupled APs:

The controller can adjust channels, power, and send steering hints (e.g., 802.11v), but it cannot see or control:

As a result, these systems rely on heuristic, reactive policies: channel reassignment after interference is observed, power adjustments based on neighbor reports, and client steering using RSSI or airtime snapshots. These help, but they operate on coarse time scales (seconds to minutes) and cannot fix the fundamental data-plane issues of distributed queues, MAC contention, and tail latency under load.

Fi-Wi cellularized architecture:

The concentrator maintains true shared state across all RRHs in the building:

Because RRHs are distributed in space (often 2–4 per room in high-density deployments), Fi-Wi can leverage spatial separation for intelligent frequency reuse. The concentrator sees CSI from all RRHs and can make microsecond-granularity decisions about which RRH should transmit each packet — all while preserving the "single bottleneck queue per airtime domain" discipline required for stable L4S behavior.

Diagram 10-2: Fi-Wi Centralized Queueing, Scheduling, and Shared State

┌─────────────────────────── Fi-Wi Cellularized Architecture ────────────────────────────┐
│                                                                                        │
│     One deep queue per airtime domain                     Shared CSI + µs timestamps   │
│                                                                                        │
│          ┌───────────────────────────────────────────┐                                 │
│          │ Centralized Airtime-Domain Queue (ECN AQM)│◄──────────┐                     │
│          └───────────────────┬──────────────────────┘            │                     │
│                              │                                   │                     │
│   ┌──────────────────────────▼──────────────────────────┐        │                     │
│   │   Concentrator Scheduler (L4S, TXOP, RF Grouping)   │◄───────┘                     │
│   │        Dynamic Point Selection per Packet           │                              │
│   └───────────────┬─────────────────────────┬───────────┘                              │
│                   │                         │                                          │
│       PCIe/Fiber  │                         │   PCIe/Fiber                             │
│                   │                         │                                          │
│   ┌───────────────▼─────────────┐  ┌────────▼──────────────┐  ...                      │
│   │    RRH 1 (Thin MAC/PHY)     │  │   RRH 2 (Thin MAC/PHY)│                           │
│   └───────────────┬─────────────┘  └────────┬──────────────┘                           │
│                   │                         │                                          │
│             Selected RRH transmits; others silent in this TXOP                         │
│                                                                                        │
└────────────────────────────────────────────────────────────────────────────────────────┘
  

See also: Section 4 — Key Fi-Wi Mechanisms, Section 5 — Control Architecture, Section 9 — Dynamic Point Selection.

10.5 Control Plane vs. Data Plane

The table below summarizes the architectural differences between controller-managed, loosely-coupled APs and Fi-Wi's cellularized, tightly-coupled architecture:

| Capability                         | Controller-Managed Loosely-Coupled APs                                  | Fi-Wi Cellularized Tightly-Coupled                                    |
|------------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------|
| Control Plane                      |                                                                          |                                                                        |
| Channel assignment                 | ✓ Centralized                                                            | ✓ Centralized                                                          |
| Transmit power control             | ✓ Centralized                                                            | ✓ Centralized + dynamic beacon shaping                                 |
| Client steering hints              | ✓ Centralized (802.11v/k)                                                | ✓ Centralized                                                          |
| Data Plane                         |                                                                          |                                                                        |
| Packet queues                      | ✗ Distributed per-AP; many hidden per-STA/per-TID/firmware queues        | ✓ Exactly one deep queue per airtime domain in the concentrator        |
| MAC scheduling & aggregation       | ✗ Autonomous per-AP; long TXOPs under load                               | ✓ Coordinated across RRH groups; TXOP length explicitly bounded        |
| Timestamp synchronization          | ✗ Not available at packet level                                          | ✓ µs-accurate (PTM/PTP) shared across RRHs                             |
| Shared CSI state                   | ✗ Per-AP only; summarized to controller                                  | ✓ Building-wide CSI aggregation at the concentrator                    |
| Queue visibility & AQM             | ✗ Hidden in each AP; no global AQM                                       | ✓ Fully visible per domain; explicit L4S/AQM on the true bottleneck    |
| L4S/ECN marking point              | ✗ Inconsistent or absent; multiple uncontrolled bottlenecks              | ✓ Single, well-defined marking point per airtime domain                |
| Dynamic point selection            | ✗ Clients statically associated to one AP                                | ✓ Per-packet RRH selection based on real-time CSI (Section 9)          |
| Selection diversity                | ✗ Single AP receives uplink                                              | ✓ Multiple RRHs receive; best copy selected (Section 9)                |
| Intelligent frequency reuse        | ✗ Static channel plan                                                    | ✓ Dynamic adaptation based on spatial isolation (Section 9)            |
| Per-packet steering between radios | ✗ Not available                                                          | ✓ Active redundancy and fast failover (Section 8)                      |
| Dynamic RF grouping                | ✗ Static AP boundaries                                                   | ✓ Adaptive airtime domains based on CSI and load (Section 6)           |

Key insight: controller-managed systems coordinate configuration but leave data-plane behavior distributed and autonomous. Fi-Wi unifies the data plane with shared state and explicit control of queues and TXOPs, enabling fundamentally different behavior for latency control, dynamic point selection, and building-scale coordination. All capabilities are achieved with COTS Wi-Fi chipsets and comply with unlicensed spectrum regulations.

10.6 Operational and lifecycle advantages

Controller-managed loosely-coupled APs:

Fi-Wi cellularized architecture:


11. RRH Physical Envelope: Power, Thermals, and Size

The economic viability of a "Cell-Per-Room" architecture hinges on the Remote Radio Head (RRH) being fundamentally simpler, cooler, and cheaper than a traditional Enterprise Access Point. By offloading complex logic to the Concentrator (Section 13) and precision timing to the Fronthaul (Section 4.7), the RRH becomes a lean physical device.

11.1 The Silicon Strategy: Mobile vs. Enterprise SKUs

Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP/Networking SKUs. While Section 4.7 detailed how this enables external clocking, this choice is equally critical for the physical envelope:

11.2 Power Budget Composition

We set a hard budget of 3.5–4 W total per RRH, enabling Power over Ethernet (PoE) Class 1 or 2 operation, or simple remote powering over hybrid fiber/copper cables.

11.3 Thermal and Mechanical Implications

A sub-4W envelope fundamentally changes the industrial design possibilities for the RRH:

11.4 Concentrator-Side Considerations

Fi-Wi relies on a "Split Thermal" architecture. We deliberately shift the power density from the edge (the ceiling) to the core (the wiring closet).


12. PCIe Fronthaul (Gen3 x1 over Fiber)

12.1 Why PCIe as the RRH interface

A central hardware design choice is to make the RRH look like a PCIe endpoint to the Fi-Wi concentrator. This leverages the fact that:

Benefits of this choice:

We start with PCIe Gen3, one lane (x1), carried over fiber via a retimer + optical interface. Higher generations or widths (Gen4, x2/x4) are possible later but not required for the initial Fi-Wi performance targets.

12.2 Gen3 x1 throughput

PCIe Gen3 provides:

After protocol overhead (TLP headers, DLLPs, flow control), the sustained payload throughput for Gen3 x1 is in the rough range of 6–7 Gb/s for large transfers. This is more than sufficient for:

If a future RRH design must exceed this, the same architecture scales to:

For our initial Fi-Wi deployment assumptions, Gen3 x1 over fiber is a sensible and sufficient starting point.
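A quick sanity check of the Gen3 x1 numbers above. The efficiency factor is an assumption, folding TLP headers, DLLPs, and flow-control overhead into a single multiplier:

```python
def gen3_x1_payload_gbps(efficiency: float) -> float:
    """PCIe Gen3 x1: 8 GT/s with 128b/130b line coding.

    `efficiency` approximates protocol overhead for large transfers;
    0.75-0.85 is a plausible range, not a measured figure.
    """
    raw = 8.0 * (128 / 130)      # ~7.88 Gb/s after line coding
    return raw * efficiency

low, high = gen3_x1_payload_gbps(0.75), gen3_x1_payload_gbps(0.85)
# Lands in the rough 6-7 Gb/s range quoted in the text.
assert 5.9 < low < 6.0 and 6.6 < high < 6.8
```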

12.3 Latency characteristics and budget

PCIe Gen3 latency has several components:

Order-of-magnitude:

Compared to:

the PCIe-over-fiber latency is effectively negligible. It comfortably fits within the microsecond-level time base used for:

12.4 Mapping queues and metadata

The PCIe model fits naturally with the Fi-Wi queueing and metadata scheme. Each RRH behaves like a PCIe endpoint with:

The FiWiMeta header lives in host memory adjacent to packet payloads and is referenced by these descriptors.

Downlink flow:

  1. Concentrator enqueues IP/Ethernet packets into a group queue in DRAM, allocates or updates FiWiMeta (including t_ingress_us and queue snapshot).
  2. Scheduler posts PCIe descriptors to the RRH for the next TXOP, selecting which MSDUs and which RF group/airtime domain.
  3. RRH DMA-fetches the MSDUs via Gen3 x1, builds an aggregate (A-MPDU), transmits over the air, and reports:

Uplink flow:

  1. RRH receives 802.11 frames from STAs, decodes them, and attaches CSI and MAC status.
  2. RRH DMA-writes the frames + metadata into concentrator DRAM via PCIe.
  3. Concentrator:

In both directions, the PCIe fronthaul:

12.5 PCIe Hot Swap

A critical operational requirement for Fi-Wi is the ability to service, replace, or add RRHs without bringing down the entire building's wireless network. PCIe provides native support for this through hot-plug capability, which is standard in enterprise server platforms and can be leveraged for Fi-Wi deployments.

12.5.1 Hot-plug fundamentals

PCIe hot-plug allows physical insertion and removal of endpoint devices (RRHs) while the system is running:

12.5.2 RRH insertion flow

When a new RRH is connected or powered on:

  1. Physical detection: PCIe hot-plug controller detects the new device via link training.
  2. Enumeration: Concentrator OS (Linux) enumerates the new PCIe endpoint:
  3. Driver initialization: Fi-Wi driver:
  4. RF group integration: Concentrator control plane:

Time from physical insertion to active traffic forwarding: typically 1–5 seconds, depending on link training, driver initialization, and RF group discovery.
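The four insertion steps can be sketched as a toy sequence. All names here are illustrative — in a real system, link training and enumeration are performed by the PCIe hot-plug controller and the Linux PCI core, not by application code:

```python
def rrh_insertion(concentrator: dict, rrh_id: str) -> list:
    """Walk the insertion steps of Section 12.5.2 in order."""
    events = []
    events.append("link-trained")    # 1. hot-plug controller detects the device
    events.append("enumerated")      # 2. OS assigns a BDF and maps the BARs
    events.append("driver-ready")    # 3. Fi-Wi driver sets up descriptor rings
    concentrator.setdefault("rf_groups", {})[rrh_id] = "probing"
    events.append("rf-group-joined") # 4. control plane measures CSI, joins a group
    concentrator["rf_groups"][rrh_id] = "active"
    return events

c = {}
assert rrh_insertion(c, "RRH-7") == [
    "link-trained", "enumerated", "driver-ready", "rf-group-joined"]
assert c["rf_groups"]["RRH-7"] == "active"
```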

12.5.3 RRH removal flow

When an RRH is removed (planned maintenance, failure, or surprise disconnection):

  1. Detection: PCIe hot-plug event or surprise removal detected:
  2. Traffic rerouting: Concentrator immediately:
  3. Queue cleanup: Driver:
  4. RF group adjustment: Control plane:

Impact on active connections: minimal to none for STAs served by multi-RRH domains. Traffic seamlessly fails over to remaining RRHs within the same RF group. For isolated single-RRH cells, removal causes brief disconnection until STAs reassociate with neighboring cells.

12.5.4 Operational advantages

Hot-plug capability provides critical operational benefits:

12.5.5 Design considerations

To fully support hot-swap in production deployments:

12.5.6 Contrast with traditional APs

Traditional distributed APs handle failures differently:

Fi-Wi's PCIe hot-plug, combined with multi-RRH airtime domains and centralized queues, enables sub-second failover with minimal packet loss—a qualitative improvement over traditional Wi-Fi high-availability approaches.

12.5.7 Integration with L4S and queue management

Hot-swap events interact cleanly with Fi-Wi's L4S and queueing architecture:

This separation—queues and control in the concentrator, timing-critical MAC in hot-swappable RRHs—is precisely what enables graceful hardware lifecycle management while maintaining the control-theoretic cleanliness that L4S requires (Appendix A).


13. Hardware Architecture: The Workstation Concentrator vs. The Legacy AP

To understand why Fi-Wi achieves deterministic latency where traditional Wi-Fi fails, we must look beyond the protocol and into the physical architecture of the devices. The feasibility of the "Cut-Through" RRH design relies on the upstream link being non-blocking. Fi-Wi achieves this by replacing the internal switching fabric of legacy APs with the massive PCIe lane overprovisioning of a workstation-class Concentrator.

13.1 The Legacy Bottleneck: Anatomy of a Traditional AP

| Component       | Traditional AP (The Appliance)           | Fi-Wi RRH (The Peripheral)       |
|-----------------|-------------------------------------------|----------------------------------|
| Core Silicon    | Complex SoC (Quad-core CPU, NPU, Switch)  | Thin PHY/MAC + PCIe Retimer      |
| Data Path       | Store-and-Forward (Switch → CPU → DMA)    | Cut-Through (Fiber → PCIe → Air) |
| Queues          | 1000s of opaque hardware queues           | Zero deep queues (FIFO only)     |
| Decision Making | Autonomous (Local Scheduler)              | None (Slave to Concentrator)     |

A traditional Enterprise Access Point is functionally a "Router-on-a-Stick." It forces high-speed wireless traffic through a series of internal serialization bottlenecks before the software ever sees the packet.

TRADITIONAL AP ARCHITECTURE (The Traffic Jam)

        [ Cat6 Cable ]
              |
   +----------v-----------+
   |    RJ45 Magnetics    |
   +----------+-----------+
              |
   +----------v-----------+
   |   Ethernet Switch    |  <--- Queuing Point A: Switch Buffer
   |       (or PHY)       |       (Head-of-Line Blocking / Opaque)
   +----------+-----------+
              |
              | GMII / RGMII / SGMII Link
              | (Fixed 1G or 2.5G Pipe)
              |
   +----------v-----------+
   |        AP SoC        |
   |                      |
   |     [ CPU / OS ]     |  <--- Queuing Point B: Kernel/Driver
   |          |           |       (Software Bridging Latency)
   |          v           |
   |   [ HW DMA Rings ]   |  <--- Queuing Point C: Hardware Queues
   |   (Per Station/AC)   |       (The "Blind" Enqueue Point)
   |          |           |
   |   [ Wi-Fi MAC/BB ]   |
   +--------+-------------+
            |
        [ Radios ]

Architectural Flaws in Legacy APs:

  1. The GMII Choke: The interface between the internal Switch and the CPU is a serialized bottleneck (typically GMII/SGMII). High-speed bursts from Wi-Fi 6E/7 radios can saturate this single link, causing invisible backpressure inside the SoC.
  2. Triple Buffering: A packet is buffered at the Switch (Point A), then in system RAM (Point B), and finally in the Hardware DMA Ring (Point C). This "Store-and-Forward" chain destroys the precise timing required for L4S.
  3. Opaque Switching: The internal switch operates autonomously. The CPU has no visibility into the depth of the switch's internal buffers, meaning latency accumulates invisibly before the OS can measure it.

13.2 The Fi-Wi Solution: The 92-Lane Fabric

Fi-Wi eliminates the internal switch, the GMII link, and the autonomous CPU. By utilizing high-end workstation silicon (e.g., AMD Threadripper Pro or Intel Xeon W-3400 series), the Concentrator provides 92 to 128 native PCIe lanes directly from a CPU with 24 to 96 high-performance cores.

The 92+ lanes of PCIe eliminate the need for an internal Ethernet switch anywhere in the datapath.

TOPOLOGY COMPARISON

   Standard Server + Switch              Fi-Wi Workstation Concentrator

   ┌─────────────┐                       ┌──────────────────────────┐
   │  Dual CPU   │  20 Lanes             │     Workstation CPU      │
   │ (High Core) │  per CPU              │ (24-96 Cores, High Freq) │
   └──────┬──────┘                       └────────────┬─────────────┘
          │                                 |||||||||||||||  (92 Native Lanes)
   ┌──────▼──────┐                          ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
   │ PCIe Switch │  (Congestion Point)    RRH  RRH  RRH  RRH  (Direct Attach)
   └─┬─┬─┬─┬─┬─┬─┘                        ...  ...  ...  ...
     ↓ ↓ ↓ ↓ ↓ ↓
    RRH Connections

13.3 Dedicated Resources and Determinism

By mapping each RRH (or small groups of RRHs) to dedicated root ports on the CPU, Fi-Wi achieves a Non-Blocking Architecture:

This guarantees that the host DRAM behaves like Deterministic Ultra-Low Latency Memory rather than a shared network resource. This stability is the physical foundation that allows the software-defined queues (Section 14) to operate with microsecond precision.
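The lane arithmetic behind the non-blocking claim, as a back-of-envelope sketch:

```python
def direct_attach_capacity(total_lanes: int = 92, lanes_per_rrh: int = 1) -> int:
    """How many RRHs a workstation root complex can host with no
    intermediate PCIe or Ethernet switch (figures from Section 13.2)."""
    return total_lanes // lanes_per_rrh

# Gen3 x1 per RRH: 92 non-blocking fronthaul links from one socket.
assert direct_attach_capacity() == 92
# Even at x4 per RRH, 23 RRHs attach with zero shared contention.
assert direct_attach_capacity(lanes_per_rrh=4) == 23
```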

Historical Analogy: How the Cisco 7500 Removed the "Global Lock"

Just as Fi-Wi removes blocking via massive PCIe lane availability, the CyBus ASIC in the Cisco 7500 (1990s) solved a similar bottleneck in routing.

Fi-Wi applies this same "Non-Blocking" philosophy to the wireless stack, utilizing 92+ lanes of PCIe to ensure that RRH memory access is never gated by a shared internal switch or software mutex.

14. Hardware Queues and the Software Advantage

14.1 The Hardware Queue Problem

Traditional Wi-Fi APs use hardware DMA (Direct Memory Access) rings to meet strict 802.11 MAC timing requirements—SIFS and DIFS deadlines measured in microseconds. While this solves the timing problem, it creates a cascade of architectural constraints that Fi-Wi explicitly avoids.

Hardware queues are expensive to implement in silicon. Each queue requires dedicated SRAM for descriptor storage, control logic for pointer management and overflow handling, and power even when idle. Current chip design therefore limits traditional APs to hardware queues at the L2/MAC level—typically the four WMM access categories (AC_VO, AC_VI, AC_BE, AC_BK) per station per radio, on the order of 4 × N queues for N stations.

While sufficient for basic priority handling, this fundamental constraint prevents the sophisticated per-flow scheduling that modern high-density networks require:

What AP hardware queues prevent:

  ✗ Per-flow fair queuing (would require 100+ queues)
  ✗ DualQ L4S per flow
  ✗ Dynamic queue allocation based on traffic patterns

14.2 The DMA Ownership Constraint

An equally significant problem is that once packets are enqueued to hardware DMA rings, the CPU cannot access them without causing race conditions. This "ownership transfer" creates fundamental limitations:

Critical constraint: All packet inspection, classification, ECN marking, and policy decisions must occur before handing packets to hardware. After DMA enqueue, software is blind until transmission completes.

This prevents:

14.3 Compensating Hardware

Because hardware queues are limited and packets become inaccessible after DMA, traditional AP vendors must add compensating hardware functionality to address these fundamental architectural limitations:

| Fundamental Limitation                 | Hardware Workaround Required             | Complexity Added                                |
|----------------------------------------|-------------------------------------------|-------------------------------------------------|
| Only 4-8 queues → no per-flow fairness | Airtime fairness tracking engine          | Significant additional logic                    |
| Only 4-8 queues → no per-STA queuing   | MU-MIMO grouping and coordination         | Complex scheduling algorithms                   |
| Can't inspect after enqueue            | Hardware deep packet inspection engine    | Pattern matching, state tracking                |
| Can't mark ECN in real time            | Hardware ECN marker with threshold logic  | Queue monitoring, marking logic                 |
| Can't reclassify flows dynamically     | Flow classification accelerator (TCAM)    | Fixed rules; high-priority only; hard to update |

This compensating hardware represents substantial additional silicon area, design complexity, and verification effort. More critically, hardware-based solutions are fundamentally limited to fixed thresholds and simple policies that were designed into the chip. They cannot implement sophisticated algorithms like CoDel, PIE, or adaptive per-flow policies that require complex state and frequent updates.

14.4 Fi-Wi's Architectural Solution

Fi-Wi escapes these constraints through architectural separation:

RRH: Timing without queuing

RRH silicon implements only timing-critical functions (MAC/PHY, synchronization) with zero hardware queues. Packets arrive from the concentrator milliseconds before transmission, stay in simple descriptor rings briefly, then transmit. No autonomous queuing or scheduling logic.

Concentrator: Unlimited software queues

All queues live in concentrator DRAM. Because the concentrator operates at TXOP granularity (~600 µs) rather than SIFS granularity (16 µs), it has time for software scheduling. Queues are simple data structures in memory, vastly cheaper than dedicated silicon:

Concentrator per RF group:
  • 1000+ per-flow queues implemented as hash tables in DRAM
  • Each queue is a simple software structure (linked list or array)
  • Memory cost is negligible compared to 8+ GB server DRAM in concentrator
  • No power consumption when idle
  • Can be allocated/deallocated dynamically as needed

Enables what traditional APs cannot do:
  ✓ Per-flow fair queuing (stochastic fairness)
  ✓ DualQ L4S with separate queues per flow class
  ✓ Real-time ECN marking (actual sojourn time at TX)
  ✓ Sophisticated AQM (CoDel, PIE, custom algorithms)
  ✓ Deep packet inspection any time before TX
  ✓ Dynamic flow reclassification
  ✓ Full queue visibility for debugging
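The "hash tables in DRAM" claim can be made concrete with a short sketch. This is illustrative only: the names (`flow_key`, `flow_queue_get`, `FIWI_FLOW_BUCKETS`) are hypothetical, and real packet descriptors are elided. It shows why per-flow queues cost essentially nothing in software: an open-addressed table keyed by the 5-tuple, with queues allocated on first use.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FIWI_FLOW_BUCKETS 4096          /* power of two; 1000+ flows fit easily */

struct flow_key {                       /* the "quintuple" queue granularity */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
    uint8_t  pad[3];                    /* explicit padding: keys are memcmp-safe */
};

struct flow_queue {
    struct flow_key key;
    uint32_t depth;                     /* packets queued; descriptor list elided */
    int      in_use;
};

static struct flow_queue table[FIWI_FLOW_BUCKETS];

static uint32_t flow_hash(const struct flow_key *k)
{
    /* FNV-1a over the key bytes; any decent hash works here */
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*k); i++) { h ^= p[i]; h *= 16777619u; }
    return h & (FIWI_FLOW_BUCKETS - 1);
}

/* Open addressing with linear probing; a queue materializes on first use,
 * which is the "allocated/deallocated dynamically" property above. */
struct flow_queue *flow_queue_get(const struct flow_key *k)
{
    uint32_t i = flow_hash(k);
    for (uint32_t n = 0; n < FIWI_FLOW_BUCKETS;
         n++, i = (i + 1) & (FIWI_FLOW_BUCKETS - 1)) {
        if (!table[i].in_use) {
            table[i].key = *k;
            table[i].in_use = 1;
            return &table[i];
        }
        if (memcmp(&table[i].key, k, sizeof(*k)) == 0)
            return &table[i];
    }
    return NULL;                        /* table full: fall back to a shared queue */
}
```

The entire table above is 4096 × ~24 bytes, well under one hugepage, which is why queue count is a non-issue for a server-class concentrator.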

Packet ownership until last moment

The critical difference: packets remain in concentrator DRAM (software-accessible) until milliseconds before transmission, so the scheduler retains full authority to inspect, re-mark, reclassify, or drop any queued packet right up to the DMA handoff.

RRH only owns packets for ~1 ms while transmitting a TXOP—too brief to constrain the system.

14.5 Economic and Strategic Impact

| Aspect | Traditional AP | Fi-Wi |
|---|---|---|
| Queue count | N stations × 4-8 (at MAC or L2 level) | 1000+ (dynamically allocated, quintuple level) |
| Queue implementation | Dedicated silicon (expensive) | Software data structures (negligible cost) |
| Compensating logic | Substantial silicon for workarounds | None needed |
| Per-flow fairness | Impossible (insufficient queues) | Standard capability |
| Sophisticated AQM | Simple thresholds only (hardware fixed) | Any algorithm (CoDel, PIE, ML-based) |
| Policy updates | Requires new silicon design | Software configuration or code update |
| Operational visibility | Aggregate counters only | Full per-flow statistics and queue contents |
| Algorithm experimentation | Impossible in production | A/B testing, gradual rollout possible |

Beyond the direct silicon cost advantages, Fi-Wi gains strategic advantages that compound over time:

14.6 Architectural Principle

Fi-Wi's approach follows a clear design principle:

RRH (hardware): Only latency-critical functions requiring microsecond determinism (MAC timing, PHY processing, synchronization).

Concentrator (software): All scheduling, queuing, inspection, marking, policy, and adaptation—anything that benefits from flexibility, visibility, or frequent updates.

This separation is not arbitrary. It's driven by fundamental constraints: hardware is expensive, inflexible, and opaque; software is cheap, updatable, and inspectable. By placing intelligence in software and only timing-critical functions in hardware, Fi-Wi achieves both the performance of hardware-accelerated systems and the flexibility of software-defined networking—advantages that traditional distributed-AP architectures cannot replicate due to their need for autonomous per-AP decision-making at microsecond timescales.

15. Adaptive Control via Machine Learning

The Fi-Wi architecture's centralized observability enables machine learning to optimize MCS transition dynamics on a per-site basis. Unlike autonomous APs that operate on partial, local state, the Concentrator observes the complete state-transition graph for all RRHs under a single clock. This section describes how Fi-Wi combines physics-based models with adaptive learning to optimize performance.

15.1 The MCS State Graph as a Probability Current Network

The MCS state graph from Section 2.7 can be formalized as a probability current network, where each node represents a PHY configuration state (MCS index, spatial stream count) and edges represent transitions between states. The system's behavior follows probability flow dynamics:

Figure 15-1: Interactive Animation: MCS and Spatial Stream Performance (with Eigen Space)

[Interactive element; static summary.] Two side-by-side panels, Autonomous AP(s) and Centralized Concentrator, each report live PER, eigenvector count (2 vs 16), WLAN utilization (percent and Mbps), and P99.9 latency decomposed into 802.11 and 802.3 components. Companion flow-field views label the autonomous side "Turbulent Flow (High Entropy)" and the centralized side "Laminar Flow (Low Entropy)".

Probability Current (J) - Flow Field Visualization

What you're seeing: The vector field (arrows) shows the "flow" of PPDUs through the MCS/Spatial Stream space—the "river" of probability current that drives system behavior.

Autonomous AP (Left): Turbulent flow with chaotic arrow directions, sometimes pointing backward when collisions occur. Multiple shallow potential wells create competing forces. This represents High Entropy—the system doesn't know which way is optimal.

Centralized Concentrator (Right): Laminar flow with smooth, coherent streamlines pointing toward the optimum. Steeper gradients and deeper potential wells create strong convergence. This represents Low Entropy (Determinism)—the system has clear direction toward the optimal state.

Animation controls:
  • Single-device view: when enabled, only one device is visualized. All devices still run in the background to drive system dynamics, so the turbulence affecting a single device is easier to see.
  • L4S ON: optimizes both PHY rates and latency (conservative, stable MCS). Note: L4S ECN signaling only works with the Centralized Concentrator architecture; autonomous APs cannot coordinate aggregate WAN state, so queue-delay reduction does not apply.
  • L4S OFF (greedy): maximizes PHY rates (aggressive, higher MCS targets).
  • Phase 2 unchecked (Phase 1): software MAC coordination only. Eliminates collisions, but eigenvectors are capped at 4 (hardware limit).
  • Phase 2 checked: FPGA-based coherency unlocks Distributed MIMO (rank expansion); eigenvectors scale to 16+.
  • Scenario: 15 devices; Autonomous AP(s): 1 domain; Fi-Wi RRH per room: 400 sq ft/room (25 domains); 10,000 sq ft total (1.5 devices per 1,000 sq ft).
Technical Justification for FPGA (Phase 2)
Phase 1: Coordinated Scheduling (Software/MAC)

In Phase 1, the Central Concentrator uses standard MAC-level timing to prevent APs from transmitting simultaneously on the same frequency.

Result: This successfully eliminates the "Red" (collisions) seen in the Autonomous model. However, because the Radio Heads (RRHs) are not phase-aligned, they cannot perform Joint Transmission. The channel rank is limited to the physical antennas of a single RRH (Rank 4). Throughput hits a "Glass Ceiling."

Phase 2: Distributed MIMO (FPGA/PHY)

In Phase 2, we introduce an FPGA to achieve sub-nanosecond synchronization between RRHs. This allows multiple RRHs to act as a single, distributed antenna array.

Result: This unlocks Rank Expansion. The system can resolve 16+ spatial streams (Eigenvectors) simultaneously. The "Glass Ceiling" is removed, and throughput scales linearly with the number of RRHs deployed.

Implementation Mechanism: To achieve <1ns precision over fiber, the system utilizes the White Rabbit (IEEE 1588 HA) protocol. An FPGA on the RRH compensates for variable PCIe bus latency (using PCIe PTM) and fiber propagation delay, ensuring the RRH clock is phase-locked to the Central Concentrator.

15.2 What Gets Learned: The Transition Rate Matrix

Machine learning in Fi-Wi optimizes the transition rate matrix W based on telemetry that is only observable in a centralized architecture. For each potential transition from state i = (MCS_i, SS_i) to state j = (MCS_j, SS_j), the learned rate depends on:

Per-Transition Learning Inputs:

The learned transition rate function takes the form:

W[i→j] = f(CSI, PER, queue_depth, interference, density, time, site_params)

This learned function answers: "Given the current state and observed conditions, what is the optimal next MCS/SS configuration to meet the L4S latency target while maximizing achievable throughput?"

Slow Learning, Fast Execution:

The ML engine operates on the control plane timescale with adaptive update rates: milliseconds for sudden events (interference spike detection requiring rapid response), seconds for typical rate adaptation (matching the timescales demonstrated by minstrel/minstrel_ht schedulers), and minutes for long-term pattern learning (daily traffic patterns, where slower updates are sufficient). This decouples the computational cost of learning from the latency constraints of packet transmission. The scheduler does not run neural network inference per packet—it uses a pre-computed policy matrix updated at rates appropriate to the dynamics being observed.
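The "pre-computed policy matrix" pattern can be sketched in a few lines. This is a hypothetical illustration (names `policy_update`, `policy_next_mcs`, and the bucket sizes are assumptions): the slow path writes recommendations at its own cadence, while the fast path does a constant-time table lookup per scheduling decision with no inference in the hot loop.

```c
#include <stdint.h>

#define N_MCS 12   /* MCS 0..11 */
#define N_SS   4   /* 1..4 spatial streams */
#define N_SNR 64   /* quantized SNR buckets, 0.5 dB steps from 0 dB */

/* policy[mcs][ss-1][snr_bucket] = recommended next MCS index */
static uint8_t policy[N_MCS][N_SS][N_SNR];

/* Slow path (ms..minutes cadence): the ML engine refreshes entries */
void policy_update(int mcs, int ss, int snr_bucket, uint8_t next_mcs)
{
    policy[mcs][ss - 1][snr_bucket] = next_mcs;
}

/* Fast path (per TXOP): pure table lookup, no neural network inference */
uint8_t policy_next_mcs(int mcs, int ss, double snr_db)
{
    int b = (int)(snr_db * 2.0);          /* 0.5 dB quantization */
    if (b < 0) b = 0;
    if (b >= N_SNR) b = N_SNR - 1;
    return policy[mcs][ss - 1][b];
}
```

The decoupling is the point: updating `policy` can take milliseconds of CPU without ever adding latency to a packet, because the scheduler only ever pays for one array index computation.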

15.3 Physics-Informed Learning

Fi-Wi uses physics-informed machine learning that combines Shannon capacity theory with learned corrections. This hybrid approach provides explainability, sample efficiency, and principled generalization.

The transition rate decomposes into two components:

W[i→j] = Wphysics(SNR, BW) · Wlearned(site, time, load)

where Wphysics is the Shannon-theoretic baseline and Wlearned supplies site-specific corrections.

Wphysics: The physics baseline uses Shannon capacity to establish theoretical bounds. For each MCS index, the required SNR is known from 802.11 specifications (e.g., MCS 11 requires ~30 dB). The base transition rate is the probability that current SNR exceeds the threshold given measured CSI.

Wlearned: The learned correction factor captures deviations from ideal conditions on a per-station basis, as different spatial stream capabilities and local RF environments require station-specific adaptation:

This approach uses residual learning: the physics model Wphysics provides the coarse steering (the "prior"), while the ML model learns the residual error Δ specific to the site. This guarantees the system never performs worse than a standard physics-based model, even before site-specific training converges. The ML correction is a multiplicative factor on a known-good baseline (equivalently, an additive term in log space).

This decomposition provides three advantages:

  1. Explainability: When Wlearned deviates significantly from 1.0, the system can flag anomalies and explain why performance differs from theory.
  2. Sample Efficiency: The physics prior means the ML model only needs to learn corrections rather than the full mapping from scratch.
  3. Generalization: The base model Wphysics is universal. Site-specific Wlearned factors can be initialized from similar deployments and fine-tuned with site-specific data.

15.4 Training Data from Centralized Observability

The Concentrator's complete state visibility provides labeled training examples that are impossible to obtain in distributed AP systems. Each scheduling decision creates a training tuple:

Training Example Structure:
Statet: • MCS = 9, SS = 2 (current PHY configuration) • Queue depth = 50 packets • Sojourn time = 800 µs • CSI = [λ₁=0.92, λ₂=0.58, κ=8.2 dB] (from RRH-A) • PERrecent = 0.02 (last 100 packets) • Client density = 12 stations • Interference = -75 dBm Action: • Transition to MCS = 7, SS = 2 Outcomet+1: • PER = 0.01 (improved) • Throughput = 380 Mbps • Latency = 450 µs (met L4S target) • Queue drain rate = increased Label: ✓ GOOD TRANSITION

Over time, the Concentrator accumulates thousands of these labeled examples across varying conditions. The ML model learns patterns such as:

This supervised learning is only possible with centralized observability. As detailed in Appendix H, autonomous APs lack:

It's worth noting that supervised learning doesn't require perfect ground truth labels to be effective—even relative quality assessments ("better" vs "worse") can drive learning. However, Fi-Wi's complete observability provides significantly richer training signals: precise measurements of queue impact, throughput changes, and latency effects that enable more efficient learning compared to the partial observability available to autonomous systems.

15.5 Transfer Learning Across Sites

Fi-Wi's ML strategy uses transfer learning to balance generalization across sites with site-specific optimization:

Base Model (Cross-Site Training):

A foundational model is trained across multiple deployment sites to learn universal patterns:

Wbase[i→j] = funiversal(CSI, PER, queue_depth, density)

Learns: general relationships between SNR, MCS, PER, and density.

Site-Specific Adaptation:

When deployed to a new site, the base model is augmented with learned corrections:

Wsite[i→j] = Wbase[i→j] + Δbuilding + Δtemporal

Δbuilding: building-specific RF corrections
  • Material attenuation (concrete vs drywall)
  • Room geometry (open-plan vs cubicles)
  • Persistent interference sources

Δtemporal: time-varying patterns
  • Rush hour density
  • Weekend vs weekday usage
  • Seasonal variations
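A minimal sketch of the site-adaptation step, with a clamp standing in for the safety constraints on online learning. All names and the ±25% bound are illustrative assumptions, not the production rule; the point is that site corrections are bounded relative to the known-good base rate and the result stays a valid probability.

```c
#include <math.h>

/* Wsite = Wbase + Δbuilding + Δtemporal, with the combined correction
 * clamped to a fraction of the base (illustrative safety bound). */
double w_site(double w_base, double d_building, double d_temporal)
{
    double bound = 0.25 * w_base;           /* corrections limited to ±25% of base */
    double delta = d_building + d_temporal;
    if (delta >  bound) delta =  bound;
    if (delta < -bound) delta = -bound;
    double w = w_base + delta;
    if (w < 0.0) w = 0.0;                   /* keep the rate a valid probability */
    if (w > 1.0) w = 1.0;
    return w;
}
```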

Continuous Adaptation:

The system continues to adapt using online learning with safety constraints:

15.6 The Learning Feedback Loop

Fi-Wi's ML capability creates a feedback loop that improves system performance over time:

1. Centralized Observability → complete visibility of state, actions, outcomes
2. Supervised Learning → labeled examples: (State, Action) → outcome quality
3. Improved Transition Rates → Wlearned optimizes MCS selection per site
4. Better User Experience → higher throughput, lower latency, fewer errors
5. More Training Data → new conditions explored → model improves

(Cycle repeats continuously.)

This loop is unique to centralized architectures. Autonomous APs cannot generate ground truth labels without queue observability. Coordinated AP systems (where APs share summaries via a controller) see effects (latency, ECN) but not causes (queue growth, retry timing, aggregation depth) due to high inference distance.

Fi-Wi's centralized state graph provides the causal observability that machine learning requires. The probability current framework gives this learning a rigorous mathematical foundation: we are learning the transition rate matrix of a physical system governed by conservation laws.

Summary: Centralization Enables Learning

Machine learning requires complete, structured training examples where actions, states, and outcomes are observable under consistent measurement. Fi-Wi's centralized architecture provides this by design: all state transitions occur under a single clock, all queue dynamics are visible, and all RF outcomes are measurable. This makes the MCS probability current learnable—something that is architecturally impossible in distributed, autonomous systems.

15.7 The Multi-RRH Advantage: Learning the Spatial Network

The presence of multiple concurrent Radio Heads (RRHs) serves as the primary multiplier for the Fi-Wi machine learning capability. It transforms the learning problem from optimizing a single isolated link into optimizing a spatially coupled network. While a traditional AP optimizes a local objective function (its own throughput), the Fi-Wi Concentrator utilizes concurrent RRHs to construct a global view of the RF environment.

This multi-RRH architecture impacts the learning model in three critical ways:

1. Global RF State Visibility ("The Super-Eye")

In traditional systems, an AP is blind to the interference seen by its neighbors. In Fi-Wi, the Concentrator aggregates real-time telemetry from all RRHs simultaneously.

This creates a Global RF State Matrix composed of:

This state matrix is sparse, time-aliased, and derived from standards-compliant telemetry rather than continuous per-packet baseband capture.

The model learns not just that "Client A has a weak signal," but specifically that "Client A is weak on RRH 1, strong on RRH 2, and creates -80 dBm interference on RRH 3." This global observability enables the prediction of building-wide interference patterns invisible to single-cell learners.

2. Expanded Action Space (Selection & Redundancy)

Because Fi-Wi treats multiple RRHs as an active redundant set, the ML engine has a broader action space than a standard rate-control algorithm. It learns not only how to transmit (MCS and scheduling decisions) but which RRHs are eligible transmitters for a given packet.

3. Phase 2 Capability: Eigenstructure & Rank Expansion

Note: This capability requires the hardware-synchronized FPGA architecture (Phase 2).

With sub-nanosecond synchronization, the ML engine will be able to resolve the true distributed Eigenstructure of the environment—the "shape" of available RF paths across distributed radios. This allows for Rank Expansion, where the system resolves more spatial streams (Eigenvectors) than a single physical AP could support, scaling capacity approximately with the number of RRHs, subject to channel rank and geometry.

15.8 Operational Calibration: Zero-Occupancy Sounding

To ensure the physics-informed model converges accurately, Fi-Wi employs a specific operational strategy: Zero-Occupancy Sounding.

As described in Section 15.5, the site-specific transfer function is composed of static building characteristics (Hstatic) and dynamic temporal variations (Δtemporal). To disentangle these variables, the system schedules automated channel sounding during hours of minimum occupancy.

The "Tare" Operation:

In metrology, "tare" refers to zeroing a scale by removing known weights to isolate what you want to measure. Similarly, Fi-Wi "tares" the RF environment by measuring when human activity (the known variable) is absent.

Hmeasured(empty) ≈ Hstatic + Δbuilding

By sounding when the building is empty, the system effectively removes the noise of human movement and dynamic scatterers. This allows the Concentrator to:

  1. Isolate Hstatic: Establish a high-fidelity ground truth of the static RF environment (walls, glass, steel).
  2. Calibrate the Physics Prior: Fine-tune the Shannon capacity baseline (CShannon) against the specific physical constraints of the deployment.

This establishes a stable baseline "Zero State" for the learning model, ensuring that subsequent online learning is optimizing for dynamic changes rather than relearning the static environment. This separation dramatically improves offline RL dataset conditioning by preventing the model from relearning static structure while adapting to temporal dynamics.
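The tare can be reduced to a two-step sketch: capture the zero-occupancy channel as the static baseline, then report only the dynamic residual to the learner. This is a deliberately scalar illustration (the real system tares full CSI), and the names `rf_tare`/`rf_residual` are hypothetical.

```c
#include <math.h>

static double h_static;     /* tared baseline: H_measured(empty) ≈ H_static + Δ_building */
static int    tared;

/* Run during the scheduled zero-occupancy sounding window */
double rf_tare(double h_measured_empty)
{
    h_static = h_measured_empty;
    tared = 1;
    return h_static;
}

/* What subsequent online learning optimizes: the dynamic residual only,
 * never the static structure that was tared out. */
double rf_residual(double h_measured_now)
{
    return tared ? (h_measured_now - h_static) : 0.0;
}
```

The metrology analogy holds exactly: once tared, a daytime measurement of 0.75 against an empty-building baseline of 0.70 is reported to the model as a residual of 0.05, not as a fresh channel to relearn.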

15.9 Bounded Model Validation During Idle Periods

While the primary learning mode is offline (using historical data), the centralized Concentrator architecture enables a hybrid approach: opportunistic, bounded model validation during predicted idle periods.

Idle Period Detection

Because the Concentrator has global visibility of queue states across all RRHs in an Airtime Domain, it can predict when the RF channel will be underutilized—a capability fundamentally unavailable to autonomous APs that see only their local queues.
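A toy version of that prediction, under stated assumptions: the Concentrator smooths the aggregate backlog across every RRH in the domain with an EWMA and declares a candidate idle window when it stays small. The function names and the 2-packet threshold are illustrative, not the production heuristic.

```c
/* EWMA of domain-wide backlog; alpha in (0,1] controls responsiveness */
double idle_ewma(double prev, double aggregate_backlog_pkts, double alpha)
{
    return prev + alpha * (aggregate_backlog_pkts - prev);
}

/* Candidate idle window: near-empty queues across the whole airtime domain.
 * An autonomous AP cannot compute this — it sees only its own backlog. */
int predict_idle(double ewma_backlog)
{
    return ewma_backlog < 2.0;
}
```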

Safe Validation Protocol

During high-confidence idle predictions, the system can perform controlled validation and calibration—not arbitrary exploration:

These activities refine the offline model without introducing risk to production traffic.

Production Traffic Protection

Validation is strictly bounded to prevent interference with real traffic:

This hybrid approach provides the safety of offline learning with the adaptability of continuous refinement, exploiting natural traffic lulls that autonomous APs cannot collectively identify.

15.10 Architectural Comparison: Why Autonomous APs Cannot Learn

Machine learning for MCS optimization is fundamentally enabled by Fi-Wi's centralized architecture and impossible in distributed AP systems:

| Requirement for ML | Autonomous AP | Fi-Wi Concentrator |
|---|---|---|
| Global CSI visibility | ❌ Each AP sees only local channel; no cross-AP interference data | ✅ Concentrator receives CSI from all RRHs; computes spatial correlation matrix |
| Cross-AP coordination state | ❌ Cannot observe other APs' band selection, power levels, or scheduling decisions | ✅ Centralized scheduler has complete visibility of all RRH configurations and decisions |
| Queue observability | ❌ Queue depth hidden in firmware; sojourn time not exposed | ✅ Centralized queuing with microsecond-resolution timestamps |
| Deterministic replay | ❌ Cannot reproduce exact RF conditions; firmware decisions opaque | ✅ Complete event log enables replay of scheduling decisions and outcomes |
| Inference distance | ❌ High (5-10 steps from cause to transport-layer effect) | ✅ Low (1-2 steps; queue → schedule → TX outcome directly linked) |

This observability gap is not a vendor implementation issue—it is an architectural limitation. Autonomous APs cannot generate high-quality training labels without queue observability.

16. Concentrator Fast Path: DPDK, DMA, and Queue Determinism

The preceding sections established the architecture of the Fi-Wi concentrator: centralized packet memory (Section 4.4), group queues as the sole AQM bottleneck (Section 4.3), microsecond timestamps written into the Fi-Wi shim header (Section 4.2), and ML-driven MCS selection running continuously against that centralized data (Section 15). This section explains how the concentrator executes that pipeline with the determinism the architecture requires — maintaining a single observable bottleneck per airtime domain, applying ECN marks at the right moment, and keeping the RRH free of scheduling logic.

16.1 Why a Kernel-Bypass Data Plane

The Fi-Wi concentrator's latency and determinism targets strongly favor a kernel-bypass data plane. A conventional interrupt-driven kernel path would reintroduce jitter at exactly the point where the architecture is trying to remove it.

L4S requires ECN marks to be applied at the group queue on the same time scale as a single 802.11 TXOP. The Linux kernel's softirq-based packet path introduces interrupt coalescing and scheduler contention that accumulates across bursts. More fundamentally: every packet that transits the kernel stack competes with arbitrary OS activity for CPU time. The queue depth is not directly visible to userspace without a syscall; the marking decision cannot be co-located with the queue measurement in the same cache line.

Fi-Wi's concentrator data plane therefore runs via DPDK (Data Plane Development Kit): tight busy-poll loops on dedicated cores, with no interrupt-driven jitter. All packet operations — receive, classify, AQM mark, forward — execute in a cache-resident loop that preserves the single-bottleneck, fully-observable queue structure that the rest of the architecture depends on.

16.2 The Memory Model: IOMMU, VFIO, and Hugepages

DPDK allocates all packet buffers (mbufs) from hugepages, eliminating TLB misses during packet processing. Each airtime domain's group queue is a logically contiguous region within this space. The pool is allocated once at startup; no per-packet memory allocation occurs on the fast path.

Each SFP+ NIC is bound to the vfio-pci driver. The system IOMMU enforces DMA isolation: a card can only reach the memory regions explicitly registered with it at startup. This gives the concentrator two properties simultaneously:

Startup (once):
  rte_pktmbuf_pool_create()
  └─ VFIO registers hugepages with IOMMU
  └─ NIC DMA engine can now reach mbuf pool directly

Per-burst (dedicated lcore, busy-poll):
  rte_eth_rx_burst(rrh_port, queue, pkts[], N)   ← NIC DMA → mbuf, no interrupt
  └─ classify_airtime_domain(pkt)                ← (port, queue_id) → group queue index
  └─ aqm_mark_l4s(pkt, queue_depth)              ← ECN CE if sojourn > threshold
  └─ rte_eth_tx_burst(out_port, ...)             ← mbuf → NIC DMA, zero copy
Figure 16-1: Concentrator polling loop. No interrupts, no kernel crossings, no per-packet allocation after startup. Queue depth and sojourn time are visible in the same execution context as the ECN marking decision.

16.3 Airtime Domains as Hardware Queue Partitions

DPDK exposes each NIC's hardware receive queues independently. Fi-Wi uses this to achieve a direct, lockless mapping from PCIe port and queue index to airtime domain — the same logical grouping described in Section 6. Each lcore owns a fixed set of (port, queue) pairs. Because ownership is exclusive, there are no locks on the fast path and no shared state between lcores during steady-state forwarding.

| Fast-Path Property | Kernel Stack | Fi-Wi DPDK Pipeline |
|---|---|---|
| Receive and Queue Observability | | |
| Interrupt model | Hardware IRQ → softirq → NAPI poll; coalescing adds jitter | No interrupts; dedicated lcore polls hardware queue register directly |
| Queue depth visibility | Visible inside kernel only; userspace access requires syscall | Directly readable by AQM loop in same CPU cache line as packet pointer |
| Buffer allocation | Per-packet skb allocation from kernel slab | Pre-allocated mbuf pool; zero allocation on fast path |
| AQM and Forwarding | | |
| ECN marking timing | Marked in kernel qdisc; subject to scheduling lag | Marked in polling loop body; co-located with queue measurement |
| Forwarding lookup | Routing table + netfilter traversal | (port, queue_id) → group queue index; O(1), cache-hot |
| Packet copy | Typically 1-2 copies through socket buffer chain | Zero copies; mbuf pointer passed through the pipeline |
| Transmit | | |
| IOMMU interaction | Kernel maps and unmaps DMA regions per packet | IOMMU mapping established once at pool creation; static thereafter |

16.4 The L4S Marking Loop

The AQM marking step is deliberately minimal. The DPDK data plane does not run a full queue scheduler — that is the outer control loop's responsibility (Section 5). The inner loop does one thing: read sojourn time from the shim header (Section 4.2) and set the ECN CE codepoint if the threshold is exceeded.

// Per-packet in the rx → tx burst loop:
uint64_t sojourn_ns = now_tsc() - pkt->t_ingress;
if (sojourn_ns > THRESHOLD_NS) {
    rte_ipv4_l4s_mark(pkt);                       // in-place, no copy
    fiwi_meta(pkt)->ecn_flags |= ECN_CE_APPLIED;
}
rte_eth_tx_burst(out_port, queue_id, &pkt, 1);

Because t_ingress is written by the same lcore at enqueue, no cross-core communication is needed to compute sojourn time at dequeue. The marking decision is local to the polling thread. This is what Section 4.3 means when it says AQM runs "exactly where the integrator lives": the integrator is the group queue, the group queue is an mbuf ring in hugepage memory, and the marking loop touches that ring on every poll cadence with no additional indirection.
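The sojourn arithmetic itself is worth pinning down, since it runs on every dequeue. A sketch, with one assumption made explicit: `tsc_hz` is the TSC frequency calibrated once at startup (`rte_get_tsc_hz()` in DPDK), passed as a parameter here so the conversion is testable standalone. The split multiply avoids 64-bit overflow for long sojourns.

```c
#include <stdint.h>

/* Convert a raw TSC delta to nanoseconds. Unsigned subtraction handles
 * counter wrap; splitting quotient and remainder keeps the intermediate
 * products within 64 bits even for multi-second sojourns. */
uint64_t sojourn_ns(uint64_t tsc_now, uint64_t tsc_ingress, uint64_t tsc_hz)
{
    uint64_t cycles = tsc_now - tsc_ingress;
    return (cycles / tsc_hz) * 1000000000ull
         + (cycles % tsc_hz) * 1000000000ull / tsc_hz;
}
```

On a 3 GHz TSC, 3000 cycles converts to exactly 1 µs, which is the resolution regime the shim-header timestamps operate in.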

16.5 Fault Isolation via IOMMU Groups

In a multi-card concentrator, each SFP+ card appears in its own IOMMU group, so each card can be bound to VFIO independently and the IOMMU enforces that one card's DMA cannot reach another card's memory regions. The IOMMU topology therefore provides natural fault isolation at the card boundary: a PCIe error or runaway DMA event from one RRH is contained within its card's group and cannot corrupt the packet memory of an adjacent airtime domain. This is a hardware guarantee, not a software policy.

16.6 What DPDK Does and Does Not Solve

The kernel-bypass data plane is not a complexity cost — it is the mechanism that justifies the RRH's simplicity. Because the concentrator runs a deterministic, observable pipeline that applies AQM, tracks sojourn time, and manages all descriptor posting without OS intervention, the RRH never needs to make a queuing or scheduling decision. It remains a pure DMA client, exactly as the silicon cost argument in Section 4.4 requires.

Incumbent distributed APs have no equivalent. Because each AP operates autonomously, it must run its own Linux network stack, its own qdisc, and its own firmware scheduler. The CPU carrying that stack is the dominant gate cost per RRH (Section 4.4, silicon cost table). A centralized DPDK pipeline eliminates that requirement across every RRH simultaneously — not by optimizing the AP implementation, but by removing the architectural condition that forces the CPU to exist there in the first place.

That said, DPDK solves a specific problem: it gives the concentrator a deterministic, observable, zero-copy execution path in which queue state, ECN marking, and packet steering remain under unified software control. It does not solve the radio-side interface. Per-packet MCS selection, EDCA parameter control, and TX-outcome metadata from the Wi-Fi silicon remain the next required interface boundary — the point at which concentrator intelligence must reach into the RRH to close the control loop. DPDK is the precondition; radio-side per-packet programmability is what completes it.

16.7 DualPI2 Baseline: Control Law and Queue Structure

Section 16.4 described the minimal ECN marking step — reading queue state and applying a CE mark in the fast path. That sketch is sufficient to illustrate where marking occurs, but it elides the control structure that makes L4S coexistence with legacy traffic work: the dual-queue coupled AQM defined in RFC 9332.

This section defines the baseline DualPI2 control law as it would be realized inside the DPDK polling loop. Fi-Wi preserves this dual-queue topology, coupling mechanism, and PI-based control structure, but Section 17 replaces the underlying congestion signal with Airtime Debt (Di), grounding the controller in predicted wireless service time rather than raw queue occupancy.

16.7.1 The Two Queues

Each airtime domain maintains two logically independent mbuf rings in the concentrator's hugepage pool: an L4S queue for scalable congestion-control flows (senders marking with ECT(1)), and a Classic queue for legacy RFC 3168 flows and unmarked traffic. Classification happens at ingress on the fast path, before the packet is enqueued, and costs a single bitfield check on the IP ECN field:

// Ingress classification — per-packet, inline in the rx burst loop
uint8_t ecn = (pkt_ip->type_of_service & 0x03);
bool is_l4s = (ecn == 0x01 || ecn == 0x03);   // ECT(1) or CE — scalable sender

fiwi_meta(pkt)->queue_class = is_l4s ? QUEUE_L4S : QUEUE_CLASSIC;
enqueue_to_domain(pkt, domain_id, fiwi_meta(pkt)->queue_class);

Both queues drain toward the same transmit burst for that airtime domain. The scheduler services the L4S queue with a strict low-latency budget and the Classic queue at a rate that saturates the domain's aggregate share, matching the DualPI2 service model from RFC 9332.
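That service model can be sketched as a simplified dequeue policy: L4S is served first up to a per-burst cap, and Classic fills the remainder of the TX burst so the domain's share stays saturated. This is an illustration, not the production scheduler; `struct ring` stands in for the mbuf rings, and `BURST`/`l4s_cap` are assumed parameters.

```c
#include <stdint.h>

#define BURST 32                       /* TX burst size, illustrative */

struct ring { int depth; };            /* stand-in for an rte_ring of mbufs */

static int take(struct ring *r, int want)
{
    int n = (r->depth < want) ? r->depth : want;
    r->depth -= n;
    return n;
}

/* L4S gets strict priority within its cap; capping it means Classic always
 * receives the rest of the burst and cannot be starved indefinitely. */
int service_domain(struct ring *l4s, struct ring *classic, int l4s_cap)
{
    int n  = take(l4s, l4s_cap < BURST ? l4s_cap : BURST);
    n     += take(classic, BURST - n);
    return n;                          /* packets placed in this TX burst */
}
```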

16.7.2 The Coupling Mechanism

The key property of DualPI2 is that the two queues are not independent. The Classic queue's drop probability pc — computed by a PI controller from a congestion signal representing pressure at the shared bottleneck — also governs the L4S queue's ECN marking probability via a coupling factor k (default 2 in the Linux sch_dualpi2 reference implementation).

// Outer control loop — runs on a slow timer cadence (~16 ms), same lcore,
// non-preemptive. Not per-packet.
double signal_classic = ewma_update(&domain->classic_signal,
                                    ring_depth(QUEUE_CLASSIC));
double p_c = max(0.0, K_PI * (signal_classic - TARGET_CLASSIC));  // PI controller

double p_l = COUPLING_K * p_c;   // Coupled L4S marking probability

// Applied per-packet in the L4S dequeue path:
double p_l_step = (sojourn_L4S_ns > THRESHOLD_L4S_NS) ? 1.0 : p_l;
if (p_l_step >= 1.0 ||
    rte_rand() < (uint64_t)(p_l_step * (double)UINT64_MAX))
    rte_ipv4_l4s_mark(pkt);      // Set ECN CE in-place, no copy

In a conventional queue-based implementation, signal_classic would be an EWMA of Classic queue depth. In Fi-Wi, that queue-derived signal is replaced as the PI controller input by Airtime Debt (Di), a forward estimate of wireless service time. The DualPI2 control law, coupling mechanism, and dual-queue topology remain unchanged; only the input signal changes.

Queue depth is a lagging indicator in Wi-Fi because contention, retries, and variable PHY rates consume airtime without necessarily appearing in buffer occupancy. Airtime Debt provides a forward-looking signal that better matches the true wireless bottleneck while preserving the DualPI2 coexistence structure required for L4S and Classic traffic to share the medium.
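The control law with Airtime Debt as its input can be sketched end to end. This mirrors the fragment above (PI update on the slow timer, coupled marking probability), but all gains, the target, and the clamping are illustrative values, not tuned parameters; `pi_update` and `coupled_l4s_prob` are hypothetical names.

```c
#include <math.h>

#define COUPLING_K   2.0     /* DualPI2 coupling factor (Linux default) */
#define K_P          0.01    /* proportional gain, per µs of debt error */
#define K_I          0.001   /* integral gain */
#define TARGET_DEBT  500.0   /* µs of predicted wireless service time */

struct pi_state { double integral; };

/* Runs on the ~16 ms slow-timer cadence; input is Airtime Debt (Di),
 * the forward estimate of service time, not queue depth. */
double pi_update(struct pi_state *s, double airtime_debt_us)
{
    double err = airtime_debt_us - TARGET_DEBT;
    s->integral += K_I * err;
    if (s->integral < 0.0) s->integral = 0.0;   /* anti-windup at the floor */
    double p_c = K_P * err + s->integral;       /* Classic drop probability */
    if (p_c < 0.0) p_c = 0.0;
    if (p_c > 1.0) p_c = 1.0;
    return p_c;
}

/* Coupled L4S marking probability, as in the fragment above */
double coupled_l4s_prob(double p_c)
{
    double p_l = COUPLING_K * p_c;
    return (p_l > 1.0) ? 1.0 : p_l;
}
```

At the target debt the controller is quiescent; as predicted service time exceeds the target, Classic drop probability rises and L4S marking rises twice as fast, preserving the RFC 9332 coexistence behavior with the new input signal.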

16.7.3 Per-Domain State and the fiwi_update Interface

Each airtime domain carries its own DualPI2 state alongside the fiwi_rrh_state struct (Section 17.5). Because each lcore owns a fixed set of domains exclusively (Section 16.8), this state is never shared across cores — no locks, no atomics, no cache-line bouncing on the fast path.

The telemetry path (Section 17.8) delivers ground-truth airtime measurements back to the lcore via a lockless ring carrying fiwi_update objects. The struct is defined here because it originates in the DPDK fast-path layer and is consumed by it; Section 17.8 populates it from Netlink/vendor telemetry events:

/**
 * fiwi_update — telemetry record posted by the Netlink callback,
 * consumed by the DPDK lcore during its scheduling loop.
 * Allocated from fiwi_update_pool (rte_mempool); returned after use.
 */
struct fiwi_update {
    uint8_t  type;          /* AIRTIME_RECONCILE (only type currently defined) */
    uint32_t rrh_id;        /* RRH index, validated < FIWI_MAX_RRHS before enqueue */
    uint64_t actual_us;     /* Hardware-path-to-status interval (ground truth) */
    uint64_t expected_us;   /* Forward estimate: T_phy + T_agg at enqueue time */
    uint32_t retry_us;      /* Observed retry airtime from telemetry metadata */
};
Per-domain fast-path structure (allocated in hugepages, lcore-local):

  domain[d]
  ├── l4s_ring        mbuf ring, N_L4S slots     (RING_F_SP_ENQ | RING_F_SC_DEQ)
  ├── classic_ring    mbuf ring, N_CLASSIC slots (RING_F_SP_ENQ | RING_F_SC_DEQ)
  ├── classic_signal  EWMA accumulator for controller input
  ├── pi_integral     PI controller integral term
  ├── p_c             current Classic drop probability
  ├── p_l             coupled L4S mark probability (= COUPLING_K * p_c)
  └── port_queue_map  (PCIe port, hw queue_id) → this domain

  rrh_update_rings[d]  per-RRH lockless ring (RING_F_MP_HTS_ENQ | RING_F_SC_DEQ)
  fiwi_update_pool     shared rte_mempool; safe to get() from non-EAL threads

Slow-path timer (~16 ms, same lcore, non-preemptive):
  ewma_update → pi_update → refresh p_c, p_l

Fast-path (every poll cadence):
  rx_burst → classify ECN → enqueue l4s / classic
  dequeue l4s (strict sojourn threshold) → mark CE → tx_burst
  dequeue classic (weighted, drop at p_c) → tx_burst
  drain rrh_update_rings → apply fiwi_apply_updates()
Figure 16-2: Per-domain DualPI2 state layout. All per-domain state is lcore-local and single-writer. The update ring uses RING_F_MP_HTS_ENQ because the Netlink callback runs on a non-EAL thread; the lcore-side dequeue uses RING_F_SC_DEQ (single consumer).

16.8 Multi-RRH lcore Topology and Control Ownership

The Umber concentrator runs on a workstation-class host with a Threadripper PRO processor and multiple PCIe-connected RRHs. This section describes how DPDK lcore assignments map onto that hardware topology to preserve cache locality, single-writer semantics, and deterministic fast-path execution.

Each lcore owns both the DualPI2 control state (Section 16.7) and the Airtime Debt estimator (Section 17) for its assigned RRHs. This ensures that congestion estimation, scheduling, and ECN marking operate within a single execution context.

16.8.1 RRH Assignment

RRH Range   Assigned lcore   Airtime Domains
0–3         lcore 2          domains 0–3
4–7         lcore 4          domains 4–7
8–11        lcore 6          domains 8–11
12–15       lcore 8          domains 12–15
16–19       lcore 10         domains 16–19
20–23       lcore 12         domains 20–23
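The assignment above reduces to simple arithmetic: RRHs are assigned in blocks of four to even-numbered lcores starting at lcore 2, and the airtime domain index equals the RRH index. A sketch of that mapping (function names are illustrative, not from the shipped code):

```c
#include <stdint.h>

/* Mapping implied by the table in Section 16.8.1: blocks of four RRHs
 * per even-numbered lcore, starting at lcore 2. */
static inline unsigned int fiwi_lcore_for_rrh(uint32_t rrh_id)
{
    return 2u + 2u * (rrh_id / 4u);
}

static inline uint32_t fiwi_domain_for_rrh(uint32_t rrh_id)
{
    return rrh_id;   /* one airtime domain per RRH in this layout */
}
```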

16.8.2 Control and Data Flow

Each RRH lcore applies its per-domain DualPI2 loop as described in Section 16.7, with Airtime Debt (Di) serving as the PI controller input in place of queue depth. This presents a single, airtime-grounded congestion signal per domain to the L4S control loop.

Downlink traffic is classified at ingress and directed to the appropriate airtime domain. The owning lcore performs scheduling, ECN marking, and transmission. Uplink traffic follows the reverse path toward the WAN interface.

Because each lcore exclusively owns its RRHs and associated Airtime Debt state, congestion estimation, scheduling, and ECN marking operate without cross-core coordination. This preserves deterministic fast-path behavior.

Ingress → classify → assign domain → lcore owns RRH → compute D_i → schedule → transmit → measure → update C_i/R_i → recompute D_i
Figure 16-3: lcore ownership of RRHs and control loop execution.

17. Airtime-Assisted ECN: Airtime Debt as the Congestion Signal

Fi-Wi does not infer congestion from queue depth alone. The bottleneck is the wireless medium, and the relevant state variable is the time required to successfully transmit packets over that medium. The system replaces the queue sojourn-time inputs of traditional PI2 controllers with Airtime Debt (Di), converting a stochastic medium into a controlled service process.

17.1 The Bottleneck is Airtime, Not a Queue

In traditional L4S systems, ECN marking is derived from queue sojourn time, which assumes a stationary service rate. That assumption fails in Wi-Fi because service time varies per client with PHY rate, contention, and retries. Fi-Wi replaces backward-looking buffer metrics with a forward model of wireless service time. The Concentrator maintains this model continuously and makes scheduling decisions on predicted service outcomes, not observed queue growth. This approach provides the AQM with a signal that has a more stationary distribution than raw queue depth over a variable-rate medium, improving marking coherence and L4S stability.

17.2 Airtime Debt Model (Per RRH)

For each RRH (i), the Concentrator maintains a real-time Airtime Debt (Di), the sum of three microsecond-scale component estimates:

Di = Ai + Ci + Ri

where Ai is the scheduled airtime backlog (queued plus in-flight), Ci is the estimated contention delay, and Ri is the estimated retry penalty.

17.3 Measuring Ground Truth (Hardware-Path-to-Status)

The "Ground Truth" for airtime consumption is measured as the interval from descriptor posting into the hardware transmit path to TX Status (hardware completion signal via driver/vendor-specific telemetry events such as mt76 TX status reports). This interval captures the full service duration, including the full wait for TXOP eligibility (AIFS + backoff), aggregation delay, and all hardware-level retransmission attempts.

17.4 Predicted Sojourn Time (Si)

For any packet, the Predicted Sojourn Time (Si) is a forward estimate of delivery time:

Si(packet) = Di + Tservice(packet)

The Tservice calculation is decomposed into: Tagg (aggregation hold time) + Tphy (modulation time at current MCS) + Tretry (statistical retry overhead). This estimate is packet- and client-specific; it is not a constant service quantum.
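The decomposition above can be made concrete with a small worked sketch. The struct, field names, and the geometric retry model (expected per/(1-per) extra attempts, each costing roughly one more Tphy) are assumptions for this example, not the production estimator:

```c
#include <stdint.h>

/* Illustrative per-packet inputs; all times in microseconds. */
struct tservice_in {
    double payload_bits;    /* MPDU payload size in bits */
    double phy_rate_mbps;   /* current MCS drain rate; Mbps == bits/µs */
    double t_agg_us;        /* aggregation hold time (T_agg) */
    double per;             /* recent packet error rate, 0 <= per < 1 */
};

static double fiwi_t_service_us(const struct tservice_in *in)
{
    double t_phy = in->payload_bits / in->phy_rate_mbps;     /* T_phy */
    /* T_retry under a geometric retry model: per/(1-per) expected
     * extra attempts, each costing roughly one more T_phy. */
    double t_retry = t_phy * in->per / (1.0 - in->per);
    return in->t_agg_us + t_phy + t_retry;
}

/* S_i(packet) = D_i + T_service(packet) */
static double fiwi_predicted_sojourn_us(uint64_t D_i_us,
                                        const struct tservice_in *in)
{
    return (double)D_i_us + fiwi_t_service_us(in);
}
```

For example, a 12,000-bit frame at 600 Mbps gives Tphy = 20 µs; with PER = 0.1 and a 50 µs aggregation hold, Tservice ≈ 72.2 µs, so a domain carrying 100 µs of debt predicts Si ≈ 172 µs.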

17.5 Implementation: DPDK Fast Path State

The Concentrator tracks RRH state in hugepage-backed memory. The DPDK lcore is the sole writer of fiwi_rrh_state; telemetry updates are applied via per-RRH lockless ring buffers to preserve single-writer semantics and microsecond-level determinism.

struct __rte_cache_aligned fiwi_rrh_state {
    uint32_t rrh_id;
    uint64_t D_i;            /* Total airtime debt (A+C+R) */
    
    /* Component Estimates (microseconds) */
    uint64_t A_i;            /* Total scheduled airtime (queued + in-flight) */
    uint32_t C_i;            /* Estimated contention delay */
    uint32_t R_i;            /* Estimated retry penalty */

    /* Feedback & Synchronization */
    uint64_t last_update_us;     /* Timestamp of last lcore application */
    uint64_t last_tx_status_us;  /* TSC of last hardware completion */
    uint32_t moving_avg_per;     /* Recent PER (Section 15.4) */
};
    

Di is recomputed in the DPDK fast path after each update to Ai, Ci, or Ri. The loop updates Ai when packets are assigned to an RRH and decrements it upon TX completion using telemetry feedback.

17.6 Authoritative Congestion Signaling

Airtime Debt replaces physical queue depth as the authoritative input for the Dual-Queue AQM, providing a single, consistent congestion signal across all RRHs without relying on a shared physical buffer.

17.7 Slow-Path Observability

While Di provides fast-path control, the system monitors Airtime Utilization (Uair = ΔTX_DURATION / Δt) as a slow-path observability metric. This metric is used to identify external interference patterns and long-term capacity shifts in the airtime domain, calibrating the confidence weights applied to the Ci and Ri estimators.
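The Uair computation reduces to a windowed delta over two cumulative telemetry counters. A minimal sketch; the struct and field names are assumed for the example:

```c
#include <stdint.h>

/* Slow-path observability: U_air = ΔTX_DURATION / Δt over one window.
 * tx_duration_us is the cumulative airtime counter from telemetry. */
struct uair_sample {
    uint64_t tx_duration_us;   /* cumulative airtime consumed */
    uint64_t timestamp_us;     /* when the sample was taken */
};

static double fiwi_airtime_utilization(const struct uair_sample *prev,
                                       const struct uair_sample *now)
{
    uint64_t dt = now->timestamp_us - prev->timestamp_us;
    if (dt == 0)
        return 0.0;   /* degenerate window */
    return (double)(now->tx_duration_us - prev->tx_duration_us)
           / (double)dt;
}
```

A 5,000 µs increase in TX_DURATION over a 10,000 µs window, for instance, reads as 50% airtime utilization.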

17.8 Telemetry Feedback: Netlink Calibration

The following logic processes TX_STATUS events from the mt76 driver. Completion data is retrieved from a pre-allocated mempool and posted to a per-RRH lockless ring to reconcile state without lcore contention.

/* Telemetry Path (Netlink Callback) */
static int fiwi_handle_mt76_telemetry(struct nl_msg *msg, void *arg) {
    struct nlattr *attrs[MT76_ATTR_MAX + 1];
    nla_parse(attrs, MT76_ATTR_MAX, genlmsg_attrdata(nlmsg_data(nlmsg_hdr(msg)), 0),
              genlmsg_attrlen(nlmsg_data(nlmsg_hdr(msg)), 0), NULL);

    if (!attrs[MT76_ATTR_TX_DURATION] || !attrs[MT76_ATTR_RRH_ID])
        return NL_SKIP;

    uint32_t rrh_id = nla_get_u32(attrs[MT76_ATTR_RRH_ID]);
    if (rrh_id >= FIWI_MAX_RRHS) return NL_SKIP;

    struct fiwi_update *update;
    if (rte_mempool_get(fiwi_update_pool, (void**)&update) < 0) return NL_SKIP;

    update->type = AIRTIME_RECONCILE;
    update->rrh_id = rrh_id;
    update->actual_us = nla_get_u64(attrs[MT76_ATTR_TX_DURATION]);
    update->retry_us = nla_get_u32(attrs[MT76_ATTR_RETRY_DURATION]);
    update->expected_us = estimate_service_time(msg); 

    if (rte_ring_enqueue(rrh_update_rings[rrh_id], update) < 0)
        rte_mempool_put(fiwi_update_pool, update);  /* ring full: drop sample, no leak */
    return NL_OK;
}
}
    

17.8.1 Telemetry Application (DPDK lcore)

The DPDK lcore closes the control loop by draining the update ring. It decrements the backlog and calibrates penalties to ensure the Airtime Debt remains an accurate representation of physical medium pressure.

/* DPDK lcore: apply telemetry updates */
static inline void
fiwi_apply_updates(struct fiwi_rrh_state *rrh, struct rte_ring *ring)
{
    struct fiwi_update *upd;
    while (rte_ring_dequeue(ring, (void**)&upd) == 0) {
        /* 1. Discharge processed backlog */
        rrh->A_i = (rrh->A_i > upd->actual_us) ? (rrh->A_i - upd->actual_us) : 0;

        /* 2. Update contention estimate (drift from expected modulation time) */
        uint32_t drift = (upd->actual_us > (upd->expected_us + upd->retry_us)) ? 
                         (upd->actual_us - upd->expected_us - upd->retry_us) : 0;
        rrh->C_i = (rrh->C_i * 7 + drift) >> 3;

        /* 3. Update retry penalty */
        rrh->R_i = (rrh->R_i * 7 + upd->retry_us) >> 3;

        /* 4. Recompute total Airtime Debt (D_i) */
        rrh->D_i = rrh->A_i + rrh->C_i + rrh->R_i;

        rrh->last_tx_status_us = rte_get_tsc_cycles();
        rte_mempool_put(fiwi_update_pool, upd);
    }
}
    

17.9 Visualization: The Airtime Debt Control Loop

Figure 17-1: The Fi-Wi recursive control loop for stabilizing stochastic wireless service, showing the Forward Service Model and Ground Truth Calibration paths.

Diagram Overview: Closing the Feedback Loop

Figure 17-1 synthesizes the technical components of the Airtime Debt model into a continuous functional loop. The architecture separates the Speculative Forward Path (Fast Path) from the Calibrated Feedback Path (Telemetry Path).

1. Forward Service Model (Prediction): Every ingress packet triggers a per-STA calculation of Tservice. This is not a global constant; it is a client-specific sum of aggregation hold time (Tagg), PHY modulation time (Tphy), and predicted retry overhead (Tretry) based on that STA's specific RF context.
2. Debt Update & Marking Decision: The predicted Tservice is added to the RRH's Ai (Backlog). If the resulting Predicted Sojourn Time (Si) exceeds Tlow, an ECN CE mark is applied immediately in the DPDK fast path. This provides the "Virtual Backpressure" that stabilizes L4S senders.
3. Ground Truth Calibration (Correction): As the packet is dispatched via DMA, the hardware records the precise interval from descriptor posting into the hardware transmit path to TX Status completion. The Telemetry Path calculates the Drift—the delta between the forward prediction and physical reality.
4. Estimator Refinement: This drift is fed back into the EWMA filters for Ci (Contention) and Ri (Retries). This ensures that subsequent predictions for the same STA or RRH domain are corrected for changing medium pressure, effectively regularizing the stochastic nature of the 802.11 medium.

18. Summary

The core idea of Umber’s Fi-Wi architecture is to make a building full of Wi-Fi radios behave like a large number of predictable, low-latency, cellularized bottlenecks (often cell-per-room) that integrate cleanly with L4S, and to avoid Wi-Fi collapse in the regime that matters most for users: tail latency.

We do that by terminating both the MAC and the L4S control loop in a central Concentrator, replacing queue depth with Airtime Debt as the authoritative congestion signal, and dynamically grouping RRHs into cellularized airtime domains that each present a single marking point to L4S senders.

Compared to a building filled with independent APs, Fi-Wi provides shared state across every radio, controlled tail latency under load instead of CSMA/CA collapse, and uplink reception diversity that benefits all client generations.


Appendix A: 802.11 Backoff Timing & Collapse Dynamics

This appendix explains the precise behavior of the 802.11 CSMA/CA backoff algorithm, why the freeze/resume mechanics create strong nonlinearities under load, and how this drives the collapse behavior discussed in Sections 2 and 6. We also include reference diagrams, accurate pseudocode, and probability scaling that shows why birthday-paradox collisions appear long before PHY saturation.

A.1 Overview

The 802.11 MAC is built around two core mechanisms: physical carrier sense with slotted random backoff (CSMA/CA), and virtual carrier sense via the NAV duration timer.

These mechanisms interact in a way that works beautifully for light to moderate station counts, but begins to break down sharply once multiple stations become backlogged. Collapse is not a "bug"; it is the mathematically expected outcome under high concurrency.

A.2 Backoff Decrements Only During Idle SlotTime

When a station has a frame to send, it chooses a random integer:

B ← Uniform[0, CW]
where CW is the contention window. The counter decrements only when the medium is physically idle, the NAV is zero, and both conditions hold for an entire SlotTime.

If any of these conditions breaks during a SlotTime boundary, backoff does not decrement.

Diagram A-A — Backoff Countdown with Idle Slots and Freezes

Time →  ───────────────────────────────────────────────────────────────────────→

Channel:    Busy TXOP      Idle slot     Idle slot     Busy TXOP      Idle ...
           ────────────┐  ┌─────────┐   ┌─────────┐  ┌───────────┐
                       │  │ slot OK │   │ slot OK │  │collision  │
                       └──┘         └───┘         └──────────────┘

Backoff B:   [frozen]        B:=B-1       B:=B-2        [frozen]       B:=B-3

This "idle-slot-only" decrement rule is the source of nonlinear timing behavior.

A.3 Freeze Conditions: Physical Busy + NAV Busy

The backoff counter freezes immediately under either condition: physical busy (CCA detects energy or a decodable preamble on the channel) or virtual busy (the NAV is non-zero after overhearing a Duration field).

NAV counts down in microseconds, not slot units, so a NAV may span dozens or hundreds of SlotTimes, creating long frozen periods.

Diagram A-B — NAV Freezes Backoff for Entire Duration

Frame overheard with Duration=480µs
     NAV := 480 µs  ─────────────────────────────────────────────▶ 0 µs

Backoff:
   Frozen until NAV==0
   Then: AIFS idle interval → first idle SlotTime → resume B countdown

A.4 Full Backoff State Machine

The following pseudocode describes the real 802.11 backoff and retry machine:

# Variables
B   = random integer in [0, CW]
CW  = CWmin initially, doubled on failures
NAV = virtual carrier sense (µs timer)
Slot = 9 microseconds (typical)
AIFS = access category-specific inter-frame space

while True:

    wait_until( medium_idle() and NAV == 0 )
    wait(AIFS)  # must see idle for entire AIFS

    # Backoff countdown
    while B > 0:

        if medium_idle() and NAV == 0:
            wait(Slot)
            if medium_idle() and NAV == 0:
                B -= 1      # decrement only if entire slot was idle
        else:
            # Freeze B until another idle AIFS appears
            wait_until( medium_idle() and NAV == 0 )
            wait(AIFS)

    # Backoff fully expired, attempt TX
    transmit()

    if ack_received():
        CW = CWmin
        B = random(0, CW)
    else:
        CW = min(2 * CW, CWmax)
        B = random(0, CW)

The critical detail: multiple stations freeze and resume their counters in lock-step after every long TXOP or NAV, making collisions statistically inevitable as station count grows.

A.5 Collision Probability and the Birthday Paradox

Each station independently picks a backoff slot in [0, CW]. The probability that no two stations choose the same slot is:

P(no collision) = (CW+1)! / [(CW+1 - n)! · (CW+1)^n]

where n = number of active contenders. Therefore:

Diagram A-C — Collision Probability vs. Number of Stations

Stations (n) →    4      6      8      10     12     16
--------------------------------------------------------
P(collision)    ~33%   ~66%   ~88%   ~97%  ~99.7%  >99.9%

(CWmin = 15, i.e. 16 slots; values computed from the formula above)

This is the MAC-level reason collapse begins long before PHY capacity is reached.
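The Diagram A-C values follow directly from the formula in A.5. A short check; `p_collision` is a helper written for this appendix, not 802.11 code:

```c
/* P(at least two of n stations draw the same backoff slot in [0, CW]).
 * Computes the birthday-paradox complement product from Section A.5. */
static double p_collision(int n, int cw)
{
    int slots = cw + 1;
    double p_clear = 1.0;                 /* P(all n picks distinct) */
    for (int k = 0; k < n; k++)
        p_clear *= (double)(slots - k) / (double)slots;
    return 1.0 - p_clear;
}
```

With CWmin = 15 (16 slots), p_collision(4, 15) is already about one in three, and by 16 stations a collision in the first contention round is essentially certain.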

A.6 Why Collapse Appears as 2–3 ms TXOP Tails

Once collisions become frequent, CW doubling stretches every countdown, retries and recovery exchanges consume airtime, and senders aggregate more heavily to amortize the overhead, so individual TXOPs lengthen toward their limits.

Diagram A-D — TXOP Length as Collapse Indicator

Healthy:    T50 ≈ 200–500 µs,   T95 < 0.8 ms,    T99 < 1.2 ms
Degraded:   T95 = 1–2 ms,       T99 = 2–3 ms
Collapsed:  T95 > 2 ms AND      T99 ≥ 3 ms (dominant channel monopolization)

A single 3 ms TXOP already violates the bottleneck-delay budget required by L4S (≈250–300 µs). With multiple stations taking such TXOPs, service gaps can reach 10–50 ms for unlucky flows.

A.7 Multi-Station Synchronization Example

The following diagram illustrates how multiple stations become phase-aligned:

Time →  ────────────────────────────────────────────────────────────────→

TXOP1 by STA-A:   ────────────────
NAV for others:   ──────────────── (all B frozen)

After NAV expires:
All stations wait AIFS → begin countdown
Slot 1:  B_A=2, B_B=4, B_C=2
Slot 2:  B_A=1, B_C=1
Slot 3:  B_A=0  ,  B_C=0   → simultaneous transmit → collision

This synchronization is why the birthday paradox applies so strongly in Wi-Fi.
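The countdown above can be replayed mechanically. In this sketch (illustrative only, following the diagram's slot labeling), counters are inspected at each slot boundary and then decremented in lock-step, since all stations resumed together after the NAV; a return of 0 means a unique winner, otherwise the slot at which two or more stations transmit simultaneously:

```c
#define N_STA 3

/* Lock-step countdown replay for the three stations in the diagram. */
static int find_collision_slot(int backoff[N_STA])
{
    for (int slot = 1; ; slot++) {
        int tx = 0;
        for (int i = 0; i < N_STA; i++)
            if (backoff[i] == 0)
                tx++;                /* counter expired: station transmits */
        if (tx == 1)
            return 0;                /* unique winner: no collision */
        if (tx > 1)
            return slot;             /* simultaneous transmit: collision */
        for (int i = 0; i < N_STA; i++)
            backoff[i]--;            /* one idle slot elapses for everyone */
    }
}
```

Replaying B_A=2, B_B=4, B_C=2 reproduces the diagram: A and C reach zero together at slot 3 and collide.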

A.8 Why Fi-Wi Breaks the Cycle

Fi-Wi removes the “every station fends for itself” randomness by scheduling downlink traffic centrally within each airtime domain, shaping RF groups so that few stations contend on any one channel, and using trigger-based uplink scheduling where clients support it (Appendix D.1).

Thus Fi-Wi converts Wi-Fi from a chaotic CSMA/CA system into a scheduled, low-latency cellular MAC.

Appendix B: Channel State Information (CSI) and Learning-Enhanced Fi-Wi

This appendix describes how Fi-Wi can use Channel State Information (CSI) from each RRH, together with learning models (e.g. LSTM or TCN), to improve grouping, scheduling, redundancy, and control beyond what is possible with queue-based feedback alone.

B.1 What CSI provides in a Fi-Wi context

Concept: What is CSI?
Imagine shouting in a complex room. You hear echoes bouncing off walls, furniture, and people. If you analyze those echoes, you can map the environment.

In Wi-Fi, Channel State Information (CSI) is that map. It describes exactly how the radio wave traveled from the transmitter to the receiver—including all the bounces (multipath), fading, and phase shifts caused by the physical environment. Traditional APs throw this data away after decoding the packet. Fi-Wi sends it to the Concentrator, allowing the system to "see" the RF environment and mathematically calculate how to steer beams or combine signals.

Wi-Fi Sensing: Because physical objects reflect radio waves, any movement in the room changes the CSI pattern. By monitoring these changes over time, Fi-Wi can detect presence—such as a person walking or a pet breathing—turning the network into a ubiquitous sensor without cameras.

Modern 802.11 chipsets can export CSI per subcarrier or per resource unit: complex-valued estimates of the channel between an RRH and a station (STA). In a Fi-Wi deployment, each RRH periodically reports these per-STA channel estimates to the concentrator.

Thanks to centralized time synchronization and packet memory, the concentrator can align CSI reports with queue state, MAC outcomes (retries, PER, airtime), and scheduling decisions on a common timeline.

This gives Fi-Wi a rich per-domain, per-STA time series suitable for sequence modeling.

B.2 What we want to predict

Using this data, Fi-Wi can learn models to help answer questions such as: What will a domain's effective capacity be in the next interval? Is a domain drifting toward collapse? Would a different RRH grouping or beacon configuration perform better?

These predictions can feed directly into the grouping, scheduling, redundancy, and control decisions described throughout this document.

B.3 Example model: LSTM / TCN

One reasonable approach is to use a sequence model such as an LSTM or Temporal Convolutional Network (TCN) per airtime domain:

Input features (per timestep):
  - queue depth q_k
  - marking probability p_k
  - throughput, PER, retries
  - per-RRH CSI summary (e.g. dominant eigenvalues/eigenvectors)
  - beacon power settings, channel, bandwidth

Outputs:
  - predicted effective capacity C_eff,k+1
  - predicted collapse risk score
  - recommended group reconfiguration / beacon adjustments (optional)

A higher-level policy layer then uses these predictions to:

The key point is that Fi-Wi has access to the joint state across all RRHs—queues, CSI, MAC outcomes, and beacon configuration—so learning can be done on a true building-scale view rather than a per-AP snippet.

B.4 The Non-Linear Control Policy (Feature Vectors)

While the PI² controller (Section 5.2) provides a robust baseline using linear control theory, the wireless medium is inherently non-linear. A small drop in SNR can cause a discrete, non-linear step-down in MCS, cutting capacity by half in microseconds. A linear controller often reacts too slowly to these step-changes.

Because the Concentrator terminates both the MAC (Inner Loop) and L4S (Outer Loop), it possesses a complete, global view of the system state. This allows Fi-Wi to implement a Non-Linear Marking Signal derived from a rich real-time feature vector:

Feature Vector x(t) = [
   MCS_t,          // Current Modulation (Capacity potential)
   PHY_Rate_t,     // Raw drain rate
   RTT_outer,      // End-to-end latency (Sojourn + Flight)
   Q_depth_t,      // Current backlog
   d_arrival/dt    // Arrival rate gradient (ARM Policer)
]
  

Optimization Objective: Efficiency vs. Latency
The system uses this vector to solve the fundamental Wi-Fi trade-off: Aggregation Efficiency vs. Serialized Latency.

This creates a Non-Linear Marking Signal that optimizes Throughput per Microsecond of Latency, rather than simply targeting a fixed queue depth.
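One hedged illustration of such a non-linear adjustment follows. The function, the gain values, and the capacity-halving-per-MCS-step assumption are ours for exposition (the document itself notes a step-down can cut capacity by half); this is not the production policy:

```c
/* Illustrative non-linear adjustment on top of the linear (PI-derived)
 * marking probability. All constants are assumptions for this sketch. */
static double fiwi_nonlinear_mark_prob(double p_linear,
                                       unsigned int mcs_now,
                                       unsigned int mcs_prev,
                                       double arrival_gradient)
{
    double p = p_linear;

    /* Discrete MCS step-down: boost marking multiplicatively rather than
     * waiting for a linear term to integrate the capacity loss. */
    if (mcs_now < mcs_prev)
        p *= (double)(1u << (mcs_prev - mcs_now));

    /* Arrival rate still ramping (d_arrival/dt > 0): mark slightly earlier. */
    if (arrival_gradient > 0.0)
        p *= 1.25;

    return p > 1.0 ? 1.0 : p;    /* clamp to a valid probability */
}
```

A two-step MCS drop, for example, quadruples the marking probability immediately instead of letting the queue grow while the PI term catches up.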


Appendix C: Latency Hiding via Scatter-Gather DMA

Early architectural models of C-RAN often assumed a "Store-and-Forward" approach, where full packets must be buffered at the edge to meet timing. Fi-Wi eliminates this inefficiency by leveraging the natural physics of the 802.11 air interface. We utilize a Scatter-Gather DMA engine with Preamble Hiding to enable a "Thin RRH" design with minimal local SRAM.

C.1 The "Preamble Shield" Physics

The critical timing constraint in Wi-Fi is the transition from "Decision to Transmit" to "Energy on Air." However, the 802.11 PHY does not transmit user data immediately. Every transmission begins with a PHY Preamble (PLCP) and MAC Headers.

Time-Domain View of a Transmission Start:

  T=0 µs         T=5 µs                      T=24 µs (approx)
  |              |                           |
  | TX Trigger   | Preamble & Headers        | Payload Data Starts...
  [ MAC Logic ]->[/////////////////////////] [......................]
                  ^                           ^
                  |                           |
         Source: Local RRH SRAM      Source: Host Concentrator DRAM
         (Instant Access)            (Fetched via Fiber)

The Insight: The transmission of the Preamble and Headers takes roughly 20–40 µs (depending on PHY generation). The round-trip time to fetch payload data over 100m of PCIe-over-Fiber is roughly 2–5 µs.

Consequently, the fetch latency is completely "hidden" behind the transmission of the headers. The payload data arrives at the RRH's small FIFO well before the PHY is ready to modulate it.

C.2 Scatter-Gather Architecture

Instead of a large packet buffer, the Fi-Wi RRH implements a Scatter-Gather DMA engine that composes frames on the fly from two distinct memory regions:

  1. Template RAM (Local RRH SRAM): Stores 802.11 MAC headers, PLCP headers, and delimiter signatures. This memory is small (< 16 KB), fast, and populated by the Concentrator during the descriptor posting phase.
  2. Payload Buffer (Remote Concentrator DRAM): Stores the actual 802.3 Ethernet payloads. These remain in the host server's memory until the exact moment of transmission.

C.3 The Transmit Sequence

  1. Descriptor Posting: The Concentrator posts a descriptor to the RRH. This descriptor points to the header in Local RAM and the payload in Remote DRAM.
  2. Contention: The RRH MAC performs EDCA backoff. No data is moved during this phase.
  3. TX Trigger: When backoff reaches zero, the MAC immediately begins transmitting the Preamble from Local RAM.
  4. Just-in-Time Fetch: Simultaneously with the Preamble start, the DMA engine issues a read request to the Concentrator for the payload data.
  5. Cut-Through: Data returns from the fiber, flows into a small speed-matching FIFO (e.g., 4 KB), and flows directly into the PHY serialization path immediately following the header.

C.4 Solving the Retry Timing (SIFS)

A common objection to C-RAN is the SIFS deadline (16 µs) required for retries. If a transmission fails, the station must retransmit immediately.

With Scatter-Gather, the RRH does not need to buffer the packet for retries. If a NACK occurs, the MAC simply resets the Scatter-Gather engine. It re-transmits the Preamble (from Local RAM) while re-issuing the DMA fetch (from Remote RAM). Because the fiber latency (5 µs) is significantly shorter than the SIFS + Preamble duration, the data again arrives in time.
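The retry deadline reduces to a simple budget check using the figures quoted in this appendix (SIFS 16 µs, preamble plus headers roughly 24 µs, fiber RTT about 5 µs); the constants below are those illustrative values, not measured ones:

```c
/* Retry-path budget: the payload re-fetch can be issued at the NACK, so
 * its deadline is the SIFS gap plus the locally sourced preamble/header
 * serialization time. Values are the appendix's illustrative figures. */
#define FIWI_SIFS_US          16.0
#define FIWI_PREAMBLE_HDR_US  24.0   /* approximate; PHY-generation dependent */

static double fiwi_retry_fetch_budget_us(void)
{
    return FIWI_SIFS_US + FIWI_PREAMBLE_HDR_US;
}

static int fiwi_retry_meets_deadline(double fiber_rtt_us)
{
    return fiber_rtt_us < fiwi_retry_fetch_budget_us();
}
```

With a 5 µs fiber RTT against a ~40 µs budget, the re-fetched payload arrives with roughly 35 µs of slack.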

C.5 Architectural Benefits


Appendix D: 802.11ax/be Features and Fi-Wi Integration

Modern Wi-Fi standards — particularly 802.11ax (Wi-Fi 6/6E) and 802.11be (Wi-Fi 7) — introduce features that appear to address some of the same problems as Fi-Wi: uplink scheduling, spatial reuse, and multi-AP coordination. This appendix clarifies how these features relate to Fi-Wi's architecture, where they're complementary, and why they don't eliminate the need for Fi-Wi's centralized data-plane approach.

Key takeaway: 802.11ax/be features like trigger frames and multi-AP coordination are valuable enhancements that Fi-Wi can leverage when client support is available, but they operate at a different architectural level (per-AP MAC features vs. building-scale data-plane unification) and cannot replace Fi-Wi's core innovations: centralized queues, shared state, L4S marking coordination, and dynamic RF grouping across the entire building.

D.1 Trigger Frames and Uplink Scheduling

802.11ax introduced trigger frames (TF) to enable centralized uplink scheduling. Instead of clients contending for the channel using stochastic EDCA backoff, the AP sends a trigger frame that grants specific clients permission to transmit on specific OFDMA resource units (RUs) or spatial streams at a specific time.

What trigger frames provide: deterministic uplink transmit opportunities in place of random EDCA contention, per-client OFDMA resource unit and spatial-stream assignment, and AP-controlled uplink timing and transmit power.

How trigger frames align with Fi-Wi:

Trigger frames match Fi-Wi's philosophy of centralized scheduling rather than distributed contention. In a Fi-Wi deployment where RRHs support 802.11ax and clients support uplink OFDMA/MU-MIMO, the concentrator can issue trigger schedules informed by its global queue and Airtime Debt state, align uplink grants with downlink scheduling within each airtime domain, and account triggered uplink airtime in the same per-RRH debt model.

Reality check — client support in 2025:

While 802.11ax was ratified in 2019, uplink OFDMA support remains inconsistent. Crucially, trigger frames only control 802.11ax/be clients; legacy devices (iPhone 11, older IoT) are invisible to this schedule. These legacy clients cannot parse the trigger, so they continue to contend via random EDCA, acting as unmanaged interference sources. In contrast, Fi-Wi's reception diversity (Section 8.1) enhances uplink reliability for all clients, regardless of generation, by combining signals from multiple RRHs.

D.2 Why Trigger Frames Don't Eliminate the Need for Fi-Wi

A natural question: "If 802.11ax APs can use trigger frames for uplink scheduling, why do we need Fi-Wi's centralized architecture?"

Answer: Trigger frames address only a small subset of the problems Fi-Wi solves, and even for uplink scheduling, they provide per-AP control, not building-scale coordination.

What trigger frames do NOT provide:

  1. Centralized queues across APs: Even with trigger frames, each AP maintains its own independent downlink and uplink queues. There's no shared queue state, no unified bottleneck, and no coordinated ECN marking across APs.
  2. Shared state: Trigger-capable APs still operate autonomously. They don't share CSI, retry statistics, airtime usage, or queue metrics. Each AP makes trigger scheduling decisions based only on its local view.
  3. Coordinated L4S marking: There's no mechanism in 802.11ax for multiple APs to coordinate ECN marking or present a single logical bottleneck to L4S. Each AP marks (or doesn't mark) independently.
  4. Dynamic RF grouping: 802.11ax APs don't dynamically reconfigure which radios share airtime resources based on interference, CSI structure, or collapse risk. They're fixed islands.
  5. Tail latency control: Trigger frames help with uplink efficiency, but they don't address the fundamental problem of hidden queues, uncontrolled aggregation, and tail latency blowup under load across a multi-AP building.

D.3 OFDMA Resource Units and Airtime Domains

802.11ax OFDMA subdivides a channel into resource units (RUs). In Fi-Wi, an airtime domain is a logical entity representing a shared RF resource. OFDMA RUs provide finer-grained subdivision of that airtime resource.

Conceptually, the airtime domain defines which radios share one service process and one marking point, while OFDMA RUs subdivide individual transmissions within that domain across multiple clients.

This does not change the fact that all RRHs in that airtime domain share a single group queue and marking point. It simply allows the service process to be more efficient.

D.4 BSS Coloring and Spatial Reuse

802.11ax BSS coloring allows STAs to distinguish between intra-BSS frames (same color) and inter-BSS frames (different color), enabling more aggressive spatial reuse.

Relationship to Fi-Wi RF grouping: Fi-Wi's dynamic RF grouping (Section 6) serves a similar but more sophisticated purpose. Fi-Wi uses richer information (CSI, retry statistics, airtime) to decide grouping, not just RSSI thresholds. In a Fi-Wi deployment, the concentrator can assign BSS colors to RRHs strategically: RRHs in the same airtime domain get the same color, while isolated domains get different colors.

D.5 802.11be (Wi-Fi 7) Multi-AP Coordination

802.11be (Wi-Fi 7) introduces multi-AP coordination features that appear to move in Fi-Wi's direction, such as coordinated spatial reuse and coordinated scheduling across neighboring APs.

How these relate to Fi-Wi: These features acknowledge the problem of autonomous APs but approach it incrementally. 802.11be uses distributed AP-to-AP messaging, which limits scale and speed. Fi-Wi centralizes the data plane, enabling deeper coordination than distributed messaging can achieve.

D.6 Deployment Strategy: Mixed Client Populations

A key advantage of Fi-Wi's architecture is that it degrades gracefully with mixed client populations and doesn't require forklift client upgrades.

Client capability tiers in a 2025 deployment:

  1. Legacy 802.11ac and earlier: No trigger frame support, no OFDMA, no BSS coloring.
    • Fi-Wi provides: centralized downlink queuing, L4S marking, reception diversity on uplink, beacon shaping to reduce contention.
    • Result: Significantly better latency and stability than traditional multi-AP, even without 802.11ax features.
  2. 802.11ax with partial features: May support downlink OFDMA, BSS coloring, some power save enhancements, but not uplink OFDMA or uplink MU-MIMO.
    • Fi-Wi provides: All of the above, plus downlink MU-OFDMA where beneficial, coordinated BSS coloring across RRH groups.
    • Result: Better spatial reuse and efficiency, still robust to clients that don't support full 802.11ax.
  3. 802.11ax with full features: Supports uplink OFDMA and uplink MU-MIMO via trigger frames.
    • Fi-Wi provides: All of the above, plus trigger-based uplink scheduling, uplink MU-OFDMA for small packets, coordinated uplink/downlink airtime management.
    • Result: Bidirectional sub-millisecond latency control, maximum airtime efficiency.
  4. 802.11be (Wi-Fi 7): Adds MLO, 320 MHz channels, 4096-QAM, possibly multi-AP coordination support.
    • Fi-Wi provides: Can leverage MLO via concentrator coordination (Section 13.3), wider channels for capacity, and potentially integrate with 802.11be multi-AP features while maintaining superior shared-state coordination.
    • Result: Cutting-edge performance while maintaining backward compatibility.

Deployment strategy: deploy Fi-Wi RRHs and the concentrator independently of client capability, then enable per-client enhancements (OFDMA, trigger-based scheduling, MLO) as capabilities are discovered at association time; no client upgrades are required for the core latency benefits.

D.7 Summary: 802.11ax/be as Enhancements, Not Replacements

802.11ax and 802.11be introduce valuable features — trigger frames, OFDMA, BSS coloring, multi-AP coordination — that align with Fi-Wi's centralized control philosophy and can enhance Fi-Wi deployments when clients support them. However:

  1. These features do not eliminate the need for Fi-Wi's architecture. They provide per-AP enhancements and limited inter-AP coordination, but they cannot create the unified data plane, shared state, and building-scale control that Fi-Wi provides.
  2. Fi-Wi is designed to work with or without them. Core benefits (centralized queues, L4S marking, tail latency control) are independent of client 802.11ax/be support.
  3. Fi-Wi leverages them when available. As client capabilities improve, Fi-Wi automatically benefits from trigger-based uplink scheduling, OFDMA efficiency, and other enhancements without requiring architectural changes.

In short: 802.11ax/be features make Fi-Wi better, but Fi-Wi solves problems these standards cannot address within the constraints of the distributed-AP model. Fi-Wi is not "better APs" — it's a different architecture that happens to integrate well with modern Wi-Fi standards as they evolve.


Appendix E: ASIC Evolution Toward Complexity

E.1 Why ASICs accumulate legacy complexity

Unlike software, ASICs cannot easily “refactor away” unused features. Removing blocks typically requires re-verifying entire subsystems, while adding blocks often requires verifying only the new logic. This asymmetry encourages accumulation.

Over many product generations, this leads to RTL codebases that only grow. Legacy modulation modes, preambles, power-save FSMs, calibration paths, and debug hooks persist long after their practical value has disappeared.

E.2 Real costs of legacy bloat

This accumulated complexity has tangible costs: larger die area, higher static power, longer verification cycles, and slower time to market.

E.3 How Fi-Wi changes the design equation

Fi-Wi’s architecture separates the system into two domains: latency-critical RRH hardware at the edge, and high-level control software in the concentrator.

This separation dictates where complexity must live. RRHs implement only what must be fast and deterministic: RF front end, PHY processing, minimal MAC TX/RX, DMA, PTP synchronization, and PCIe-over-fiber transport. All high-level behavior (queueing, L4S policy, aggregation strategy) lives in the concentrator.

E.4 Economic and engineering leverage

For a modern Wi-Fi chip at an advanced node, even a modest reduction in unnecessary logic can translate into significant savings: smaller die, lower power, simpler verification, and faster time to market.

E.5 Design principle for Fi-Wi RRH silicon

The guiding principle for Fi-Wi RRH design is:

Complexity belongs in the concentrator; only latency-critical functions belong in RRH silicon.

Concretely, this means: no autonomous AP queueing/scheduling logic, no legacy PHY/MAC support beyond what Fi-Wi needs, and no embedded firmware CPU managing per-station behavior at the edge.


Appendix F: A Day in the Life of a Packet (The "Preamble Shield" in Action)

To truly understand Fi-Wi, we must follow a single packet through the system at the microsecond scale. This narrative illustrates how the Workstation Concentrator (Section 13) and the Scatter-Gather RRH (Appendix C) collaborate to trick the physics of latency.

F.1 The Scenario

  • The Setting: Room 304 (served by RRH-A and RRH-B).
  • The Flow: a 4K video frame (downlink) destined for "Alice's Laptop."
  • The Constraint: L4S requires <1 ms tail latency.
  • The Challenge: the packet is currently 200 meters away in the Concentrator's DRAM.

F.2 The Downlink Race (The "Preamble Shield")

T = 0 µs (Arrival): The video packet arrives at the Concentrator's NIC. The CPU timestamps it immediately.

T = 2 µs (The Decision): The Concentrator's software scheduler inspects the packet.

T = 10 µs (The Setup): The scheduler posts a DMA Descriptor to RRH-A via PCIe.
Note: The payload data (1500 bytes) stays in the Concentrator. Only a 16-byte pointer moves to the edge.

T = 50 µs (The Trigger): RRH-A's LBT logic sees the airtime is clear. It begins the transmission sequence. This is where the magic happens:

The Race Against the PHY:
Action 1: RRH-A starts transmitting the 802.11 Preamble (PLCP) from its local SRAM. This takes 20 µs of airtime.
Action 2: Simultaneously, RRH-A issues a PCIe Read Request to fetch the payload from the Concentrator.

The payload must travel 200m up the fiber and back before the Preamble finishes transmitting.

T = 52 µs (The Fetch): The Read Request hits the Concentrator's PCIe controller. Because of the 92-lane non-blocking fabric (Section 13), there is zero switching delay.

T = 55 µs (The Return): The payload data flies back down the fiber.

T = 58 µs (The Handover): The payload data arrives at RRH-A's FIFO while the PHY is still serializing the Preamble, with roughly 12 µs of margin to spare.

T = 70 µs (Seamless Serialization): As the last Preamble symbol ends, the PHY switches directly to transmitting the payload. To the air, it looks like one continuous stream. The 200-meter fiber latency effectively vanished because it was hidden behind the mandatory PHY training sequence.
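The race above can be sanity-checked with a back-of-envelope timing budget. This is a sketch, not a spec: it assumes ~5 ns/m propagation in glass and a 20 µs legacy preamble, and lumps all PCIe switching and DMA overhead into a single `fabric_us` parameter.

```python
# Hypothetical timing-budget check for the "Preamble Shield"
# (constants are illustrative assumptions, not measured values).
FIBER_NS_PER_M = 5    # ~5 ns/m propagation in glass
PREAMBLE_US = 20.0    # legacy PLCP preamble + header airtime

def fetch_fits_under_preamble(fiber_m: float, fabric_us: float) -> bool:
    """True if a payload fetched over the fiber arrives before the
    preamble finishes serializing on the air."""
    round_trip_us = 2 * fiber_m * FIBER_NS_PER_M / 1000.0
    return round_trip_us + fabric_us < PREAMBLE_US

# 200 m of fiber costs only 2 µs round trip, leaving ample margin
# even with several microseconds of PCIe fabric latency.
print(fetch_fits_under_preamble(200, 6.0))   # → True
print(fetch_fits_under_preamble(1800, 6.0))  # → False
```

The same arithmetic bounds how far the concentrator can sit from its RRHs: the fetch budget, not the fiber's bandwidth, sets the maximum reach.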

F.3 The Uplink Journey (Diversity & Sensing)

T = 200 µs: Alice sends a TCP ACK.

T = 204 µs (The Multi-Stat): Both RRH-A and RRH-B hear the ACK.

T = 210 µs (The Race Up): Both RRHs push the packet + CSI metadata to the Concentrator.

T = 215 µs (The Deduplication): The Concentrator sees two copies of Sequence #104. It discards the weak one from RRH-B but keeps the CSI data to update the "Sensing Model" (detecting that someone is standing near RRH-B, blocking the line of sight).
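The deduplication step can be sketched as follows. This is a minimal illustration under assumed field names (`seq`, `rrh`, `rssi_dbm`, `csi`); the real concentrator logic is not specified here.

```python
# Sketch of uplink deduplication: keep the strongest copy of each
# 802.11 sequence number, but retain every RRH's CSI metadata for
# the sensing model. Field names are illustrative assumptions.
def deduplicate(copies):
    """copies: list of dicts with 'seq', 'rrh', 'rssi_dbm', 'csi'."""
    best, csi_log = {}, []
    for c in copies:
        csi_log.append((c["rrh"], c["csi"]))        # never discard CSI
        cur = best.get(c["seq"])
        if cur is None or c["rssi_dbm"] > cur["rssi_dbm"]:
            best[c["seq"]] = c                      # keep strongest copy
    return list(best.values()), csi_log

pkts, csi = deduplicate([
    {"seq": 104, "rrh": "A", "rssi_dbm": -48, "csi": "csiA"},
    {"seq": 104, "rrh": "B", "rssi_dbm": -71, "csi": "csiB"},
])
# One packet survives (RRH-A's copy); both CSI snapshots are logged.
```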

F.4 Contrast with Legacy Wi-Fi

If this were a traditional AP: the full 1500-byte payload would have to be copied into the AP's buffers before transmission could begin, the packet would sit in a firmware queue invisible to the rest of the network, and an AP failure at that moment would drop it outright, forcing a TCP retransmission.

F.5 Edge Cases and Advanced Scenarios

RRH Failure: If RRH-A fails during the prefetch (e.g., power loss), the concentrator detects the link loss immediately via PCIe link state. Because the packet payload never left Concentrator DRAM, the scheduler simply re-posts the descriptor to RRH-B. No packet is lost, and TCP does not see a drop.

Congestion: The scatter-gather pipeline depth allows the Concentrator to queue up the next descriptor while the current one is transmitting. This allows back-to-back TXOPs (SIFS spacing) without idle gaps on the air, even with the fiber latency.

Coordinated Transmission: The Concentrator can schedule RRH-A and RRH-B to transmit concurrently to spatially separated clients. It analyzes the CSI matrix to determine if spatial isolation is sufficient (>25 dB cross-coupling attenuation). If yes, both RRHs transmit simultaneously using standard 802.11 frames. If interference is detected, the Concentrator schedules sequential TXOPs. This dynamic decision happens per-packet based on real-time CSI.
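The per-packet concurrency decision reduces to a threshold test on the CSI-derived isolation figure. A minimal sketch, assuming the 25 dB threshold stated above (function and variable names are illustrative):

```python
# Hedged sketch of the concurrent-vs-sequential TXOP decision.
ISOLATION_THRESHOLD_DB = 25.0   # minimum cross-coupling attenuation

def schedule(cross_coupling_db: float) -> str:
    """Decide, per packet, whether RRH-A and RRH-B may transmit at once."""
    if cross_coupling_db > ISOLATION_THRESHOLD_DB:
        return "concurrent"     # spatially isolated: parallel TXOPs
    return "sequential"         # interference risk: serialize TXOPs

print(schedule(31.0))  # → concurrent
print(schedule(18.0))  # → sequential
```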

F.6 Summary: The Packet's Perspective

From the packet's view, Fi-Wi provides uplink diversity, per-flow fair queuing, accurate ECN marking, and speculative DMA that hides PCIe latency. The packet experiences the network as a transparent, zero-wait pipe.

F.7 The Critical Insight: Timing vs. Intelligence

Fi-Wi separates timing (RRH hardware) from intelligence (Concentrator software), bridged by the speculative DMA prefetch pipeline. This allows the hardware to meet strict microsecond deadlines while the software retains the flexibility to run complex scheduling, L4S, and spatial multiplexing logic.


Appendix G: The Strategic Case for Fiber Infrastructure

The upfront cost of installing fiber is often the primary friction point for C-RAN adoption ("The Fiber Tax"). However, this framing ignores the physics of modern signaling and the macroeconomics of construction. Fi-Wi's reliance on fiber is not a tax; it is a strategic asset conversion.

G.1 The Physics of 100G (The Copper Wall)

We are hitting a hard physical limit with copper cabling. At modern data center speeds (100Gb/s), signal loss in copper is so high it is characterized in dB per inch.

G.2 Labor Rate Hedging (Inflation Proofing)

In low-voltage construction, the cost of cabling is dominated by labor (often 70-80%), not material. Pulling fiber once, during a single construction event, locks in today's labor rates; every copper re-pull in a future upgrade cycle is billed at tomorrow's. Installing fiber is therefore a hedge against labor-rate inflation.

G.3 Asset vs. Consumable

Unlike HDMI or Copper Ethernet—which are purpose-built cables engineered for a single generation—fiber is a raw transport medium. It is a "pipe for light" that supports Ethernet, DWDM, and PCIe-over-Fiber simultaneously.

While cable standards have cycled (Cat5e → Cat6 → Cat6A), they remain tethered to the legacy RJ45 connector. This physical interface is rapidly becoming obsolete. Fi-Wi recognizes that the connection is what matters, not the physical port. In this architecture, the 802.11 wireless interface becomes the new connector. By installing fiber once as a permanent asset and treating Wi-Fi as the universal 'plug' inside the room, the building infrastructure is 'one and done'. This finally breaks the cycle of physical obsolescence.

Appendix H: Centralized Observability and the ML Advantage

Fi-Wi's centralized architecture provides observability that is difficult or impractical to achieve in distributed AP systems. This appendix presents the Observability Matrix—a systematic comparison of what telemetry is directly observable, partially observable, or hidden across different measurement approaches. This complete visibility is the prerequisite for effective machine learning (Section 15) and deterministic L4S control.

The Observability Gap

Traditional Wi-Fi deployments rely on tools that provide only partial visibility into system state. Operators attempt to infer problems from symptoms (latency spikes, ECN marks, throughput degradation) without directly observing root causes (queue growth, retry timing, MCS selection under interference). This inference distance—the number of steps between observable effects and hidden causes—makes control systems less stable and limits the effectiveness of machine learning.

The table below compares observability across six measurement approaches. The legend indicates:

Direct: Directly measurable with microsecond-resolution timestamps
Partial: Partially observable or requires inference
Not Observable: Hidden or cannot be reliably measured

Observability Matrix

Measurement approaches compared (matrix columns): ESP32-C5 RF sensor; RPi 5 monitor mode; RPi 5 L4S node; tcpdump packet capture; iperf2 L4S; Fi-Wi Concentrator.

Telemetry and metrics evaluated (matrix rows):
  • Energy detect / CCA
  • Channel busy time
  • NAV / medium reservation
  • CSI / channel matrix
  • MCS / GI / NSS
  • PER / retry counts
  • RSSI / SNR
  • Queue depth
  • Sojourn time
  • ECN marks
  • One-way delay (OWD)
  • Responsiveness
  • Throughput / goodput
  • Deterministic playback

Critical Observations

Queue Depth and Sojourn Time:

These metrics are essential for L4S congestion control and machine learning. Traditional tools (tcpdump, Wi-Fi packet capture) cannot directly observe queue state because it exists inside firmware or kernel layers. While synchronized ingress and egress packet captures could theoretically infer queue depth through timing correlation, this approach requires nanosecond-precise time synchronization across physically separated capture points, perfect packet correlation despite potential losses, and still cannot observe firmware-internal retry queues, aggregation buffer states, or PHY scheduling decisions. External sniffers see the explosion (the packet hitting the air), but they cannot see the fuse burning (the packet sitting in the driver queue). Only centralized queueing architectures expose these values with direct microsecond-resolution timestamps.

MCS / GI / NSS (PHY Configuration):

Monitor-mode packet capture can partially infer MCS from radiotap headers, but this only shows what was transmitted—not the decision process, CSI data, or PER history that informed the choice. The Fi-Wi Concentrator has direct access to the complete decision state.

Deterministic Playback:

This capability enables machine learning. Deterministic playback means the Concentrator can reproduce its own decision sequence from a log file: packet arrivals, queue transitions, scheduling decisions, MCS selections, and RRH transmission commands. While actual RF outcomes depend on station behavior and channel conditions that may vary, the Concentrator can replay its control decisions under the logged RF environment to evaluate alternative strategies offline and verify whether different MCS/scheduling choices would have improved performance. This is only possible when all Concentrator-controlled components operate under a single clock with complete state visibility. Distributed systems cannot reconstruct this causal chain from partial packet traces because they lack visibility into queue state, retry logic, and the decision-making process itself.
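The replay property described above can be illustrated with a toy event log. This is a sketch under assumed field names (`t_us`, `kind`, `detail`), not the Concentrator's actual log schema; the point is that decisions re-derived from timestamps alone are independent of log storage order.

```python
# Minimal sketch of deterministic playback: every control decision is a
# timestamped event under a single clock, and replay re-derives the same
# sequence from the log alone. Event fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    t_us: int      # single-clock timestamp (microseconds)
    kind: str      # "arrival", "mcs_select", "dequeue", ...
    detail: str

def replay(log):
    """Re-run the logged control decisions in timestamp order."""
    return [(ev.t_us, ev.kind, ev.detail)
            for ev in sorted(log, key=lambda e: e.t_us)]

log = [Event(10, "arrival", "flow=alice seq=104"),
       Event(12, "mcs_select", "mcs=9"),
       Event(15, "dequeue", "rrh=A")]
# Replay is order-independent: shuffled storage yields the same sequence.
assert replay(log) == replay(list(reversed(log)))
```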

Why This Enables More Effective Machine Learning

Section 15 describes how Fi-Wi uses machine learning to optimize MCS transition rates. The observability matrix demonstrates the practical advantages Fi-Wi's centralized architecture provides for ML training.

The Concentrator's event log becomes a high-quality training dataset where every state transition is labeled with measured outcomes under consistent instrumentation. While autonomous AP systems could attempt ML-based rate adaptation using the partial observability available to them, Fi-Wi's richer telemetry—particularly queue visibility, global CSI, and deterministic replay—enables significantly more effective learning and optimization.

Coordination Shares Outcomes; Fi-Wi Centralizes Causes

Coordinated AP systems can share summaries (throughput, ECN marks, interference reports) but cannot share hidden internal state (queue depth, firmware retry logic, aggregation decisions). This creates inference distance—the controller sees effects but not causes. Fi-Wi eliminates inference distance by removing autonomous decision-making from the edge. Queues, scheduling, and PHY selection are centralized under a single clock, producing an observable state graph where causes are explicit, replayable, and directly controllable. This architectural difference translates to measurably better ML training data quality.

Appendix I: Channel Width Orchestration and Service Time Variance

The Fi-Wi architecture treats channel width as a dynamic control parameter managed by the Concentrator. While 802.11be (Wi-Fi 7) emphasizes 320 MHz peak PHY rates, Fi-Wi's orchestration engine strategically selects 40 MHz channel widths in high-density environments to ensure Service Time Stationarity and the stability of the L4S control loop.

I.1 The Contention-Domain Collapse of Wideband Channels

In shared-spectrum MDUs (Multi-Dwelling Units), the theoretical gain of wider channels is often negated by contention-domain collapse. In a CSMA/CA environment, a transmission opportunity (TXOP) requires the entire bonded channel to be idle. In a 6-AP overlapping scenario with 50% aggregate airtime occupancy, the probability of finding all sub-bands simultaneously idle drops exponentially with bandwidth.

Under a simplified independent-sub-band occupancy assumption, a basic model suggests P(160 MHz idle) ≈ (P(40 MHz idle))^4, resulting in 4–16× fewer transmission opportunities. In practice, partial correlation between sub-bands moderates the exponent but does not eliminate the super-linear decline in idle probability. This leads to fragmented TXOPs, heavy-tailed service times, and long, erratic queue service intervals.
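The independence model above is easy to make concrete. A sketch under the stated idealization (independent sub-bands, one 40 MHz building block per bonded slice):

```python
# Simplified model from I.1: if each 40 MHz sub-band is independently
# idle with probability p, a k-sub-band bonded channel is idle with
# probability p**k. Independence is an idealization; real sub-bands
# are partially correlated.
def idle_probability(p_subband: float, width_mhz: int) -> float:
    k = width_mhz // 40
    return p_subband ** k

p40 = idle_probability(0.5, 40)    # 0.5
p160 = idle_probability(0.5, 160)  # 0.5**4 = 0.0625
print(p40 / p160)                  # → 8.0: 8x fewer TXOP opportunities
```

At 50% per-sub-band occupancy this lands inside the 4–16× range quoted above; the exact factor depends on the occupancy level and sub-band correlation.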

I.2 Queueing Theory and L4S Stability

From an M/G/1 queueing perspective, the performance of the L4S control loop depends on the stability of the service rate (μ). L4S stability requires frequent service opportunities and low variance in service time to prevent the decoupling of the sender's congestion window from the actual queue state.
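The variance sensitivity follows directly from the Pollaczek–Khinchine formula for M/G/1 mean waiting time, W = λ·E[S²] / (2(1 − ρ)) with ρ = λ·E[S]. A short sketch with illustrative parameters (not taken from the appendix's simulation):

```python
# Pollaczek-Khinchine mean waiting time for an M/G/1 queue.
# Holding the mean service time fixed, higher service-time variance
# alone inflates queueing delay -- the effect Fi-Wi suppresses by
# keeping service times near-stationary.
def pk_mean_wait(lam: float, mean_s: float, var_s: float) -> float:
    rho = lam * mean_s
    assert rho < 1.0, "queue must be stable"
    second_moment = var_s + mean_s ** 2   # E[S^2] = Var[S] + E[S]^2
    return lam * second_moment / (2.0 * (1.0 - rho))

# Same arrival rate and mean service time, different variance:
low  = pk_mean_wait(lam=50.0, mean_s=0.010, var_s=0.00001)
high = pk_mean_wait(lam=50.0, mean_s=0.010, var_s=0.001)
print(high / low)   # → 10.0: variance alone inflates mean wait 10x
```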

I.3 Link Adaptation and Spectral Robustness

Narrower channels reduce the probability that partial-band interference (e.g., unmanaged IoT bursts) forces a full MCS downgrade across the entire bonded width. This allows the Concentrator to maintain stable link adaptation and a predictable drain rate, avoiding the chaotic rate-shifting common in 160 MHz deployments.

I.4 Orchestration: Width as a Control Variable

Fi-Wi is not anti-wideband; channel width is an orchestrated variable. The system expands width opportunistically when contention is low to leverage PHY gains and contracts it to 40 MHz when deterministic latency is required. This prioritizes spatial reuse and airtime isolation over maximum burst rate—the fundamental technical unlock for Fi-Wi’s cell-per-room model.

I.5 Capacity Density: Throughput Under a Latency SLO

Fi-Wi optimizes Capacity Density under a Latency SLO, rather than peak PHY on a single link. In dense OBSS environments, wide channels reduce spatial reuse; narrower channels increase the number of bounded contention domains. Consequently, aggregate goodput per area increases even if per-link PHY decreases.

Metric Definition: Low-Latency Goodput Density (ρ_LL)

ρ_LL [Mbps / 1,000 sq ft] = (Σ Goodput_i) / Area | subject to p95 OWD ≤ 20ms

Where Goodput_i is the application-layer payload throughput delivered while maintaining the p95 one-way delay (OWD) constraint. The 20ms threshold reflects the target for interactive L4S applications.

Example Calculation (1,000 sq ft section of a 10,000 sq ft floor):

Assumptions: 50% aggregate offered load per BSS, default EDCA parameters, and no explicit inter-AP coordination in the autonomous case.
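The ρ_LL arithmetic behind the comparison-table figures can be sketched as follows. The ~160 Mbps per-RRH figure is a derived illustration (8 RRHs totaling the table's ~128 Mbps per 1,000 sq ft over a 10,000 sq ft floor), not a new measurement.

```python
# Illustrative arithmetic for the rho_LL metric of Section I.5.
def rho_ll(goodputs_mbps, area_ksqft: float) -> float:
    """Low-latency goodput density in Mbps per 1,000 sq ft, assuming
    each contribution already passed the p95 OWD <= 20 ms filter."""
    return sum(goodputs_mbps) / area_ksqft

# 8 orchestrated 40 MHz RRHs on a 10,000 sq ft floor, each delivering
# ~160 Mbps of SLO-compliant goodput (hypothetical per-RRH figure):
print(rho_ll([160] * 8, 10.0))  # → 128.0 Mbps per 1,000 sq ft
```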

I.6 Application: Aligning Wireless Capacity to Gigabit WAN Service

To align with a Gigabit-class WAN service, the wireless architecture must match the aggregate wireline supply to orchestrated spatial demand. In a dense MDU, Contention Delay is 10–100× larger than serialization time. A single 160 MHz AP attempting to serve a Gigabit load creates a "fast but flaky" link that collapses under co-channel interference, delivering only a fraction of the ISP's provided capacity to real-time applications.

Fi-Wi resolves this by using 40 MHz orchestration to spread the Gigabit load across N coordinated spatial domains. This ensures that the building-wide wireless fabric can actually saturate a 1 Gbps WAN link with deterministic, multi-user goodput, rather than relying on single-device peak bursts that starve other users and destabilize shared airtime.

I.7 Aggregation Quantization and L4S Feedback Mismatch

L4S signals congestion at Layer 3 (IP ECN), but wideband Wi-Fi operates via massive Layer 2 A-MPDU aggregation to maintain PHY efficiency. This creates a fundamental control-loop mismatch: L4S expects smooth, near-continuous per-packet congestion marks, but aggregation releases packets (and their ECN marks) in large bursts, so the sender's feedback arrives quantized and clumped rather than continuous.

The Fi-Wi architecture addresses these challenges through its DualQ implementation (Section 5.2), which maintains separate queues for L4S and Classic traffic and performs per-packet sojourn time measurements at the Concentrator before entering the A-MPDU aggregation pipeline.
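The per-packet sojourn check can be sketched as below. This is a minimal illustration, not the Section 5.2 implementation: the 1 ms threshold, codepoint strings, and step-marking behavior are all assumptions (production L4S AQMs typically use a probabilistic ramp rather than a hard step).

```python
# Hedged sketch of sojourn-time ECN marking at the concentrator,
# applied per packet before it enters the A-MPDU aggregation pipeline.
L4S_MARK_THRESHOLD_US = 1000   # ~1 ms sojourn target (illustrative)

def mark_ecn(enqueue_us: int, dequeue_us: int, ecn: str) -> str:
    """Return the packet's ECN codepoint after the sojourn check."""
    sojourn = dequeue_us - enqueue_us
    if ecn == "ECT(1)" and sojourn > L4S_MARK_THRESHOLD_US:
        return "CE"            # congestion experienced: mark, don't drop
    return ecn                 # Classic traffic handled by the other queue

print(mark_ecn(0, 1500, "ECT(1)"))  # → CE
print(mark_ecn(0, 400, "ECT(1)"))   # → ECT(1)
```

Because marking happens before aggregation, the congestion signal reflects true queue sojourn rather than burst-quantized airtime.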


Comparison of Service Metrics (Dense MDU Contention Model)

Scenario: 2x2 MIMO, 6+ overlapping BSSIDs, shared unlicensed spectrum (5/6 GHz), 50% aggregate offered load, autonomous EDCA parameters. See Appendix J for full simulation parameters.

Metric | 160 MHz (Autonomous CSMA) | 40 MHz (Fi-Wi Orchestrated)
Peak PHY Rate (2x2, MCS 11) | ~1.2 Gbps | ~300–400 Mbps
Effective Airtime Utilization | <10% (fragmented TXOPs) | 30–50% (planned reuse / bounded domain)
Service Time Variance (σ²) | High (heavy-tailed) | Low (near-stationary)
Queue Service Interval (median) | Tens to >100 ms | 5–15 ms (stationary)
DualQ ECN Feedback Coherence | Sparse / burst-marked | Continuous / stable marking
Goodput Density ρ_LL (Mbps per 1,000 sq ft) | ~12 Mbps (overlapping contention domains) | ~128 Mbps (8 RRHs, orthogonal 40 MHz channels)

Economic Conclusion: Under realistic dense MDU conditions, Fi-Wi's orchestrated 40 MHz architecture delivers ~10× higher usable goodput density compared to autonomous wide-channel deployments. This is the fundamental advantage of Fi-Wi: capacity scales with RRH density and spatial reuse, not channel width alone.

See Appendix J for detailed contention modeling and simulation methodology.

Appendix J: 10-Node MDU Simulation Methodology

This appendix details the Monte Carlo simulation and analytical models used to derive the Low-Latency Goodput Density (ρ_LL) metrics. The framework evaluates Fi-Wi's spatial capacity gains under realistic Multi-Dwelling Unit (MDU) contention scenarios.

J.1 Spatial and RF Environment Model

The simulation contrasts traditional wide-area coverage with Fi-Wi's localized orchestration.

Building & RF Assumptions:
  • Geometry: 10,000 sq ft floor divided into 8 units (~1,250 sq ft each). Metrics are normalized to "per 1,000 sq ft" for comparative analysis.
  • Path Loss Model: PL(d) = PL(d₀) + 10n·log₁₀(d/d₀) + X_σ, with path-loss exponent n = 2.8 and X_σ a log-normal shadowing term.
  • OBSS Overlap: Autonomous case assumes 6 neighboring BSSIDs audible at ≥ -62 dBm.
  • Fi-Wi Isolation: 8 RRHs achieving >25 dB co-channel isolation through planned orthogonal reuse.
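The path-loss model above is straightforward to evaluate. A sketch with an assumed reference loss (PL(d₀) = 40 dB at d₀ = 1 m, a value not stated in the appendix) and the shadowing term zeroed for determinism:

```python
# Log-distance path-loss model from J.1 (n = 2.8). The shadowing term
# X_sigma is drawn per-link in the Monte Carlo runs; it is zeroed here
# for a deterministic illustration. PL(d0) = 40 dB at d0 = 1 m is an
# assumed reference value, not from the appendix.
import math

def path_loss_db(d_m: float, pl_d0: float = 40.0, d0: float = 1.0,
                 n: float = 2.8, x_sigma: float = 0.0) -> float:
    return pl_d0 + 10.0 * n * math.log10(d_m / d0) + x_sigma

print(round(path_loss_db(10.0), 1))  # → 68.0 dB at 10 m
```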

J.2 Contention and Backoff Logic

The simulation models 20 active stations (STAs) distributed across the 8-unit floor (average 2.5 STAs per unit). Service Time Variance (σ²) is calculated by observing the delay between TX_START and ACK_END across 10⁶ simulated TXOPs.
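The contention model can be sketched as a toy slotted Monte Carlo in the spirit of J.2. All parameters here (9 µs slots, busy probabilities, slot counts) are illustrative assumptions, not the appendix's full backoff model:

```python
# Toy Monte Carlo of TXOP service delay under contention: a station
# defers whenever the medium is busy, and we record the wait until
# each idle slot it wins. Parameters are illustrative.
import random

def service_delays(busy_prob: float, slots: int = 10000,
                   slot_us: float = 9.0, seed: int = 1):
    """Per-attempt delay until an idle slot, in microseconds."""
    rng = random.Random(seed)
    delays, wait = [], 0.0
    for _ in range(slots):
        if rng.random() < busy_prob:
            wait += slot_us          # medium busy: keep deferring
        else:
            delays.append(wait)
            wait = 0.0
    return delays

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# A heavily occupied wide channel (6 OBSS) shows far higher
# service-time variance than a bounded 40 MHz contention domain.
print(variance(service_delays(0.9)) > variance(service_delays(0.4)))
```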

J.3 The ρ_LL Filtration Process

The Goodput Density is derived by filtering raw throughput through the 20ms p95 OWD constraint.

# Derivation of the rho_LL metric: only bytes delivered within the
# 20 ms p95 OWD budget count toward low-latency goodput.
accepted_payload = 0
for pkt in packets:
    delay = pkt.contention_delay + pkt.serialization_delay + pkt.retry_overhead
    if delay <= 0.020:              # 20 ms constraint
        accepted_payload += pkt.size
    # packets over budget are excluded from the goodput metric

rho_LL = accepted_payload / (total_time * area)

J.3.1 Numerical Results and Derivation

The simulation produces the goodput derivation for each 1,000 sq ft section, consistent with the ρ_LL values summarized in Appendix I (~12 Mbps autonomous vs. ~128 Mbps orchestrated).

J.4 Traffic Model and Payload Composition

Traffic Type | % of Load | Constraint
Interactive (L4S/Gaming) | 20% | Strict SLO subject
Streaming (4K Video) | 50% | Freeze sensitive
Bulk (Background) | 30% | Throughput focused