Timestamp-synchronized control loops, dynamic RF grouping, and multi-RRH
operation
Umber Networks Fi-Wi Technical Architecture Overview (Version 1.1,
December 2025)
Zebras look like horses, but they are not the same... Zebras, despite man's best efforts, cannot be tamed. The Wi-Fi we have engineered today remains fundamentally a collection of autonomous, uncoordinated things—zebras that simply cannot be harnessed.
Fi-Wi is architected from the ground up to be controllable, coordinated, and directed — the horse we need for in-building communications and sensing. As latency demands tighten and building densities increase, Fi-Wi isn't just a better future; it's the future we can build today.
The material presented in this document describes the Fi-Wi architecture and associated engineering concepts. It is provided "as is" for discussion and exploratory design purposes only. Nothing in this document constitutes a formal specification, performance guarantee, regulatory assertion, or commitment to implement any feature described.
Several sections use simplified or idealized assumptions to illustrate architectural differences between Wi-Fi, Multi-Link Operation (MLO), Low Latency, Low Loss, and Scalable Throughput (L4S), and Fi-Wi queueing and scheduling behavior. These examples are intended to clarify concepts rather than fully model the non-linear and stochastic dynamics present in operational wireless systems.
Real system behavior depends on hardware characteristics, RF topology, firmware behavior, congestion patterns, environmental conditions, and interactions with legacy Wi-Fi devices. Actual performance may differ from the representative models and examples described here.
Important Note on Capabilities: This document describes an architecture using Commercial Off-The-Shelf (COTS) Wi-Fi chipsets. The system provides dynamic point selection, intelligent frequency reuse, and centralized MAC scheduling. It does not provide RF phase control, distributed MIMO, or coordinated simultaneous transmission—capabilities that would require custom ASIC development. All described features are achievable with commodity Wi-Fi hardware and comply with unlicensed spectrum regulations.
Low Latency, Low Loss, Scalable Throughput (L4S) is a suite of IETF standards that extend the Internet's congestion control mechanisms through Explicit Congestion Notification (ECN) to support very low queuing delays. L4S is a ratified protocol stack with multiple production implementations.
Fi-Wi is architected specifically to provide the deterministic underlying transport required to satisfy the strict queuing mandates defined in these standards.
L4S replaces capacity-seeking congestion control (Reno/Cubic) with pacing-based rate control and is already deployed in production environments.
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." — Antoine de Saint-Exupéry
"Everything should be made as simple as possible, but not simpler." — Albert Einstein
With 23.3 billion Wi-Fi devices in use worldwide and 5.5 billion people (and growing) depending on internet connectivity, Wi-Fi has become the primary way we access the internet. So much so that many people think Wi-Fi is the internet. It's how a home healthcare worker video-calls to check on a patient, or a cancer patient connects to their support group. It's how a parent works remotely while their child attends school online, and how lifelong learners access the information they need to grow. It's how a grandmother monitors her heart condition through a telehealth app. It's how a family member finds their next job, or how a neighbor orders a meal.
Running quietly in the background are autonomous systems we've come to depend on: security cameras that alert us to threats, medical monitors that track vital signs, smart home systems that manage climate and safety, IoT sensors that detect water leaks or carbon monoxide. These systems don't wait for us to notice problems—they operate continuously, silently, keeping people safe.
We've moved far beyond entertainment and convenience. Wi-Fi now carries the infrastructure of daily survival. When it breaks down under density or congestion, it's not just buffering that fails. It's jobs, healthcare access, human connection, and the life-safety systems we trust to work when we're not watching. The $4.9 trillion Wi-Fi contributes to the global economy isn't an abstract number. It's the cumulative value of billions of human activities and critical systems that simply stop working when the network fails.
The infrastructure supporting all of this is failing at scale, and the failure must be addressed for everyone. The industry is moving toward L4S and ECN-based control to eliminate bufferbloat, but traditional Wi-Fi makes this impossible. Legacy congestion-control loops fail by design once a single flow saturates the bottleneck queue, and even modern ECN-based systems such as L4S cannot converge when Wi-Fi hides queue depth, induces collision storms, injects firmware-created delays that look like queues, and constantly shifts transmission (PHY) rates through its rate-control and aggregation machinery. Mesh networks and additional APs make the user experience worse by injecting more uncoordinated radios into an already chaotic RF environment. And because the AP industry understands these limits, it is no surprise that even major vendors publicly state that L4S cannot operate correctly over the products they sell.
Adding more Ethernet-attached APs makes it worse by creating more overlapping contention domains. Hidden queues in SoCs, rate-control firmware, and aggregation pipelines obscure the true bottleneck. In control-theory terms: the bottleneck queue cannot expose its state, the PHY rate is not stationary, and the closed loop cannot stabilize. This is why user experience fails in many apartments and homes, in hotels, MDUs, stadiums, and high-density buildings long before “capacity” is reached.
QoS cannot rescue this architecture. Because the bottleneck queue inside a Wi-Fi AP has no information about actual flow urgency or priority, no QoS mechanism can operate meaningfully. The only real solution is to avoid congestion altogether — which is exactly what L4S researchers have designed for and exactly what Fi-Wi supports.
While the protocol fails in the air, the physical infrastructure fails in the walls: the industry's traditional answer, running copper Ethernet to APs, simply extends the lifetime of an architecture that has reached its limits. Copper requires periodic rip-and-replace cycles: Cat5 becomes Cat6, then Cat7, then Cat8. A home builder has no idea what communications wiring to install. The RJ45 connector and its plastic tab are fragile, outdated, and end of life. And at 25G, 40G, or 100G, physics takes over: copper loses signal in dB per inch. Data centers have abandoned structured cabling (long-run copper) for core transport, restricting copper to short-reach intra-rack DACs. Fi-Wi applies the same logic to the building: fiber for the long haul (halls/walls), radio for the short hop.
Fi-Wi breaks the cycle. Install fiber once — and never revisit behind walls or ceilings again. The glass is permanent; only the optics evolve. Fiber is already the universal medium for 100G/400G data centers, DWDM long-haul transport, and now PCIe throughout a building with Fi-Wi. Remote Radio Heads simply convert between fiber and 802.11, eliminating embedded routing, rate-control SoCs, switching silicon, and the security-patch treadmill they require. When Wi-Fi standards evolve, you replace the small radio module(s) — that's all.
Fi-Wi turns fiber combined with 802.11 into the permanent, predictable, control-theory-friendly transport that the L4S control loop requires, and treats 802.11 radio heads as the small, disposable, last-meters, connector-free interface where the in-building network behaves deterministically. And because fiber increases the long-term value of a building, the investment is not just technically durable — it is financially durable.
There is no law of physics that says Wi-Fi cannot work at scale. The collapse we're seeing in apartments, hotels, and high-density buildings isn't inevitable. The researchers have shown engineers how to proceed. We know how to build stable control loops. We know how to coordinate radios. We know how to deploy permanent infrastructure.
The conditions for solving this are here, now. Engineering talent exists across our industry. The market has already validated the foundation: China's FTTR deployments have installed fiber to millions of rooms, proving that permanent infrastructure at this scale is not just feasible—it's already happening at volume. What's missing is capital directed at the right architecture. Investors are essential to this challenge. Their capital will enable the engineering to serve the market. And, once proven, market signals will sustain the development, directing human resources toward building what humanity needs for continued advancement.
Fi-Wi is Umber's answer, but the underlying challenge belongs to all of us. The 5.5 billion people depending on this infrastructure deserve better than a system designed for convenience that we've repurposed for survival. This is solvable engineering—the talent is ready, the manufacturing exists, and the market is waiting. It's time we came together and fixed this.
The failure of modern Wi-Fi to support low-latency applications (L4S) is not a failure of bandwidth; it is a failure of control. With 23.3 billion Wi-Fi devices deployed globally, the protocol has hit an asymptotic limit where adding complexity yields diminishing returns.
As density rises, autonomous contention scales super-linearly—effectively operating as the inverse of Metcalfe's Law. The result is a rising noise floor and media access collisions that render unlicensed spectrum unusable for the deterministic performance required by next-generation applications.
Evolutionary engineering is powerful; it gave us twenty-five years of Wi-Fi speed improvements. But every evolutionary curve eventually hits an asymptote—a point where adding more complexity yields diminishing returns. We have reached that point.
"The IEEE 802.11 working group behaves like a composer writing a symphony that effectively cannot be played. They continually add instruments—4096-QAM, Puncturing, MLO—without considering that the musician (the silicon) has only microseconds to react."
The decision matrix for a Wi-Fi chip has exploded combinatorially. We can trace this through the Modulation and Coding Scheme (MCS) table.
The Physical Trap: When the firmware engineer fails to optimize the radio, can we simply redesign the chip? No, because of RTL (Register Transfer Level) Accretion. In software, engineers "refactor" unwieldy code. In hardware, refactoring is economically forbidden. A complex SoC takes 18–24 months to validate; removing "dead" logic risks breaking obscure corner cases. Consequently, vendors only add; they never subtract. 802.11be logic wraps around 802.11ax logic, which wraps around 802.11ac logic—twenty-five years of accumulated technical debt consuming area and leakage power.
The Market Signal: The ultimate proof that the standard has reached gridlock is the behavior of market leaders like Samsung and Apple. They no longer rush to support every new feature—they aggressively whitelist features and blacklist others because complexity drains battery and destabilizes connections. When the two largest consumers of wireless silicon effectively stop buying the complexity argument, the evolutionary roadmap is broken.
The fundamental instability of 802.11 stems from the Birthday Paradox applied to media access. In an autonomous system, as the number of contending stations (n) increases linearly, the probability of collision increases combinatorially:
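A textbook birthday-paradox approximation (supplied here for illustration; N_slots denotes the size of the contention-slot pool the stations draw from) makes the pairwise growth explicit:

P_collision(n) ≈ 1 − exp( −n(n−1) / (2 · N_slots) )

The n(n−1)/2 term counts contending pairs, matching the collision-pair count used by the interactive simulation later in this document.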
Simulation data confirms that even with moderate client density, collision probability quickly exceeds 50%, forcing the network into a state of "Drift" where latency becomes unbounded. Under these conditions, the network is no longer constrained by PHY capacity, but by the probability of successful media access.
This is Metcalfe's Law in reverse: instead of each new node increasing the value of the network, each new node increases the chance of interference and reduces usable capacity.
The collapse of the operator model is driven by three distinct architectural failures inherent to the 802.11 standard.
Standard Wi-Fi relies on Carrier Sense Multiple Access (CSMA), which assumes that all stations can hear each other. In real-world MDU (Multi-Dwelling Unit) environments, this assumption fails catastrophically.
Field measurements using ESP32-based sensors reveal that hidden node contention consumes 30-50% of available airtime in typical MDU deployments—airtime paid for in spectrum acquisition costs but lost to protocol overhead invisible to traditional monitoring. This represents a massive protocol tax where significant airtime is consumed by retries and backoff slots rather than payload delivery.
The most critical failure for a network operator is the loss of state control. Modern 802.11ax supports 12 MCS indices × 4 bandwidth options × 8 spatial stream configurations × 3 guard intervals = >1,000 valid PHY states. Autonomous rate selection must navigate this space at sub-millisecond timescales under non-stationary noise.
This creates a Non-Stationary System.
Because Wi-Fi is non-stationary, autonomous rate selection under contention has no bounded outcome. The IEEE 802.11 standard has allowed the MCS table to explode into hundreds of valid permutations—a chaotic state space that firmware must navigate in microseconds with incomplete information.
As load increases, the spatial precision of the network degrades. Mathematical modeling shows that the condition number (κ)—a measure of how well-conditioned the MIMO channel matrix is—degrades from 6 dB (excellent spatial separation) to >12 dB (severe interference) under load. This collapse means that 4×4 MIMO effectively degrades to 2×2 or worse, turning additional spatial streams into self-interference rather than capacity.
This degradation collapses the theoretical gains of Mu-MIMO, transforming high-order spatial streams into interference rather than usable capacity. The "Efficiency Paradox" emerges: Wi-Fi evolution has focused on shrinking Payload Duration (faster PHY rates like 4096-QAM) while MAC Overhead (LBT, Backoff, Preamble) remains constant. To amortize the overhead, chips must build massive Aggregates (A-MPDUs). This destroys latency. We have engineered a Ferrari engine (the PHY) inside a garbage truck (the MAC).
For network operators—whether cable MSOs, telcos, or fiber providers—this architectural chaos presents a fundamental business risk: You own the customer experience, but not the air interface.
Traditional attempts to solve Wi-Fi density problems fail because they address symptoms rather than the underlying architectural failure:
The Trillion-Dollar Context: The mobile industry spent $600 billion building 5G to get scheduled, deterministic performance outdoors. They understand that unlicensed spectrum + autonomous contention = chaos. The genius of 5G is its architecture; its Achilles heel is its cost. In recent auctions, 20 MHz of licensed mid-band spectrum sold for over $17 billion for U.S. rights alone.
Fi-Wi applies the cellular C-RAN architecture indoors—but on unlicensed spectrum that costs nothing. This is the arbitrage opportunity.
The architectural reset is not limited to the infrastructure; it fundamentally alters the behavior of the Station (STA). In legacy Wi-Fi, the STA is an autonomous agent that fights for upstream airtime using EDCA (Enhanced Distributed Channel Access). It maintains its own local WMM queues and blindly transmits whenever it wins a contention window, often oblivious to the fact that the AP's receive buffer is already full.
The L4S Inversion: With L4S, the "Quality of Service" decision moves from the Wi-Fi card's firmware to the application's congestion control algorithm. We replace the rigid, static categories of WMM with the dynamic, adaptive responsiveness of TCP Prague and other L4S-compliant congestion controls.
Eliminating the "Uplink Queue": This effectively virtualizes the queue. Instead of a deep buffer sitting on the Wi-Fi chip waiting to be transmitted, the packets are held in user-space memory on the client device, waiting for the "go" signal (or rather, the absence of a "stop" signal). The traffic never enters the contention domain until there is guaranteed capacity to service it. The STA no longer needs complex internal QoS schedulers because it is no longer trying to force more data than the pipe can hold.
In legacy systems, flow control happens at the driver level. When the Wi-Fi card's hardware buffer fills up (the TX Ring), it signals the Operating System to "Stop the Queue." The OS then buffers packets in software (qdisc) until the hardware signals "Go."
This is catastrophic for latency. It creates a hidden reservoir of old data sitting in the kernel, waiting for the hardware to clear. By the time the hardware is ready, the packets in the OS queue are already stale.
L4S eliminates this layer of buffering entirely. Because TCP Prague adjusts the send rate to match the actual airtime capacity (signaled via ECN), the application never sends enough data to fill the hardware ring buffer. The driver never has to assert flow control, the OS queue remains empty, and every packet that hits the driver is fresh, ensuring immediate transmission.
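Schematically (an illustrative C sketch, not a real driver; names are hypothetical):

#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SLOTS 256

struct tx_ring { uint32_t head, tail; };   /* hardware descriptor ring */

static bool tx_ring_full(const struct tx_ring *r)
{
    return (r->head - r->tail) >= TX_RING_SLOTS;
}

/* Legacy driver: when the ring fills, the OS queue is stopped and
 * packets age in the software qdisc until the hardware drains. */
bool legacy_should_stop_queue(const struct tx_ring *r)
{
    return tx_ring_full(r);   /* a hidden reservoir of stale data builds */
}

/* L4S-paced sender: the send rate tracks ECN-signaled airtime capacity,
 * so the ring stays nearly empty and this predicate is effectively
 * never true; every packet reaching the ring is fresh. */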
Solving this requires a "Subtractive Architecture." Instead of adding more features to the radio, we must remove them. The architectural breakthrough of Fi-Wi is decoupling the MCS State Graph described in Section 2.3.2 into its constituent parts: the concentrator deliberates and selects the transmission state (e.g., MCS_INDEX, N_SS, TXOP_DURATION), and the RRH simply translates that chosen state into the precise RF waveform without autonomous deliberation.
This architectural shift—from distributed chaos to centralized control—mirrors the evolution from analog transmission systems (noise-prone, operator-invisible) to digital QAM (deterministic, monitorable). Fi-Wi completes this transformation for the last 10 meters, moving the network from a model of probabilistic negotiation to one of deterministic execution.
Section 13 describes the Concentrator's scheduling algorithm that implements this graph traversal, while Appendix C details the RRH's scatter-gather DMA mechanism that executes the chosen state transitions at microsecond timescales.
Traditional QoS mechanisms in Wi-Fi—WMM access categories, priority queues, and traffic shaping—reflect a fundamental architectural flaw: treating contention as inevitable and attempting to optimize it through priority classes. This approach attempts to infer urgency by classifying packets, then granting probabilistic access to the medium—essentially rolling dice with weighted odds.
L4S changes the premise entirely. Flows signal their tolerance for delay using ECN, allowing the network to signal sources to control their own send rates. Across many flows, this controls the aggregate arrival rates at the forwarding plane based on real-time queue feedback rather than static classes.
In a Fi-Wi architecture, where all wireless transmissions are centrally scheduled with unified state, traffic no longer competes through contention. The Concentrator controls arrival rates to each Remote Radio Head, ensuring packets are transmitted at the precise moment they are needed. This deterministic scheduling replaces the probabilistic contention that WMM attempts to optimize. Consequently, the complex web of traditional QoS queues is rendered obsolete; we replace "Priority" (deciding who waits) with "Isolation" (ensuring no one waits).
The following interactive simulation demonstrates the architectural differences between Fi-Wi, autonomous APs, and mesh networks under varying load conditions. It visualizes the MCS State Graph discussed in Section 2.7, showing how autonomous systems fail to navigate this state space under density.
Each "room" represents a device with a 4 × 12 grid of MCS states (4 spatial streams × 12 MCS indices). The ghost node (dashed) shows the ideal state based on channel quality, while the active node shows the actual state selected by the rate control algorithm.
What to Watch For:
MCS Grid: Each 4×12 grid shows all possible MCS states. Top rows = Mu-MIMO (multi-user), bottom rows = standard 2×2 MIMO. Columns = MCS index (0-11, higher = faster but needs better SNR).
Eigenvalues (λ₁, λ₂): Strength of spatial modes in the MIMO channel. As density increases in autonomous mode, λ₂ collapses → spatial interference.
Condition Number (κ): Ratio λ₁/λ₂ in dB. Low (~6 dB) = good. High (>12 dB) = Mu-MIMO degraded to single-stream. This directly demonstrates the "Spatial Contention Cascade" from Section 2.3.3.
Collision Probability: Computed using Birthday Paradox formula: n(n-1)/2 collision pairs. When this exceeds 50%, the network enters "Drift" state with unbounded latency.
This visualization proves the loss of control described in Section 2.4. In autonomous mode, operators cannot engineer performance because the system navigates a 1,000+ state MCS graph with no global coordination.
In Fi-Wi mode, the Concentrator's global state visibility allows it to steer every device toward its ideal MCS state and hold it there.
The result: predictable, engineerable performance that scales with density instead of collapsing. The difference becomes visceral when you watch autonomous mode turn red under the same load that Fi-Wi handles in green.
┌────────────────────────────────────────────┐
│ Fi-Wi Concentrator │
│────────────────────────────────────────────│
L4S/ECN-aware │ │
traffic from LAN/ │ ┌────────────────────────────────────┐ │
WAN (IP/802.3) ─────┼─▶│ Central Packet Memory & Queues │ │
│ │ • Per-flow / per-tenant queues │ │
│ │ • Per-airtime-domain queues │ │
│ │ • Enqueue timestamps (µs) │ │
│ └───────────────┬────────────────────┘ │
│ │ │
│ ┌───────────────▼────────────────────┐ │
│ │ L4S/AQM & Scheduler │ │
│ │ • Sojourn-time based ECN marking │ │
│ │ • TXOP length control (≈250 µs) │ │
│ │ • RF grouping & spatial streams │ │
│ └───────────────┬────────────────────┘ │
│ │ PCIe over fiber │
└───────────────────┼────────────────────────┘
│
┌───────────────────────────────────┼───────────────────────────────────┐
│ │ │
│ │ │
┌───────▼─────────┐ ┌────────▼─────────┐ ┌────────▼─────────┐
│ RRH #1 │ │ RRH #2 │ │ RRH #3 │
│ (Thin MAC/PHY) │ │ (Thin MAC/PHY) │ │ (Thin MAC/PHY) │
│ • RF front end │ │ • RF front end │ │ • RF front end │
│ • DFE + FFT │ │ • DFE + FFT │ │ • DFE + FFT │
│ • Minimal MAC │ │ • Minimal MAC │ │ • Minimal MAC │
│ • DMA engine │ │ • DMA engine │ │ • DMA engine │
│ • PTP sync │ │ • PTP sync │ │ • PTP sync │
└───────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
│ │ │
│ │ │
│ PCIe-over-fiber links (no deep queues in RRHs) │
│ │ │
│ │ │
┌───────▼─────────┐          ┌────────▼────────┐     ┌────────▼─────────┐
│     RRH #4      │   ...    │     RRH #N      │     │    Wi-Fi STAs    │
│  (Thin MAC/PHY) │          │  (Thin MAC/PHY) │     │ (rooms, AP-like  │
│ • RF front end  │          │ • RF front end  │     │  cells, clients) │
│ • DFE + FFT     │          │ • DFE + FFT     │     │ • Phones         │
│ • Minimal MAC   │          │ • Minimal MAC   │     │ • Laptops        │
│ • DMA engine    │          │ • DMA engine    │     │ • IoT devices    │
│ • PTP sync      │          │ • PTP sync      │     │                  │
└─────────────────┘          └─────────────────┘     └──────────────────┘
Key properties: Central packet memory and queues live entirely in the concentrator, where L4S-aware AQM and scheduling operate on true bottleneck queues. RRHs are kept as simple hardware endpoints (RF + minimal MAC + DMA + PTP), with no deep local buffering or autonomous AP logic. This enables stable L4S behavior, explicit TXOP control, and software-defined evolution of queueing and RF policies.
To understand Fi-Wi, we must first unlearn the definition of an "Access Point."
In a typical controller-managed enterprise Wi-Fi deployment, a centralized controller (e.g., Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi Controller) coordinates AP configuration: channel assignment, transmit power, client steering recommendations, and SSID management. However, each AP remains autonomous at the data plane:
These systems are loosely-coupled: the controller manages the control plane (configuration, policy) but the data plane — queuing, MAC scheduling, aggregation, and packet forwarding — remains distributed and autonomous across individual APs.
In Umber Fi-Wi (C-RAN for Wi-Fi), we split the AP and cellularize the RF domain, down to room-level. The concentrator sees all flows, all queues, and all RRHs. The RRHs handle 802.11 MAC/PHY but are tightly time-synchronized and behave as DMA-driven PHY/MAC endpoints rather than autonomous APs. A set of RRHs and their shared queues form a cellularized Wi-Fi domain within the building, often at “cell per room” granularity.
Fi-Wi centralizes both control plane AND data plane with shared state across all RRHs. The concentrator doesn't just configure RRHs; it directly manages their queues, schedules their TXOPs, and maintains unified timestamp-synchronized state across the entire cellularized RF domain.
Conceptually, Fi-Wi decouples the system into two nested feedback loops, separated by timescale:
The Outer Loop manages congestion and end-to-end latency (Internet speed). The Inner Loop manages MAC efficiency and radio timing (Airtime).
The Problem with Legacy Wi-Fi: Traditional APs couple these loops unpredictably, creating "sawtooth" latency patterns that confuse TCP.
The Fi-Wi Solution: By centralizing both loops in the Concentrator, Fi-Wi enforces a strict Time-Scale Separation. The Inner Loop runs so fast (3–5 kHz) that it appears as "constant service" to the slower Outer Loop (10–20 Hz), allowing L4S to stabilize perfectly.
(See Section 5: Control Architecture for the rigorous control-theoretic analysis and stability criteria.)
Fi-Wi operates across two distinct time domains simultaneously. The first is the concentrator's internal master clock, disciplined via PTP/802.1AS over the PCIe fronthaul (detailed in Section 4.7). The second is the 802.11 TSF (Timing Synchronization Function) domain that 802.11 clients use to coordinate with the MAC layer. In a traditional AP these two clocks are decoupled — the AP runs one TSF and one clock. In Fi-Wi, with 24 RRHs each presenting a TSF-aware BSS, managing the relationship between them is a foundational architectural responsibility of the concentrator.
The concentrator synchronizes its master clock to all attached RRHs on the order of microseconds (and substantially tighter when using PCIe-native timing mechanisms such as PTM — see Section 4.7 for the full hardware chain). This master clock gives every packet:
This clock lives entirely inside the Fi-Wi domain. Clients never see it directly. It is the coordinate system in which shim header timestamps (Section 4.2), AQM marking decisions (Section 4.3), and the ML training corpus (Section 15) are all expressed. Because all packet timestamps, service events, and queue measurements are expressed in this single master time domain, Fi-Wi can compute precise per-packet sojourn times independent of the TSF domain, enabling stable ECN marking and L4S control across the system.
The 802.11 TSF is a 64-bit microsecond counter that every client associates with a BSS. Clients set their local TSF from beacons. They use it to wake from power save at the right moment, to interpret TBTT (Target Beacon Transmission Time), and to coordinate TXOP timing. The TSF is the only MAC-visible clock the 802.11 standard exposes at the MAC layer.
In a traditional single-AP deployment this is trivial: one AP, one TSF, one beacon stream. In Fi-Wi it is not. Consider a client in a room served by two RRHs in the same airtime domain. That client will receive beacons from both RRHs. If those beacons carry inconsistent TSF values, even small inconsistencies can lead to misaligned power-save wakeups, ambiguous TBTT interpretation, and in some implementations degraded performance or reassociation. The coherence of the TSF domain across all RRHs in a BSS is not optional; it is a hard correctness requirement.
Fi-Wi satisfies this requirement by construction: the concentrator generates all beacon frames. No RRH constructs its own beacon. The concentrator writes the TSF value into every beacon before dispatching it to the appropriate RRH for transmission. Because all TSF values originate from the same source and are derived from the same master clock, they are consistent by design rather than by coordination protocol. Within a given BSS, TSF values are identical across all participating RRHs; multiple TSF domains arise only when multiple BSS instances are present.
The concentrator maintains 25 simultaneous time references: its own PTP-disciplined master clock and one 802.11 TSF per RRH. Each TSF has its own epoch (established at BSS creation) and its own drift correction term, derived from periodic synchronization updates over the fronthaul (PTP/802.1AS or PCIe PTM), which bound long-term drift. The concentrator knows the exact affine mapping between the master clock and every client-visible TSF domain at all times:
TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t)
Any event — a packet enqueue, an ECN mark, a TXOP start, a beacon transmission — can be expressed in any of the 25 frames without loss of precision. This is the time-domain analog of a coordinate transformation: the concentrator is the origin from which all other reference frames are derived, and any event timestamp can be mapped between frames via a known, invertible affine transform, updated continuously via the fronthaul synchronization loop.
Concentrator master clock (PTP-disciplined)
│
├─ Master frame: all shim timestamps, sojourn times, AQM marks, ML labels
│
├─ TSF_1: epoch_1, drift_1(t) → beacon stream for RRH 1 ┐
├─ TSF_2: epoch_2, drift_2(t) → beacon stream for RRH 2 │ identical within
├─ TSF_3: epoch_3, drift_3(t) → beacon stream for RRH 3 │ a given BSS
│ ... ┘
└─ TSF_24: epoch_24, drift_24(t) → beacon stream for RRH 24
Any event E has coordinates in all 25 frames simultaneously.
Mapping between any two frames: affine transform, known at the concentrator,
updated continuously via the fronthaul sync loop.
The concentrator as the origin of 25 simultaneous time reference frames (for a 24-RRH deployment). Client-visible TSF domains are derived from the master clock via known affine transforms. Within a BSS, TSF values are identical across participating RRHs.
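As a concrete sketch of the affine mapping (purely illustrative; field and function names are hypothetical, not a real Fi-Wi API):

#include <stdint.h>

/* One per-RRH TSF domain, per the relation
 * TSF_i(t) = (t_master - epoch_i) + drift_correction_i(t). */
typedef struct {
    uint64_t epoch_us;       /* master-clock time at BSS creation (µs)  */
    int64_t  drift_corr_us;  /* current drift correction term (µs),     */
                             /* updated by the fronthaul sync loop      */
} tsf_domain;

/* Master frame -> client-visible TSF value for domain i. */
static inline uint64_t master_to_tsf(const tsf_domain *d, uint64_t t_master_us)
{
    return (t_master_us - d->epoch_us) + (uint64_t)d->drift_corr_us;
}

/* Inverse transform: TSF value in domain i -> master frame. */
static inline uint64_t tsf_to_master(const tsf_domain *d, uint64_t tsf_us)
{
    return (tsf_us - (uint64_t)d->drift_corr_us) + d->epoch_us;
}

Because both directions are simple additions in the same units, the transform is exactly invertible, which is what lets any event timestamp be expressed in all 25 frames without loss of precision.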
In a controller-managed AP deployment, each AP runs its own TSF independently. The controller can nudge APs toward a common time reference via 802.11v BSS Transition Management or out-of-band NTP, but it does not generate beacon frames — each AP does. This means TSF values across APs can diverge by the inter-AP sync error (typically tens to hundreds of microseconds with Ethernet-based PTP, more without it).
A client roaming between two such APs may see a TSF discontinuity at handoff. Power-save state, TBTT alignment, and any MAC-layer timing assumption the client holds must be renegotiated. In Fi-Wi, roaming between RRHs within the same concentrator domain is a TSF-transparent event: the client's TSF counter simply continues, because the new RRH's beacon carries the same TSF value the old one would have carried at that moment. The client does not know a handoff occurred at the MAC layer.
This unified time model also enables the concentrator to schedule transmissions across RRHs against a single global timeline, rather than relying on independent per-RRH contention processes. TSF continuity across RRH handoffs is a direct consequence of centralized beacon generation, and it is what makes Fi-Wi's active redundancy claims in Section 8 operationally credible: per-packet steering between RRHs is transparent to clients because the client's MAC-layer time reference never changes. This unified time model enables not only precise measurement, but coordinated control of transmission behavior across RRHs, as described in Section 4.1.4.
The unified time model described above is not only a measurement framework; it is the foundation for Fi-Wi's centralized MAC scheduling. In conventional 802.11 deployments, EDCA (Enhanced Distributed Channel Access) operates as a stochastic contention mechanism: each AP independently selects random backoff values within its CWmin/CWmax range, and medium access emerges probabilistically.
In Fi-Wi, EDCA is not treated as a distributed random process. It is treated as a centrally orchestrated actuation layer, driven by the concentrator's master time reference.
Because the concentrator maintains a single master time reference and full visibility into every RRH's queues and transmit state, it can shape medium access behavior across RRHs by dynamically controlling EDCA parameters on a per-radio basis. The key parameters are CWmin, CWmax, AIFS, and the TXOP limit.
By assigning narrowly bounded contention windows and staggered AIFS values across RRHs, the concentrator can bias contention outcomes such that one RRH is overwhelmingly likely to win access at a given moment. Rotating these parameters over time creates a soft time-division multiplexing (TDM) effect using standard EDCA semantics.
This transformation is only possible because all RRHs share a common time reference. The concentrator can schedule EDCA parameter updates relative to the master clock and ensure that all RRHs apply them in a coordinated manner. Without this shared time base, independent EDCA processes would quickly decorrelate and revert to stochastic contention.
Conceptually, the concentrator executes a scheduling loop:
for each scheduling interval:
observe queue state across RRHs // centralized visibility
select next RRH (or RF group) to serve // queue-aware decision
assign EDCA parameters (CWmin, CWmax, AIFS, TXOP)
enforce timing relative to master clock // coordinated application
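Fleshed out slightly (a hypothetical C sketch of one tick; the parameter values are illustrative, and delivery to the RRHs would ride the PCIe fronthaul):

#include <stdint.h>

typedef struct {
    uint16_t cwmin, cwmax;   /* contention window bounds (slots)       */
    uint8_t  aifsn;          /* arbitration inter-frame space number   */
    uint32_t txop_limit_us;  /* TXOP limit in microseconds             */
} edca_params;

/* Bias EDCA so the selected RRH is overwhelmingly likely to win the
 * next contention round; rotating `winner` across ticks yields the
 * soft-TDM effect described above. */
void assign_edca(int winner, int n_rrh, edca_params out[])
{
    for (int i = 0; i < n_rrh; i++) {
        if (i == winner) {
            /* narrow window, short AIFS: near-certain first access */
            out[i] = (edca_params){ .cwmin = 1,  .cwmax = 3,
                                    .aifsn = 2,  .txop_limit_us = 250 };
        } else {
            /* wide window, staggered AIFS: statistically deferred */
            out[i] = (edca_params){ .cwmin = 31, .cwmax = 255,
                                    .aifsn = (uint8_t)(4 + i % 4),
                                    .txop_limit_us = 250 };
        }
    }
}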
The result is not strict TDMA — 802.11 contention semantics are preserved and the system remains compliant with standard client behavior — but the distribution of outcomes is shaped by the concentrator. Over short time horizons, access becomes highly predictable and service intervals can be bounded, which is critical both for tail-latency control and for giving L4S coherent feedback.
Because TSF values are consistent across RRHs, these scheduling decisions are MAC-transparent to clients. From the client's perspective, the network behaves as a single, coherent AP with stable timing characteristics, even as transmissions are steered across multiple physical radios.
Controller-based Wi-Fi systems can configure EDCA parameters on individual APs, but they cannot coordinate their application in time with sufficient precision. Each AP maintains its own clock, its own contention process, and its own transmit queues.
Without a shared time origin and centralized queue visibility, EDCA remains a probabilistic mechanism. Attempts to tune contention parameters across APs produce statistical bias at best, not deterministic scheduling. The lack of a unified time domain prevents coordinated rotation of access privileges across radios.
Fi-Wi's ability to treat EDCA as a controllable scheduling primitive is a direct consequence of the concentrator's role as both the time origin and the sole owner of transmit queues.
This time-driven EDCA orchestration is the mechanism by which Fi-Wi converts the inherently stochastic 802.11 MAC into a predictable, centrally scheduled system — completing the chain from time synchronization through queue observability to stable L4S control.
Between 802.3/IP and the fronthaul link we add a small internal metadata header. Conceptual form:
struct FiWiMeta {
uint64_t seq; // fronthaul sequence number
uint64_t t_ingress_us; // time packet enqueued into group queue (central DRAM)
uint32_t txop_id; // TXOP this MSDU is in
uint8_t mpdu_idx; // index within aggregate
uint8_t mpdu_cnt; // total MSDUs in this TXOP
uint8_t ecn_flags; // CE applied? which queue? reason bits
uint32_t qlen_pkts; // queue depth snapshot at TXOP start
};
This header is visible only inside the Fi-Wi domain. Among other things, it lets us compute the per-packet delay accumulated so far, Td = now − t_ingress_us, per flow or per RF group.
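For example (a hypothetical helper over the FiWiMeta header above; now_us stands in for a read of the concentrator master clock):

#include <stdint.h>

/* Per-packet time spent in the group queue, in the master time domain. */
static inline uint64_t fiwi_sojourn_us(const struct FiWiMeta *m,
                                       uint64_t now_us)
{
    return now_us - m->t_ingress_us;
}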
We choose the group queues in the concentrator—each corresponding to a cellularized airtime domain shared by one RRH or by multiple interfering RRHs—as the only places where deep queues are allowed and where we apply ECN:
Other queues (within RRH hardware, on the fiber/fronthaul link) are kept shallow via pacing and controlled descriptor posting. The group queues become the single bottlenecks in each cellularized airtime domain, which is exactly what L4S wants: a small number of stable, well-behaved bottlenecks with known behavior. The control policy is explicitly tuned to keep both average and tail queueing delay low.
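As a data-structure sketch (names and sizes hypothetical), the rule is one deep queue per cellularized airtime domain:

#include <stdint.h>

#define MAX_RRH_PER_GROUP 8
#define GROUP_Q_DEPTH     4096

/* One cellularized airtime domain = one deep queue. RRHs that can
 * interfere share a group (and thus a queue); RF-isolated RRHs get
 * their own. AQM/ECN marking runs only at this queue. */
struct group_queue {
    void    *pkts[GROUP_Q_DEPTH];       /* packet pointers in central DRAM */
    uint64_t enq_ts_us[GROUP_Q_DEPTH];  /* per-packet enqueue timestamps   */
    uint32_t head, tail;
};

struct airtime_domain {
    int    rrh_ids[MAX_RRH_PER_GROUP];  /* members sharing this airtime */
    int    n_rrh;
    struct group_queue q;               /* the only deep queue in the path */
};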
The Standard AP Architecture: Traditional Wi-Fi chips already use DMA to move packets from host memory to the radio without CPU involvement. But they require a local CPU to create descriptors, manage buffers, and run the network stack. Every AP is a complete computer running millions of lines of Linux.
The Fi-Wi Innovation: DMA Over Distance (not RDMA)
Fi-Wi extends the PCIe bus over fiber, allowing the RRH's DMA engine to read and write remote memory in the Concentrator. To the RRH silicon, memory 100 meters away appears "local"—accessible with the same PCIe transactions a traditional Wi-Fi chip uses to access DRAM 10 millimeters away on the motherboard.
Result: The local CPU, local DRAM, and entire Linux stack can be eliminated. The RRH becomes a pure "micro-bridge"—just DMA + MAC/PHY logic.
The Silicon Cost Difference:
| Component | Traditional AP | Fi-Wi RRH |
|---|---|---|
| MAC/PHY silicon (802.11 radio logic) | ~15-20M gates; MIMO, error correction, etc.; complexity dictated by physics | ~15-20M gates; same physics, same complexity; no savings here |
| Host SoC / CPU (the "brains") | ~50-100M gates; multi-core ARM CPU, DDR4 controller, peripherals, caches, etc. | ~100K-500K gates; simple DMA state machine, descriptor buffer only; 100-1000× simpler |
| DRAM | 256MB-1GB DDR4 (required for OS + buffers) | 16-64KB SRAM (descriptor storage only) |
| Operating system | Linux (millions of LOC); requires security patches | None; zero software attack surface |
| Total silicon | ~70-120M gates | ~15-20M gates |
The Economic Model:
Traditional Architecture: 50 APs = 50 CPUs, 50 DRAM modules, 50 power supplies, 50 Linux installations, 50 security update cycles.
Fi-Wi Architecture: 1 powerful Concentrator (workstation-class) + 50 simple RRHs (DMA + radio only).
Total system cost is lower because you're paying for intelligence once, not 50 times.
Why Incumbents Cannot Do This:
Traditional AP vendors have already optimized their SoC designs—the CPU, DRAM controller, and peripherals are as efficient as they can be. But their architecture requires these components at every radio because each AP operates autonomously. Even if they wanted to simplify, the distributed control model forces complexity at the edge.
Fi-Wi's centralized architecture enables the per-radio simplification. This is a structural cost advantage, not a manufacturing efficiency. Replicating it would require incumbents to abandon their entire product line and business model—a classic Innovator's Dilemma.
Bottom Line: C-RAN works because silicon economics favor centralized intelligence. The gate count difference isn't cosmetic—it's the foundation of Fi-Wi's cost, power, and reliability advantages.
In Fi-Wi, packet memory is centralized in the concentrator. This design keeps the RRHs stateless and places the only deep queues at the true bottleneck, where the AQM can observe them.
Because the Fi-Wi concentrator maintains shared state for the entire RF domain, it can directly control the RF footprint of each RRH by adjusting per-RRH beacon transmit power, which alters the cell boundaries that clients perceive.
Beacon power is one of the most effective tools for dynamic RF cell shaping because it affects STA association and roaming decisions without modifying data-plane PHY rates. By lowering beacon power at certain RRHs and raising it at others, the concentrator can steer associations and reshape cell boundaries in real time.
Traditional controller+AP systems attempt similar behavior but lack true shared state because each AP maintains its own queueing and PHY decisions. In Fi-Wi, beacon shaping is coordinated with queueing, scheduling, and RF-group formation, all operating on the same shared state.
This makes beacon power a first-class control variable in defining and stabilizing the boundaries of each cellularized RF domain.
The Fi-Wi architecture requires deterministic, low-latency fronthaul links between the concentrator and RRHs. Because RRHs function as DMA engines accessing centralized packet memory (Section 4.4), Umber's implementation uses PCIe (PCI Express) over fiber rather than Ethernet. This section quantifies bandwidth, latency, and jitter requirements, and demonstrates that PCIe over fiber not only meets these requirements but provides superior performance compared to network-based alternatives.
The choice of PCIe over fiber instead of Ethernet is driven by the Fi-Wi architectural model:
RRHs as DMA engines: Each RRH directly reads packet descriptors from concentrator DRAM, fetches packet data, and writes received packets back to memory. This is native PCIe behavior—exactly how a network card or storage controller operates.
Latency advantage: PCIe avoids the network stack entirely; there is no Ethernet framing, no IP layer, and no socket processing between the RRH and concentrator memory.
Determinism: PCIe provides guaranteed bandwidth allocation and predictable latency through credit-based flow control and dedicated point-to-point links.
Simplicity: The RRH sees the concentrator's memory space directly. No protocol translation, no socket APIs, no network configuration.
Each RRH requires bandwidth for:
1. Downlink packet DMA (concentrator → RRH)
For an RRH serving one or more STAs with aggregate capacity C_eff:
BW_DL = C_eff · (1 + OH_desc)    (4.1)
where OH_desc accounts for DMA descriptors, metadata, and PCIe TLP (Transaction Layer Packet) overhead (typically 10-20%).
Example: For C_eff = 600 Mbps (typical 802.11ax 2×2 MIMO) with OH_desc = 0.15:
BW_DL = 600 · 1.15 = 690 Mbps
2. Uplink packet DMA (RRH → concentrator)
Typically symmetric or slightly higher than downlink due to ACKs and control frames:
BW_UL ≈ BW_DL · 1.1 ≈ 760 Mbps    (4.2)
3. CSI and status updates
Channel State Information and MAC statistics are written to concentrator memory via PCIe:
BW_CSI = N_sta · N_sc · N_tx · N_rx · B_sample · f_CSI    (4.3)
For N_sta = 4, N_sc = 234, N_tx = 2, N_rx = 2, B_sample = 24 bits, f_CSI = 50 Hz:
BW_CSI ≈ 4.49 Mbps per RRH
4. Control and command traffic (concentrator → RRH)
Configuration updates, timing sync corrections, power/channel commands:
BW_control ≈ 1-5 Mbps per RRH    (4.4)
Total bidirectional bandwidth per RRH:
BW_total = BW_DL + BW_UL + BW_CSI + BW_control    (4.5)
BW_total ≈ 690 + 760 + 4.5 + 2 ≈ 1456 Mbps ≈ 1.5 Gbps
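The same per-RRH budget as a small self-checking calculation (inputs mirror the worked example and are illustrative only):

#include <stdio.h>

int main(void)
{
    double c_eff   = 600e6;   /* effective 802.11ax 2x2 capacity (bps)  */
    double oh_desc = 0.15;    /* descriptor/metadata/TLP overhead       */

    double bw_dl = c_eff * (1.0 + oh_desc);   /* Eq. (4.1): 690 Mbps    */
    double bw_ul = bw_dl * 1.1;               /* Eq. (4.2): ~760 Mbps   */

    /* Eq. (4.3): 4 STAs x 234 subcarriers x 2x2 x 24-bit CSI at 50 Hz */
    double bw_csi = 4.0 * 234 * 2 * 2 * 24 * 50;   /* ~4.49 Mbps        */
    double bw_ctl = 2e6;                           /* Eq. (4.4) midpoint */

    double bw_total = bw_dl + bw_ul + bw_csi + bw_ctl;  /* Eq. (4.5)    */
    printf("per-RRH fronthaul budget: %.2f Gbps\n", bw_total / 1e9);
    return 0;   /* prints ~1.46 Gbps, i.e. the ~1.5 Gbps in the text    */
}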
PCIe bandwidth is determined by generation and lane count:
| PCIe Gen | Per-Lane Rate | x1 Link | x4 Link | x8 Link |
|---|---|---|---|---|
| Gen 3 | 8 GT/s | ~985 MB/s (7.88 Gbps) | ~3.94 GB/s (31.5 Gbps) | ~7.88 GB/s (63 Gbps) |
| Gen 4 | 16 GT/s | ~1.97 GB/s (15.75 Gbps) | ~7.88 GB/s (63 Gbps) | ~15.75 GB/s (126 Gbps) |
| Gen 5 | 32 GT/s | ~3.94 GB/s (31.5 Gbps) | ~15.75 GB/s (126 Gbps) | ~31.5 GB/s (252 Gbps) |
Note: Effective bandwidth accounts for 128b/130b encoding (Gen 3+) and protocol overhead.
RRH link sizing: For 1.5 Gbps per RRH requirement:
A single PCIe Gen 3 x1 lane is sufficient per RRH with substantial headroom.
The concentrator must aggregate multiple RRH connections. Consider a 50-RRH deployment:
Total aggregate bandwidth requirement:
BW_aggregate = N_RRH · BW_total    (4.6)
BW_aggregate = 50 · 1.5 Gbps = 75 Gbps (peak)
With 40% average utilization (typical for building-wide traffic):
BW_typical = 75 · 0.40 = 30 Gbps
Architecture Options:
Option 1: PCIe switch fabric
Option 2: Multi-host server (Dual Socket)
Standard PCIe uses copper traces on motherboards (limited to ~30cm at Gen 3/4 speeds). To reach RRHs distributed throughout a building, PCIe signals are carried over fiber using optical transceivers.
Technologies:
1. Active Optical Cables (AOC)
2. Optical PCIe adapter cards
3. PCIe fabric extenders
Recommended approach for Fi-Wi: Optical PCIe adapter cards with standard fiber infrastructure, providing flexibility and leveraging commodity fiber installation.
PCIe over fiber latency components:
| Component | Latency |
|---|---|
| PCIe TLP formation (concentrator) | 0.2-0.5 µs |
| Optical transceiver (TX) | 0.1-0.3 µs |
| Fiber propagation (100m) | 0.5 µs |
| Optical transceiver (RX) | 0.1-0.3 µs |
| PCIe TLP processing (RRH) | 0.2-0.5 µs |
| PCIe switch (if used) | 0.1-0.3 µs per hop |
| Total one-way | 1.2-2.4 µs |
| Round-trip (DMA read) | 2.4-4.8 µs |
Comparison to Ethernet:
| Fronthaul Type | Round-Trip Latency | Determinism |
|---|---|---|
| PCIe over fiber | 2.4-4.8 µs | Excellent (credit-based) |
| 10GbE (cut-through) | 10-30 µs | Good (with QoS) |
| 10GbE (store-forward) | 20-100 µs | Fair (subject to congestion) |
PCIe over fiber provides 5-10× lower latency than even optimized Ethernet, which is critical for the inner control loop (Appendix B) operating at 200-500 µs timescales.
PCIe's credit-based flow control eliminates congestion drops and provides deterministic latency.
Measured jitter: PCIe over fiber typically exhibits <50 ns jitter, well under the 200 ns budget for 1 µs time synchronization (Section 4.1).
This determinism is impossible to achieve with Ethernet without time-sensitive networking (TSN) extensions, which add complexity and cost.
PCIe over fiber distance depends on optical budget and signal integrity:
| PCIe Gen | Multi-Mode Fiber | Single-Mode Fiber |
|---|---|---|
| Gen 3 (8 GT/s) | 300 m | 10 km |
| Gen 4 (16 GT/s) | 100 m | 2-10 km |
| Gen 5 (32 GT/s) | 50-100 m | 2 km |
Fi-Wi requirement: Building-scale deployments require ≤100 m reach, easily achieved with Gen 3/4 over multi-mode fiber or any generation over single-mode fiber.
PCIe over fiber cost per RRH:
| Component | Cost (approx.) |
|---|---|
| RRH-side PCIe optical adapter | $150-300 |
| Fiber pair (50m installed) | $50-100 |
| Optical transceiver pair | $50-100 |
| PCIe switch port allocation | $100-200 |
| Total per RRH | $350-700 |
Comparison to network alternatives:
| Approach | Cost per RRH | Latency | Determinism |
|---|---|---|---|
| PCIe over fiber | $350-700 | 2-5 µs | Excellent |
| 10GbE + TSN | $300-600 | 10-30 µs | Good |
| Standard 10GbE | $200-400 | 20-100 µs | Fair |
PCIe over fiber costs moderately more than standard Ethernet but delivers 5-10× better latency and superior determinism. For Fi-Wi's DMA-based architecture, this cost is justified by the performance and architectural simplicity gains.
For context: a typical enterprise AP costs $500-2000, and a cellular small cell costs $1000-5000. The fronthaul cost is comparable to or less than the radio cost difference, making it economically viable.
For deployments where PCIe over fiber infrastructure is unavailable, a hybrid approach is possible: packet data over the PCIe fronthaul, with CSI and control traffic carried over existing Ethernet.
This reduces PCIe bandwidth requirements (only packet data, not CSI/control) and allows leveraging existing Ethernet infrastructure for non-latency-critical traffic.
However, the pure PCIe approach is architecturally cleaner and avoids the complexity of dual-protocol RRH implementation.
For context, cellular systems use:
CPRI (Common Public Radio Interface):
eCPRI (Enhanced CPRI) / Fronthaul Gateway:
Fi-Wi (PCIe over fiber):
Fi-Wi's functional split and PCIe transport provides a unique balance: lower bandwidth than CPRI, lower latency than eCPRI, and native integration with the DMA-based architecture.
| Requirement | Target | Achieved with PCIe Gen 3 x1 |
|---|---|---|
| Bandwidth per RRH | ~1.5 Gbps | ✓ 7.88 Gbps (5× margin) |
| Aggregate (50 RRH) | ~30 Gbps avg | ✓ PCIe switch or multi-CPU |
| Round-trip latency | <10 µs | ✓ 2.4-4.8 µs |
| Jitter | <200 ns | ✓ <50 ns (credit-based) |
| Distance | ≤100 m | ✓ 300m MM / 10km SM |
| Determinism | No drops, predictable | ✓ Credit-based flow control |
| Cost per RRH | <$700 | ✓ $350-700 |
Why PCIe over fiber is the right choice for Fi-Wi:
The deterministic, sub-5-microsecond fronthaul is what enables Fi-Wi's centralized control, time synchronization, and single-bottleneck queueing architecture. Unlike Wi-Fi mesh, controller-based systems with over-the-air backhaul, or even Ethernet-based approaches, PCIe over fiber provides the predictable substrate needed for the control loops described in Appendices A and B to operate with the precision required for sub-millisecond tail latency control.
The "cellularization" of Wi-Fi relies on a unified timebase. In the Fi-Wi architecture, time is not merely used for logging; it is a control variable. To achieve coordinated scheduling, accurate queue measurements, and seamless mobility, every RRH must share a precise understanding of "now" down to the microsecond level.
To achieve this, Fi-Wi establishes a strict Hierarchical Clock Tree over the PCIe fronthaul, leveraging the native determinism of the bus rather than the best-effort nature of packet switching.
The Fi-Wi Concentrator acts as the PTP Grandmaster (IEEE 1588v2 / 802.1AS) for the entire building. It houses the primary reference oscillator (typically a high-stability OCXO).
External Reference (Optional GPS/GNSS)
│
▼
┌──────────────────────────────────────────────┐
│ Fi-Wi Concentrator │
│ [ High-Stability Oscillator (OCXO) ]        │ ◄── Grandmaster (GM)
│ (System Timebase t0) │
└──────────────────┬───────────────────────────┘
│ PCIe PTM / Hardware Sync
│ (Compensates for fiber flight time)
┌────────────┼─────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ RRH 1 │ │ RRH 2 │ │ RRH 3 │ ◄── Slaves
│ [LocalOsc]│ │ [LocalOsc]│ │ [LocalOsc]│
│ Locked │ │ Locked │ │ Locked │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
▼ ▼ ▼
Frequency-Coordinated Operation
A defining advantage of the Fi-Wi architecture is the use of "Hard Synchronization" via PCIe, rather than "Soft Synchronization" via Ethernet. While Ethernet-based APs rely on IEEE 1588 PTP, they are subject to switch jitter and software stack latency. PCIe over fiber eliminates these variables.
| Feature | Fi-Wi (PCIe over Fiber) | Traditional APs (Ethernet) |
|---|---|---|
| Protocol | PCIe PTM (Precision Time Measurement): hardware-native, bus-level messages | IEEE 1588 PTP: packet-based, software/firmware stack |
| Sync Accuracy | 20-50 ns (bus-cycle precision plus fiber margin) | 100 ns - 10 µs (highly dependent on network load) |
| Jitter Source | Minimal: point-to-point hardware flow control | High: switch queuing and software interrupt latency |
| CPU Overhead | Zero: handled entirely by the PCIe PHY/controller | Moderate to high: CPU must interrupt to process sync packets |
| Primary Benefits | Accurate L4S timestamps, TSF synchronization, unified timeline for clients | Basic time sync for logging and management |
Important Note: While frequency-locked clocks provide excellent timing consistency, they do not enable RF phase control or coordinated simultaneous transmission. COTS Wi-Fi chips have independent RF synthesizers with arbitrary phase offsets that cannot be controlled externally. The value of clock synchronization lies in accurate timestamping for L4S queue management and consistent TSF counters for seamless client mobility, not in RF phase alignment.
The Concentrator's clock behavior depends on the deployment environment and regulatory requirements. There are two distinct modes of operation:
In this mode, the Concentrator is connected to an external GNSS (GPS/Galileo) receiver. The internal oscillator is disciplined to align with UTC (Coordinated Universal Time). This connects the internal timing of the Fi-Wi system to external absolute time.
In deep indoor environments (basements, bunkers) where GPS is unavailable, or cost-sensitive deployments where 6 GHz AFC is not required, the Concentrator operates in Free-Wheeling mode.
While Free-Wheeling mode is sufficient for core system operation, GPS-Disciplined (Absolute) mode becomes mandatory when the Fi-Wi system interacts with external systems that require UTC timestamps, such as 6 GHz AFC (Automated Frequency Coordination) services.
Standard enterprise APs utilize free-running crystal oscillators with ~20 ppm frequency error. This causes TSF counters to drift relative to each other, making seamless mobility difficult. To achieve the timing consistency required for Fi-Wi's coordinated operation, the RRH hardware architecture must be fundamentally different.
The Fi-Wi Solution: The RRH hardware uses Mobile-Class Wi-Fi Silicon (which natively supports external clock inputs) driven by a Fronthaul-Recovered Precision Clock.
┌──────────────────────────────────────────────────────────────────────────────┐
│ RRH CLOCK DISTRIBUTION ARCHITECTURE │
└──────────────────────────────────────────────────────────────────────────────┘
[ PCIe Over Fiber ]
│
│ (1) PTM Timestamps (Implicit Clock)
▼
┌─────────────────────────────┐
│ RRH FPGA / Retimer │
│ (Clock Recovery Circuit) │
└─────────────┬───────────────┘
│
│ (2) "Dirty" Recovered Clock (High Jitter)
▼
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ JITTER ATTENUATOR IC │ │ WI-FI 7 SOC (Client) │
│ (e.g., Si5395 / LMK05) │ │ │
│ │ │ │
│ ┌─────────────────────┐ │ │ ┌───────────────────┐ │
│ │ Digital Servo Loop │ │ (3) Clean │ │ Internal PLL │ │
│ │ (DSPLL) │───┼───────────┼───►│ (RF Synthesizer) │ │
│ └─────────────────────┘ │ 40 MHz │ └─────────┬─────────┘ │
│ │ Reference │ │ │
└─────────────────────────────┘ └──────────────┼──────────────┘
│
▼
[ 5 GHz / 6 GHz ]
[ RF Carrier ]
(Independent phase per RRH)
Signal Flow: The RRH recovers a noisy clock from the PCIe fronthaul. A digital Jitter Attenuator cleans the signal using an internal DSP servo loop. This provides the ultra-low phase noise reference required for 4096-QAM while maintaining frequency lock to the Concentrator's timebase. Note: The Wi-Fi chip's internal PLL establishes its own RF carrier phase, which is independent across RRHs.
The clock distribution chain operates as follows: the RRH FPGA/retimer recovers a clock from the PTM timestamps carried on the PCIe fronthaul; the jitter attenuator's digital servo loop cleans this recovered clock; and the resulting clean 40 MHz reference drives the Wi-Fi SoC's Ext_Ref / XO_IN pin. The chip's internal PLLs lock to this external frequency reference, ensuring consistent TSF counter operation across all RRHs.
Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP SKUs. This choice is driven by a specific architectural need: mobile-class chips expose Ext_Ref pins that accept an external drive signal, whereas many AP chips expect a passive crystal resonator. This external clock capability enables consistent TSF counter operation across all RRHs.
It is important to understand the limitations of frequency-locked clocks with COTS Wi-Fi hardware: frequency lock aligns timebases and TSF counters across RRHs, but each chip's internal RF synthesizer still establishes its own carrier phase, so RF phase alignment, distributed MIMO, and coordinated simultaneous transmission remain out of reach, as noted above.
A rigorous control-theoretic analysis of Wi-Fi reveals a fundamental challenge: there are not one, but two distinct integrators in the transmit path. In traditional autonomous APs, these integrators are coupled in undefined ways, leading to instability (bufferbloat) and poor interaction with TCP congestion control. Fi-Wi explicitly separates these integrators, applies distinct control laws to each, and enforces a strict Time-Scale Separation to guarantee system stability.
To achieve stability, we must model and control two distinct accumulation processes:
The primary bottleneck managed by the AQM (Active Queue Management) is the Group Queue. This loop drives the end-to-end congestion control (L4S/TCP).
The queue depth Q(t) evolves based on the mismatch between the arrival rate λ(t) and the effective service rate μ(t):
dQ/dt = λ(t - τ_fwd) - μ(t)
Fi-Wi uses a PI² controller to calculate a marking probability p(t), targeting a shallow queue reference Q_ref (typically 200 µs). This provides a coherent signal to L4S senders:
p(t) = K_α · (Q(t) − Q_ref) + K_β · ∫ (Q(t) − Q_ref) dt
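In discrete time, the marking law above might be sketched as follows (a minimal sketch; gains and the 200 µs reference are illustrative placeholders, and production PI² implementations such as DualPI2 differ in detail):

typedef struct {
    double k_alpha, k_beta;  /* proportional / integral gains       */
    double q_ref_us;         /* queue delay reference (~200 µs)     */
    double integ;            /* accumulated integral term           */
} pi2_state;

/* q_us: measured queue sojourn time (µs); dt_s: update interval (s).
 * Returns an ECN marking probability clamped to [0, 1]. */
double pi2_update(pi2_state *s, double q_us, double dt_s)
{
    double err = q_us - s->q_ref_us;
    s->integ += err * dt_s;
    double p = s->k_alpha * err + s->k_beta * s->integ;
    return p < 0.0 ? 0.0 : (p > 1.0 ? 1.0 : p);
}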
Traditional congestion control relies on Active Queue Management (AQM): a queue must physically build up before the network detects congestion and signals the sender to slow down. The goal is to manage the queue size.
L4S enables a new paradigm called Active Rate Management (ARM).
Reference: Koen De Schepper, "Understanding Latency 4.0" (video explanation, 19:15), December 2025.
The Inner Loop manages the trade-off between PHY efficiency (large aggregates) and latency (small aggregates). In traditional APs, this integrator is effectively unbounded to maximize benchmark scores, creating a "sawtooth" latency pattern that confuses TCP.
Fi-Wi bounds this integrator via two mechanisms: a fixed target TXOP duration (~250 µs) and a cap on aggregate size (32 MSDUs), per the damping parameters below.
For the nested loops to remain stable, the Inner Loop must look like "constant service" to the Outer Loop. This requires the Inner Loop bandwidth (ω_mac) to be significantly higher than the Outer Loop bandwidth (ω_tcp):
ω_mac >> ω_tcp (typically > 20:1 ratio)
By forcing the MAC to operate at a frequency of 3–5 kHz (via 250 µs TXOPs), the aggregation noise is pushed high enough that it is naturally filtered out by the TCP loop (which operates at 10–20 Hz).
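As a quick numeric check using representative values from the text:

ω_mac / ω_tcp ≈ 4 kHz / 15 Hz ≈ 267:1

where 4 kHz is the service frequency implied by 250 µs TXOPs (1 / 250 µs) and 15 Hz sits in the stated 10-20 Hz TCP range, comfortably above the 20:1 criterion.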
The 250 µs TXOP constraint serves a dual purpose: it maintains time-scale separation and ensures L4S receives coherent ECN feedback. Traditional Wi-Fi's massive A-MPDU aggregation creates a fundamental mismatch between Layer 2 efficiency and Layer 3 control precision.
In wide-channel deployments (160 MHz), APs build large A-MPDU aggregates containing dozens of IP packets to amortize MAC overhead. This creates three control-loop pathologies:
Fi-Wi resolves this through coordinated design:
This approach maintains the benefits of A-MPDU efficiency while preserving the feedback coherence L4S requires. The result: DualQ can sustain its ~1ms target drain time without artificial inflation from aggregate assembly delays. For detailed analysis, see Appendix I.7.
Fi-Wi uses these parameters to ensure the system remains critically damped:
| Loop | Parameter | Target Value | Rationale |
|---|---|---|---|
| Outer | Queue Reference | 200 µs | Maintains ultra-low queuing delay. |
| Outer | Update Interval | 5 ms (~1 RTT) | Matches typical control loop frequency. |
| Inner | Target TXOP | 250 µs | Ensures ωmac >> ωtcp. |
| Inner | Max Aggregate | 32 MSDUs | Limits tail latency contribution. |
In Fi-Wi, the core rule is: there is one deep queue per independent airtime resource. The physical queue lives in concentrator memory, but it represents the airtime of one RRH or a dynamic group of RRHs whose RF signals are coupled strongly enough to behave like a single cell.
If two RRHs can interfere, they cannot transmit simultaneously and therefore must share a single logical queue. If RRHs are RF-isolated, each receives its own queue. This preserves the “one bottleneck per control loop” structure required by L4S.
Service at each queue corresponds to over-the-air transmission. Any RRHs that share RF space must share a service process and therefore share a queue. RRHs that do not interfere have independent airtime and get independent queues.
Crucially, these RF groups and their queues are not static. The concentrator forms and maintains airtime domains dynamically using:
Beyond simple interference, Fi-Wi’s groupings also consider the spatial structure of the channels:
Over time, the Fi-Wi system continuously adjusts:
Groups may merge if interference appears or split if RRHs become effectively isolated (e.g., after a channel change or power adjustment, including beacon power shaping). The AQM and ECN marking logic always runs at the current group queue, so L4S always sees a single, well-defined bottleneck per cellularized domain.
Because all RRHs expose real-time CSI, queue metrics, retry statistics, airtime usage, and beacon reports into the concentrator’s shared state, Fi-Wi can form RF groups that are tuned not just for coverage but for:
Fi-Wi is not designed around a small number of big AP cells per floor. The architecture assumes something much closer to Fiber-to-the-Room (FTTR): one cell per room, with fiber or equivalent deterministic fronthaul feeding small RRHs in each room.
In higher-end deployments, each room can contain multiple RRHs (e.g., 2–4 per room) to support:
This density dramatically improves RF control. With RRHs separated by just a few meters, the concentrator sees:
Traditional AP-based architectures cannot achieve this cleanly because they lack shared state and maintain separate, isolated queues and PHY/MAC processes in each AP. Even with a central controller, they are limited to heuristic steering and static power/channel tweaks.
Fi-Wi, by contrast:
A cell-per-room architecture makes Fi-Wi fundamentally different from controller-based Wi-Fi: it behaves more like cellular small cells with centralized coordination than like a set of autonomous APs.
Fi-Wi centralizes packet memory, queueing, AQM, and TXOP scheduling inside the concentrator. Because the concentrator is the true bottleneck for all wireless transmissions, Fi-Wi can use a clean, minimal queue structure that behaves predictably under load and exposes stable delay semantics to L4S congestion controllers. This stands in contrast to traditional APs, where dozens of hidden queues (per-station, per-TID, firmware rings, retry/BA windows, PS-poll buffers, rate-control queues) produce variable and unobservable queueing delay.
This section describes Fi-Wi’s queue architecture, why WMM priority becomes largely unnecessary, and how centralized TXOP scheduling eliminates the stochastic contention that drives Wi-Fi collapse in legacy systems. The goal is simple: a minimal number of queues, explicit queue semantics, and predictable latency for all traffic classes.
Because all packets live in the concentrator’s memory until the moment they are transmitted over the air, Fi-Wi can explicitly control:
This allows Fi-Wi to do what distributed APs cannot: construct a consistent, visible bottleneck queue that L4S congestion controllers can lock onto with stable behavior.
If queue delay is capped around 500 µs, legacy WMM categories provide little additional value. For example, consider a voice stream:
Voice codec: 80 bytes every 10 ms (64 kbps)
Transmit time at 1 Gbps: ~0.64 µs
L4S queue target: 500 µs
Voice latency budget: ~150,000 µs (150 ms)
Queue share: 500 / 150,000 ≈ 0.3%
If L4S keeps queueing delay under ~500 µs, then all traffic — including voice — stays far inside its latency budget. WMM’s role in combatting bufferbloat disappears when bufferbloat itself is removed.
Three real-world issues motivate a cautious design:
Voice and video often use UDP. These flows typically do not respond to ECN marks and do not reduce their sending rate under congestion, so they can crowd out responsive traffic.
Fi-Wi can mitigate this using per-flow fair queuing inside the L4S queue, keeping UDP in check without needing a separate WMM hierarchy.
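As a concrete illustration of the mechanism (a sketch, not Fi-Wi source: flow_key_t, FQ_BUCKETS, and the hash choice are assumptions), per-flow isolation can be as simple as hashing each flow's 5-tuple into a fair-queuing bucket and serving the buckets round-robin:

#include <stddef.h>
#include <stdint.h>

/* Flow-to-bucket mapping for per-flow fair queuing: an FNV-1a hash of the
 * 5-tuple selects one of FQ_BUCKETS round-robin buckets, so an unresponsive
 * UDP flow is confined to its own bucket and cannot starve others.
 * (Struct padding is assumed zero-initialized before hashing.) */
#define FQ_BUCKETS 1024

typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} flow_key_t;

static uint32_t fq_bucket(const flow_key_t *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;                       /* FNV-1a prime */
    }
    return h % FQ_BUCKETS;
}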
Total latency = Queue delay + Contention delay + TX delay + Retry delay
^^^^^^^^^^^^
L4S controls this
WMM historically manipulates AIFS, CW, and TXOP to reduce contention delay. Fi-Wi eliminates contention entirely using centralized TXOP scheduling, so WMM’s airtime hacks lose relevance.
Even L4S can fail under:
Hence, Fi-Wi benefits from a small amount of priority separation, at least in early deployments.
The theoretically sufficient minimal queue architecture for Fi-Wi is three queues:
┌──────────────────────────────────────────┐
│ Concentrator │
│ (Central Packet Memory • AQM • TXOP) │
└──────────────────────────────────────────┘
▲
│
┌─────────────┼──────────────────┐
│ │ │
│ │ │
┌────────┴───┐ ┌─────┴─────┐ ┌───────┴──────┐
│ Q_mgmt │ │ Q_L4S │ │ Q_classic │
│ (Strict │ │ (ECT(1), │ │ (ECT(0), │
│ priority) │ │ dual-Q) │ │ classic) │
└──────┬─────┘ └─────┬─────┘ └──────┬────────┘
│ │ │
└───────────────┼──────────────────┘
│
TXOP Scheduler
(Build AMPDU • Select RRH • 200–250µs)
│
┌─────────────────────────┼──────────────────────────┐
│ │ │
┌───▼───┐ ┌─────▼─────┐ ┌─────▼─────┐
│ RRH1 │ │ RRH2 │ │ RRH3 │
│ (PHY) │ │ (PHY) │ │ (PHY) │
└───────┘ └───────────┘ └───────────┘
The minimal Fi-Wi queue architecture contains a strict-priority management queue plus dual-queue L4S (L4S + Classic). All buffering lives in the concentrator; RRHs keep no deep queues. L4S senders see a clean single-bottleneck model, and all 802.11 management frames bypass AQM entirely for correctness.
In this design, WMM is unnecessary at the wireless bottleneck. All data traffic benefits from the same controlled queue delay, and fairness is enforced by per-flow scheduling rather than EDCA.
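A sketch of the corresponding admission-time classification for the three-queue design above, using the standard ECN codepoints from RFC 9331; the enum and function names are illustrative, and CE arrivals are steered to the L4S queue here for simplicity:

#include <stdint.h>

enum fiwi_queue { Q_MGMT, Q_L4S, Q_CLASSIC };

enum fiwi_queue fiwi_classify(uint8_t ip_tos, int is_mgmt_frame)
{
    if (is_mgmt_frame)
        return Q_MGMT;                 /* 802.11 management bypasses AQM  */

    uint8_t ecn = ip_tos & 0x3;        /* ECN: low two bits of TOS byte   */
    return (ecn == 0x1 || ecn == 0x3)  /* ECT(1) or CE → L4S dual-queue   */
         ? Q_L4S
         : Q_CLASSIC;                  /* ECT(0) / Not-ECT → Classic      */
}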
A more conservative deployment uses five queues per airtime domain:
┌───────────────────────────────────────────┐
│ Concentrator │
│ (Central Packet Memory • AQM • TXOP) │
└───────────────────────────────────────────┘
▲
│
             Five Logical Queues Per Airtime Domain

┌────────────┐ ┌──────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│   Q_mgmt   │ │ Q_L4S-hi │ │ Q_classic-hi │ │  Q_L4S-be   │ │ Q_classic-be │
│ (priority) │ │ (Voice)  │ │ (Legacy VoIP)│ │ (Bulk TCP/  │ │ (Legacy bulk)│
│            │ │          │ │              │ │  QUIC)      │ │              │
└──────┬─────┘ └────┬─────┘ └──────┬───────┘ └──────┬──────┘ └──────┬───────┘
       │            │              │                │               │
       └────────────┴──────────────┼────────────────┴───────────────┘
                                   │
                            TXOP Scheduler
              (Build AMPDU • Select RRH • Delay Targets)
                                   │
          ┌────────────────────────┼────────────────────────┐
          │                        │                        │
      ┌───▼───┐               ┌────▼────┐              ┌────▼────┐
      │ RRH1  │               │  RRH2   │              │  RRH3   │
      │ (PHY) │               │  (PHY)  │              │  (PHY)  │
      └───────┘               └─────────┘              └─────────┘
The 5-queue design provides a two-tier priority system across L4S and Classic traffic. This conservative architecture offers compatibility with legacy UDP voice/video, while still keeping Fi-Wi’s centralized L4S semantics intact. Over time, deployments can collapse from 5 queues to 3 as performance data validates the simpler model.
Consider 10 simultaneous HD video calls (~20 Mbps total) plus a saturating background TCP flow:
Legacy WMM:
Fi-Wi with L4S + fair queuing:
This is roughly 1000× lower queueing latency than legacy WMM systems, and it applies to all traffic, not only traffic in a “priority” AC.
Fi-Wi can phase its queue structure over time:
Metrics to monitor include:
WMM exists to correct three historical problems in distributed Wi-Fi:
Fi-Wi removes the root causes of these behaviors:
Because of this, full WMM support at the air bottleneck is not necessary. However, Fi-Wi does support WMM semantics for:
Fi-Wi handles WMM as an admission-time mapping:
This preserves compatibility while avoiding the complexity and unpredictability of EDCA-based priority systems. Over time, Fi-Wi deployments can rely on pure L4S semantics and collapse WMM to a compatibility shim, not a required scheduling mechanism.
Fi-Wi’s centralized queue architecture enables:
Traditional Wi-Fi uses WMM to work around bufferbloat and contention. Fi-Wi removes those problems entirely through tight queue control, shared state, and central scheduling. Priority becomes a policy choice — not a crutch for an unstable MAC.
In Fi-Wi, the Carve-Out ensures the voice packet (L4S) bypasses the accumulated Classic bulk data completely. The file download continues to saturate the link, but the latency of the L4S flow is decoupled from the load of the Classic flow.
Fi-Wi’s centralized shared state across RRHs makes it natural to treat multiple radios as an active redundant set for the same STA or room. This is analogous in spirit to 802.11be’s Multi-Link Operation (MLO), where a single multi-link device (MLD) can use multiple links for reliability and capacity. In Fi-Wi, the concentrator is the coordination point leveraging shared state, and the RRHs are the distributed radios providing multiple RF paths.
In many deployments, a client STA will be audible at more than one RRH (overlapping coverage). On the uplink, Fi-Wi exploits this spatial diversity to improve reliability without requiring changes to the client.
This approach leverages the spatial diversity of distributed RRHs to mitigate shadowing and multipath fading. Because the selection logic operates on valid MAC frames (after FCS verification) rather than raw I/Q samples, this architecture maintains compatibility with standard COTS Wi-Fi silicon at the Radio Head.
On the downlink, the concentrator can treat multiple RRHs as candidate transmitters for a given STA or room:
This gives Fi-Wi:
In a multi-RRH Fi-Wi deployment, each radio head operates on the same BSSID and channel but sits in a different physical location with its own RF conditions. While Fi-Wi centralizes all queueing and scheduling decisions, every RRH must still obey the fundamental 802.11 rule: listen-before-talk (LBT).
This is where Fi-Wi diverges sharply from classical multi-AP systems. In UniFi, Ruckus, Aruba, and all controller-based Wi-Fi architectures, each AP queue is blind to the RF medium state until it attempts to transmit. The AP commits a packet to the hardware queue, and if the medium is busy, the packet waits (Head-of-Line blocking) while the AP performs backoff.
Fi-Wi inverts this. RRHs continuously report their LBT Eligibility Status (Clear/Busy) to the Concentrator via the high-speed PCIe telemetry path, with update intervals of 100–500 µs, well matched to inter-TXOP scheduling decisions. While the Concentrator cannot react within a single 9 µs backoff slot, it operates on the inter-TXOP timescale (200–500 µs¹).
Before posting a new DMA descriptor to an RRH, the Scheduler checks this eligibility:
This prevents Head-of-Line Blocking where a packet sits in a hardware queue on a jammed radio. When multiple RRHs report clear airtime, Fi-Wi selects among them based on link quality (CSI) and predicted airtime efficiency. Conversely, if all RRHs report medium-busy, no RRH is primed; the scheduler pauses the flow to prevent backpressure from accumulating in the RRH hardware, keeping the queue depth visible in the Concentrator where L4S can measure it.
The result is a form of Centralized Selection based on LBT Eligibility. Multi-AP systems coordinate configuration (channels, power), but they cannot coordinate transmit starts because they lack the real-time feedback loop to steer packets away from busy radios before they are queued.
¹ Representative scheduling interval for mixed traffic workloads; actual TXOP durations range from tens of microseconds (small frames) to several milliseconds (large aggregates).
(Shared RF / Airtime Domain)
+----------------------+ +----------------------+
| RRH-A | | RRH-B |
| (Room / Zone A) | | (Room / Zone B) |
+----------------------+ +----------------------+
| LBT: Clear | | LBT: Busy (ED high) |
| Eligible = YES | | Eligible = NO |
+----------+-----------+ +-----------+----------+
| |
| Fiber fronthaul (low latency) |
| |
v v
+-----------------------------------+
| Fi-Wi Concentrator / Scheduler |
+-----------------------------------+
| Centralized queue for building |
| L4S feedback / congestion state |
| |
| Decision: Post Descriptor to A |
| (RRH-B flagged as jammed/ineligible|
| prevents HoL blocking) |
+----------------+------------------+
|
| Downlink frames / aggregates
v
+--------------+
| Client(s) |
+--------------+
Time →
------------------------------------------------------------------------------------------------->
RRH-A (Room A): [ Sense medium ] [ Idle ] [ Clear ] [ Transmit TXOP ] [ Idle ... ]
|<-- DIFS --->| |<---- contention window (few slots) ---->|
RRH-B (Room B): [ Sense medium ] [ ED high: medium busy ] [ Backoff ... ]
|<---- busy ---->|
RRH LBT → Scheduler: A: "Clear" B: "Busy"
Scheduler View: [ Receive LBT states from A, B ]
[ Mark A = eligible, B = ineligible ]
[ Dequeue next packets from central queue ]
[ Post descriptor to RRH-A only ]
Downlink Action: RRH-A receives descriptor, enters backoff, wins, transmits.
RRH-B remains silent (no descriptor posted).
Effect: • No packet trapped in RRH-B's buffer
• No exponential backoff storm
• Deterministic selection of the RRH with clear airtime
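The descriptor-posting gate sketched in the timeline above reduces to a small selection routine; rrh_t, its fields, and the cost metric are hypothetical stand-ins for the per-RRH telemetry state the Concentrator maintains:

#include <stdint.h>

typedef struct {
    int    id;            /* RRH identifier                          */
    int    lbt_clear;     /* latest LBT eligibility report (1=clear) */
    double airtime_cost;  /* predicted airtime per byte from CSI     */
} rrh_t;

/* Returns the index of the best eligible RRH, or -1 if every RRH in the
 * domain reports medium-busy; in that case the packet stays in the central
 * queue, visible to L4S, rather than being trapped in RRH hardware. */
int select_rrh(const rrh_t *rrhs, int n)
{
    int    best = -1;
    double best_cost = 1e30;

    for (int i = 0; i < n; i++) {
        if (!rrhs[i].lbt_clear)
            continue;                         /* skip jammed radios (no HoL)   */
        if (rrhs[i].airtime_cost < best_cost) {
            best_cost = rrhs[i].airtime_cost;
            best = i;                         /* best CSI / airtime efficiency */
        }
    }
    return best;
}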
802.11be MLO allows a multi-link device (AP/STA) to use multiple links (e.g., 2.4G, 5G, 6G bands or channels) under a single MAC entity. Features include:
Fi-Wi provides a similar effect at the building scale, but with important differences:
Because the RRHs are spatially distributed around rooms and hallways, Fi-Wi gains advantages that co-located antennas cannot provide:
These advantages come from intelligent packet routing and dynamic RRH selection, not from RF phase coordination or simultaneous beamforming across RRHs.
Fi-Wi strictly adheres to local regulatory compliance. The Concentrator manages the queue and the schedule, but the RRH manages the compliance.
When the Scheduler assigns a TXOP to an RRH, it posts a descriptor. The RRH hardware then performs standard 802.11 EDCA:
The Architectural Difference:
In MLO or Mesh: If an AP commits a packet to a radio and that radio hits congestion, the packet is trapped in the local buffer. The backoff might take 50ms. During this time, the AP's other radios (or other APs in the mesh) might be idle, but they cannot help because the packet is already "owned" by the busy MAC.
In Fi-Wi: The packet remains in the Concentrator's central memory until the last possible moment (see Appendix F). If the Concentrator sees an RRH entering deep backoff (via real-time telemetry) or reporting "Busy," it stops posting new descriptors to that RRH and steers subsequent traffic to a free RRH. The backoff engine remains local (compliance), but the queue feeding it is steered globally (performance).
This allows Fi-Wi to scale airtime domains across an entire building while preventing the multi-node contention collapse that plagues traditional Wi-Fi networks.
Wi-Fi 7 MLO: per-radio queues and MAC logic Fi-Wi: one centralized queue per airtime-domain
================================================ ===============================================
Airtime-domain Airtime-domain
-------------- --------------
+-------------+ +-------------+ +-------------------------+
| Radio 1 | | Radio 2 | | Fi-Wi Concentrator |
| MAC engine | | MAC engine | | (per airtime-domain) |
| Backoff | | Backoff | +-------------------------+
| DMA queues | | DMA queues | | Centralized queue |
+------+------+ +------+------+ | AQM / L4S feedback |
| | | Scheduler |
| | +-----------+-------------+
v v |
Packet trapped Packet trapped |
in local queue in local queue |
during backoff during backoff v
+--------+-------+ +--------+-------+
| RRH A | | RRH B |
| RF front-end | | RF front-end |
| LBT + backoff | | LBT + backoff |
+--------+-------+ +--------+-------+
^ ^
| |
Scheduler posts descriptor only to
the RRH that is clear and eligible.
To keep L4S happy, Fi-Wi needs to preserve a single bottleneck queue per flow even while using multiple RRHs:
In other words:
Traditional Wi-Fi deployments suffer from two fundamental problems in high-density environments: (1) clients are statically associated to a single AP based on initial connection, leading to suboptimal performance as they move, and (2) autonomous APs compete for airtime through CSMA/CA contention, creating interference. Fi-Wi inverts this paradigm through Dynamic Point Selection—continuously choosing the optimal RRH per packet—and Intelligent Frequency Reuse—leveraging spatial isolation to maximize capacity.
Unlike traditional Wi-Fi where clients are physically and logically tied to a single Access Point (AP), Fi-Wi treats the entire building as a single Virtual Cell. The Concentrator maintains real-time Channel State Information (CSI) from all RRHs and dynamically selects the optimal transmission point for each individual packet.
To understand the magnitude of this shift, we must compare the standard "Fast BSS Transition" (802.11r) with the Fi-Wi approach. In standard Wi-Fi, mobility is a negotiation. In Fi-Wi, it is an execution.
| Step | Standard Wi-Fi (802.11r / Fast Roaming) | Fi-Wi (Dynamic Point Selection) |
|---|---|---|
| 1. Trigger | Client detects low RSSI and decides to scan. | Concentrator detects better path via Uplink SNR. |
| 2. Action | Client tunes radio off-channel to scan for beacons (Latency spike: 50–100ms). | Zero Action. Client stays on channel. |
| 3. Handshake | Client sends Auth + Re-Assoc frames. AP validates keys. | None. No Over-the-Air frames. |
| 4. Switch | AP 1 tears down keys; AP 2 installs keys. | Concentrator updates the DL_RRH_ID pointer in memory. |
| Total Time | ~50ms – 150ms (Best case) | < 1ms (PCIe Write) |
While 802.11r is sufficient for buffered video (Netflix), it typically breaks real-time applications like Voice over Wi-Fi (VoWiFi) and VR/XR, where a 50ms gap causes audio dropouts or visual artifacts. Fi-Wi's sub-millisecond switching ensures true continuity.
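Because the "Switch" step in the table above is a single memory write, it can be sketched in a few lines of C; the sta_entry_t layout and the atomic discipline are illustrative assumptions:

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t dl_rrh_id;   /* RRH currently serving this STA's downlink */
} sta_entry_t;

/* Dynamic point selection: no over-the-air handshake, no key moves.
 * Subsequent TXOPs for this STA are simply scheduled on the new RRH. */
static inline void switch_serving_rrh(sta_entry_t *sta, uint16_t new_rrh)
{
    atomic_store_explicit(&sta->dl_rrh_id, new_rrh, memory_order_release);
}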
Consider "Alice" on a VR headset walking down a hallway:
In traditional Wi-Fi, neighboring APs on the same channel create co-channel interference. The standard solution is to assign different channels (e.g., AP-A uses Channel 36, AP-B uses Channel 48), but this wastes spectrum. Fi-Wi enables intelligent frequency reuse—using the same channel across multiple RRHs when spatial conditions allow.
Frequency reuse is viable when clients are in spatially separated locations with significant isolation (typically >25-30 dB attenuation due to walls, floors, or distance).
Example: Adjacent Rooms
The Fi-Wi Decision:
The key advantage over static channel planning is real-time adaptation:
| Requirement | Fi-Wi (C-RAN) | Autonomous APs |
|---|---|---|
| Global CSI Visibility | Complete: Concentrator sees CSI from all RRHs to all clients in real-time | Fragmented: Each AP only knows its own channel. Must exchange info over backhaul (high latency) |
| Decision Latency | Microseconds: Concentrator makes decisions in software at µs granularity | Milliseconds to seconds: APs coordinate via slow management protocols |
| Adaptation Speed | Per-packet: Can switch RRH or channel based on every CSI update | Minutes: Channel changes require beacon updates, client reassociation |
| Client Disruption | None: Decisions are transparent to clients | High: Channel changes or AP reassignment cause connectivity interruptions |
The complexity of dynamic point selection and frequency reuse is hidden from the L4S congestion control loop. Traffic still lives in per-airtime-domain group queues. When the Concentrator enables frequency reuse or optimizes RRH selection, it simply affects the effective service rate μ(t) of the queue.
The PI² controller in the outer loop (see Section 5) sees the queue draining faster and naturally reduces ECN marking. This allows L4S senders (TCP Prague) to ramp up their congestion windows to fill the expanded capacity. The system automatically discovers and exploits available spatial capacity without requiring changes to congestion control algorithms or application awareness.
A common critique of centralized wireless architectures is the "autonomous client problem": while the infrastructure can be coordinated, the stations (STAs) are independent entities that contend for the medium using their own logic.
Fi-Wi addresses this by enforcing a Control Hierarchy that governs client behavior from the physical layer up to the transport layer. Instead of passively hoping for "good client behavior," Fi-Wi uses four distinct mechanisms to throttle, steer, or schedule station media access.
Level 1: Deterministic (Hard)
[ 802.11ax Trigger Frames ] ──▶ STA must wait for Schedule
(Zero contention)
Level 2: Transport (Adaptive)
[ L4S / ECN Marking ] ────────▶ OS Kernel throttles pacing
(Reduces MAC load before enqueue)
Level 3: RF Physics (Steering)
[ Beacon Power Shaping ] ─────▶ STA firmware seeks new cell
(Moves demand to different domain)
Level 4: Statistical (Soft)
[ WMM / AIFS Parameters ] ────▶ STA adjusts backoff aggression
(Statistical deprioritization)
For modern clients (Wi-Fi 6/7), Fi-Wi removes autonomy entirely for uplink traffic. The Concentrator generates Trigger Frames via the RRH.
For the growing ecosystem of L4S-capable clients (iOS, macOS, Linux, Windows), control is applied at the Operating System kernel.
The Concentrator sets the CE (Congestion Experienced) codepoint in the IP header of downlink packets based on the centralized Group Queue depth.
Fi-Wi manipulates the physical environment to restrict which RRHs a client perceives as viable, effectively "shoving" media access demand to specific airtime domains.
As a defense-in-depth measure for legacy clients, Fi-Wi advertises tuned WMM EDCA parameters.
To maintain technical accuracy, it is important to clarify what Fi-Wi's dynamic point selection does not provide:
These capabilities would require either:
Fi-Wi's architecture deliberately focuses on capabilities achievable with COTS Wi-Fi chips, providing 2-3x capacity improvement through intelligent management rather than pursuing 4-6x gains that would require custom silicon development.
Based on the capabilities described above, Fi-Wi provides the following performance improvements over traditional autonomous AP deployments:
These gains are achieved through centralized intelligence and microsecond-latency fronthaul, not through RF phase control or coordinated transmission. The architecture remains fully compliant with unlicensed spectrum regulations and works with commodity Wi-Fi chipsets.
Fi-Wi transforms the problem of wireless density by treating it as a routing and scheduling problem rather than an RF coordination problem. By centralizing packet memory and MAC scheduling, Fi-Wi converts adjacent radios from interferers into dynamically selected access points, allowing the network to scale capacity through intelligent management rather than collapsing under interference.
The key insight is that most Wi-Fi performance problems stem from poor decisions (wrong AP, wrong channel, wrong timing) rather than fundamental RF limitations. Fi-Wi solves this by providing the Concentrator with complete visibility and control, enabling microsecond-granularity optimization that autonomous APs cannot match.
Modern enterprise Wi-Fi deployments use centralized controllers (Cisco WLC, Aruba Mobility Controller, Ubiquiti UniFi, Ruckus SmartZone, etc.) to manage multiple APs. These controllers coordinate the control plane: channel assignment, transmit power, client association hints, roaming policies, and security. However, these remain loosely-coupled systems where the data plane — queueing, MAC scheduling, aggregation, and packet memory — remains distributed inside individual APs.
A traditional AP is not just “running EDCA.” It is running EDCA after juggling dozens or hundreds of logical MAC queues and state machines:
With N stations, an AP can easily have on the order of N × (4–8) logical queues behind a single RF channel. Every AP in the same RF domain runs this large, isolated, queue-filled state machine independently. No AP has a global view; controllers see only coarse statistics.
The result:
Fi-Wi is fundamentally different: it centralizes both control plane and data plane with shared state across all RRHs. The concentrator does not just configure RRHs; it directly manages their queues, schedules their TXOPs, maintains unified CSI and airtime state, and applies coordinated ECN marking for each airtime domain. This architectural difference — not just improved control-plane coordination — is what enables Fi-Wi’s latency, L4S, and spatial multiplexing advantages.
┌──────────────────────────── Traditional Distributed AP ───────────────────────────┐
│                                                                                    │
│  Many MAC queues hidden inside each AP:                                            │
│                                                                                    │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                               │
│  │  STA 1 TID  │   │  STA 2 TID  │   │  STA N TID  │  ... (N stations × 4–8 TIDs)  │
│  │   Queues    │   │   Queues    │   │   Queues    │                               │
│  └─────┬───────┘   └─────┬───────┘   └─────┬───────┘                               │
│        │                 │                 │                                       │
│  ┌─────▼─────────────────▼─────────────────▼─────────┐                             │
│  │  Firmware Queues (Aggregation, Reorder, BAR/BA)   │                             │
│  └───────────┬───────────────────────────────────────┘                             │
│              │                                                                     │
│  ┌───────────▼──────────────┐                                                      │
│  │ Hardware MAC Ring Buffers│  (TX/RX DMA)                                         │
│  └───────────┬──────────────┘                                                      │
│              │                                                                     │
│  ┌───────────▼──────────────┐                                                      │
│  │ EDCA / CSMA-CA Contention│  (Per-AP, no coordination)                           │
│  └───────────┬──────────────┘                                                      │
│              │                                                                     │
│   Long, multi-ms TXOP bursts, inconsistent ECN, early collapse                     │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘
See also: Section 2.1 — Why L4S + Legacy Wi-Fi Struggle, Appendix A — 802.11 Backoff & Collapse Dynamics.
The following subsections detail specific benefits of Fi-Wi’s cellularized, tightly-coupled architecture compared to controller-managed, loosely-coupled AP systems.
Traditional APs:
Each AP builds its own local queues. Under load, large aggregates, retries, and hidden buffering produce multi-millisecond queueing and service delays. Tail latency is largely uncontrolled, and varies across APs sharing the same channel.
Fi-Wi (cellularized Wi-Fi, cell-per-room):
Traditional APs:
L4S flows traverse multiple hidden queues: wired bottlenecks, AP-local queues, firmware queues, and EDCA contention. ECN marking (if it exists at all) is inconsistent and not tied to a single bottleneck. Collapse produces noisy, bursty marking or loss, and the L4S control loop becomes oscillatory or falls back toward classic congestion behavior, especially in the tails that matter to users.
Fi-Wi:
Traditional APs:
Aggregation improves PHY efficiency but hides individual packet timing from the congestion controller. The controller does not know which MSDUs were grouped into a TXOP, what the queue state was when the TXOP started, or how long each device has been waiting.
Fi-Wi:
This combination yields high PHY efficiency and transport-layer visibility into congestion, instead of having to choose one or the other.
Controller-managed loosely-coupled APs:
The controller can adjust channels, power, and send steering hints (e.g., 802.11v), but it cannot see or control:
As a result, these systems rely on heuristic, reactive policies: channel reassignment after interference is observed, power adjustments based on neighbor reports, and client steering using RSSI or airtime snapshots. These help, but they operate on coarse time scales (seconds to minutes) and cannot fix the fundamental data-plane issues of distributed queues, MAC contention, and tail latency under load.
Fi-Wi cellularized architecture:
The concentrator maintains true shared state across all RRHs in the building:
Because RRHs are distributed in space (often 2–4 per room in high-density deployments), Fi-Wi can leverage spatial separation for intelligent frequency reuse. The concentrator sees CSI from all RRHs and can make microsecond-granularity decisions about which RRH should transmit each packet — all while preserving the "single bottleneck queue per airtime domain" discipline required for stable L4S behavior.
┌─────────────────────────── Fi-Wi Cellularized Architecture ────────────────────────────┐
│                                                                                         │
│   One deep queue per airtime domain               Shared CSI + µs timestamps            │
│                                                                                         │
│   ┌───────────────────────────────────────────┐                                         │
│   │ Centralized Airtime-Domain Queue (ECN AQM)│◄──────────┐                             │
│   └───────────────────┬───────────────────────┘           │                             │
│                       │                                   │                             │
│   ┌───────────────────▼─────────────────────────────────┐ │                             │
│   │  Concentrator Scheduler (L4S, TXOP, RF Grouping)    │◄┘                             │
│   │  Dynamic Point Selection per Packet                 │                               │
│   └───────────────┬─────────────────────────┬───────────┘                               │
│                   │                         │                                           │
│        PCIe/Fiber │              PCIe/Fiber │                                           │
│                   │                         │                                           │
│   ┌───────────────▼─────────────┐  ┌────────▼──────────────┐  ...                       │
│   │   RRH 1 (Thin MAC/PHY)      │  │  RRH 2 (Thin MAC/PHY) │                            │
│   └─────────────────────────────┘  └───────────────────────┘                            │
│                                                                                         │
│        Selected RRH transmits; others silent in this TXOP                               │
│                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘
See also: Section 4 — Key Fi-Wi Mechanisms, Section 5 — Control Architecture, Section 9 — Dynamic Point Selection.
The table below summarizes the architectural differences between controller-managed, loosely-coupled APs and Fi-Wi's cellularized, tightly-coupled architecture:
| Capability | Controller-Managed Loosely-Coupled APs | Fi-Wi Cellularized Tightly-Coupled |
|---|---|---|
| Control Plane | ||
| Channel assignment | ✓ Centralized | ✓ Centralized |
| Transmit power control | ✓ Centralized | ✓ Centralized + dynamic beacon shaping |
| Client steering hints | ✓ Centralized (802.11v/k) | ✓ Centralized |
| Data Plane | ||
| Packet queues | ✗ Distributed per-AP; many hidden per-STA/per-TID/firmware queues | ✓ Exactly one deep queue per airtime domain in the concentrator |
| MAC scheduling & aggregation | ✗ Autonomous per-AP; long TXOPs under load | ✓ Coordinated across RRH groups; TXOP length explicitly bounded |
| Timestamp synchronization | ✗ Not available at packet level | ✓ µs-accurate (PTM/PTP) shared across RRHs |
| Shared CSI state | ✗ Per-AP only; summarized to controller | ✓ Building-wide CSI aggregation at the concentrator |
| Queue visibility & AQM | ✗ Hidden in each AP; no global AQM | ✓ Fully visible per domain; explicit L4S/AQM on the true bottleneck |
| L4S/ECN marking point | ✗ Inconsistent or absent; multiple uncontrolled bottlenecks | ✓ Single, well-defined marking point per airtime domain |
| Dynamic point selection | ✗ Clients statically associated to one AP | ✓ Per-packet RRH selection based on real-time CSI (Section 9) |
| Selection diversity | ✗ Single AP receives uplink | ✓ Multiple RRHs receive; best copy selected (Section 9) |
| Intelligent frequency reuse | ✗ Static channel plan | ✓ Dynamic adaptation based on spatial isolation (Section 9) |
| Per-packet steering between radios | ✗ Not available | ✓ Active redundancy and fast failover (Section 8) |
| Dynamic RF grouping | ✗ Static AP boundaries | ✓ Adaptive airtime domains based on CSI and load (Section 6) |
Controller-managed loosely-coupled APs:
Fi-Wi cellularized architecture:
The economic viability of a "Cell-Per-Room" architecture hinges on the Remote Radio Head (RRH) being fundamentally simpler, cooler, and cheaper than a traditional Enterprise Access Point. By offloading complex logic to the Concentrator (Section 13) and precision timing to the Fronthaul (Section 4.7), the RRH becomes a lean physical device.
Fi-Wi explicitly selects Mobile/Client Wi-Fi 7 chipsets (e.g., Qualcomm FastConnect or Broadcom BCM43xx client series) rather than traditional Enterprise AP/Networking SKUs. While Section 4.7 detailed how this enables external clocking, this choice is equally critical for the physical envelope:
We set a hard budget of 3.5–4 W total per RRH, enabling Power over Ethernet (PoE) Class 1 or 2 operation, or simple remote powering over hybrid fiber/copper cables.
A sub-4W envelope fundamentally changes the industrial design possibilities for the RRH:
Fi-Wi relies on a "Split Thermal" architecture. We deliberately shift the power density from the edge (the ceiling) to the core (the wiring closet).
A central hardware design choice is to make the RRH look like a PCIe endpoint to the Fi-Wi concentrator. This leverages the fact that:
Benefits of this choice:
We start with PCIe Gen3, one lane (x1), carried over fiber via a retimer + optical interface. Higher generations or widths (Gen4, x2/x4) are possible later but not required for the initial Fi-Wi performance targets.
PCIe Gen3 provides:
After protocol overhead (TLP headers, DLLPs, flow control), the sustained payload throughput for Gen3 x1 is in the rough range of 6–7 Gb/s for large transfers. This is more than sufficient for:
If a future RRH design must exceed this, the same architecture scales to:
For our initial Fi-Wi deployment assumptions, Gen3 x1 over fiber is a sensible and sufficient starting point.
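As a back-of-envelope check on that figure (assuming 256-byte maximum TLP payloads and roughly 24 bytes of TLP/DLLP framing per packet, representative rather than measured values):

\[
8\ \text{GT/s} \times \tfrac{128}{130} \approx 7.88\ \text{Gb/s (raw)}, \qquad 7.88\ \text{Gb/s} \times \tfrac{256}{256+24} \approx 7.2\ \text{Gb/s},
\]

with flow-control updates and ACK/NAK DLLPs consuming the remainder, landing sustained payload throughput in the quoted 6–7 Gb/s range.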
PCIe Gen3 latency has several components:
Order-of-magnitude:
Compared to:
the PCIe-over-fiber latency is effectively negligible. It comfortably fits within the microsecond-level time base used for t_ingress_us timestamps in FiWiMeta.
The PCIe model fits naturally with the Fi-Wi queueing and metadata scheme. Each RRH behaves like a PCIe endpoint with:
The FiWiMeta header lives in host memory adjacent to packet payloads and is referenced by these descriptors.
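For illustration, a plausible C layout of that header; only t_ingress_us, the queue snapshot, and ecn_flags are named elsewhere in this document, so the remaining fields and all widths are assumptions:

#include <stdint.h>

typedef struct {
    uint64_t t_ingress_us;    /* µs timestamp written at concentrator ingress */
    uint32_t queue_depth;     /* group-queue snapshot taken at enqueue        */
    uint16_t airtime_domain;  /* owning airtime domain / group queue index    */
    uint8_t  ecn_flags;       /* e.g., ECN_CE_APPLIED after AQM marking       */
    uint8_t  dl_rrh_id;       /* RRH selected for this packet's downlink      */
} fiwi_meta_t;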
Downlink flow: the Concentrator posts a DMA descriptor referencing the packet payload and its FiWiMeta (including t_ingress_us and queue snapshot).
Uplink flow:
In both directions, the PCIe fronthaul preserves the FiWiMeta assumptions of the control-plane design.
A critical operational requirement for Fi-Wi is the ability to service, replace, or add RRHs without bringing down the entire building's wireless network. PCIe provides native support for this through hot-plug capability, which is standard in enterprise server platforms and can be leveraged for Fi-Wi deployments.
PCIe hot-plug allows physical insertion and removal of endpoint devices (RRHs) while the system is running:
When a new RRH is connected or powered on:
Time from physical insertion to active traffic forwarding: typically 1–5 seconds, depending on link training, driver initialization, and RF group discovery.
When an RRH is removed (planned maintenance, failure, or surprise disconnection):
Impact on active connections: minimal to none for STAs served by multi-RRH domains. Traffic seamlessly fails over to remaining RRHs within the same RF group. For isolated single-RRH cells, removal causes brief disconnection until STAs reassociate with neighboring cells.
Hot-plug capability provides critical operational benefits:
To fully support hot-swap in production deployments:
Traditional distributed APs handle failures differently:
Fi-Wi's PCIe hot-plug, combined with multi-RRH airtime domains and centralized queues, enables sub-second failover with minimal packet loss—a qualitative improvement over traditional Wi-Fi high-availability approaches.
Hot-swap events interact cleanly with Fi-Wi's L4S and queueing architecture:
This separation—queues and control in the concentrator, timing-critical MAC in hot-swappable RRHs—is precisely what enables graceful hardware lifecycle management while maintaining the control-theoretic cleanliness that L4S requires (Appendix A).
To understand why Fi-Wi achieves deterministic latency where traditional Wi-Fi fails, we must look beyond the protocol and into the physical architecture of the devices. The feasibility of the "Cut-Through" RRH design relies on the upstream link being non-blocking. Fi-Wi achieves this by replacing the internal switching fabric of legacy APs with the massive PCIe lane overprovisioning of a workstation-class Concentrator.
| Component | Traditional AP (The Appliance) | Fi-Wi RRH (The Peripheral) |
|---|---|---|
| Core Silicon | Complex SoC (Quad-core CPU, NPU, Switch) | Thin PHY/MAC + PCIe Retimer |
| Data Path | Store-and-Forward (Switch → CPU → DMA) | Cut-Through (Fiber → PCIe → Air) |
| Queues | 1000s of opaque hardware queues | Zero deep queues (FIFO only) |
| Decision Making | Autonomous (Local Scheduler) | None (Slave to Concentrator) |
A traditional Enterprise Access Point is functionally a "Router-on-a-Stick." It forces high-speed wireless traffic through a series of internal serialization bottlenecks before the software ever sees the packet.
Architectural Flaws in Legacy APs:
Fi-Wi eliminates the internal switch, the GMII link, and the autonomous CPU. By utilizing high-end workstation silicon (e.g., AMD Threadripper Pro or Intel Xeon W-3400 series), the Concentrator provides 92 to 128 native PCIe lanes directly from a CPU with 24 to 96 high-performance cores.
The 92+ lanes of PCIe eliminate the need for an internal Ethernet switch anywhere in the datapath.
By mapping each RRH (or small groups of RRHs) to dedicated root ports on the CPU, Fi-Wi achieves a Non-Blocking Architecture:
This guarantees that the host DRAM behaves like Deterministic Ultra-Low Latency Memory rather than a shared network resource. This stability is the physical foundation that allows the software-defined queues (Section 14) to operate with microsecond precision.
Just as Fi-Wi removes blocking via massive PCIe lane availability, the CyBus ASIC in the Cisco 7500 (1990s) solved a similar bottleneck in routing.
Fi-Wi applies this same "Non-Blocking" philosophy to the wireless stack, utilizing 92+ lanes of PCIe to ensure that RRH memory access is never gated by a shared internal switch or software mutex.
Traditional Wi-Fi APs use hardware DMA (Direct Memory Access) rings to meet strict 802.11 MAC timing requirements—SIFS and DIFS deadlines measured in microseconds. While this solves the timing problem, it creates a cascade of architectural constraints that Fi-Wi explicitly avoids.
Hardware queues are expensive to implement in silicon. Each queue requires dedicated SRAM for descriptor storage, control logic for pointer management and overflow handling, and power even when idle. Current chip design limits traditional APs to hardware queues at L2 or the MAC: typically the four WMM access categories (AC_VO, AC_VI, AC_BE, AC_BK) per radio, multiplied across N stations.
While sufficient for basic priority handling, this fundamental constraint prevents the sophisticated per-flow scheduling that modern high-density networks require:
An equally significant problem is that once packets are enqueued to hardware DMA rings, the CPU cannot access them without causing race conditions. This "ownership transfer" creates fundamental limitations:
This prevents:
Because hardware queues are limited and packets become inaccessible after DMA, traditional AP vendors must add compensating hardware functionality to address these fundamental architectural limitations:
| Fundamental Limitation | Hardware Workaround Required | Complexity Added |
|---|---|---|
| Only 4-8 queues → no per-flow fairness | Airtime fairness tracking engine | Significant additional logic |
| Only 4-8 queues → no per-STA queuing | MU-MIMO grouping and coordination | Complex scheduling algorithms |
| Can't inspect after enqueue | Hardware deep packet inspection engine | Pattern matching, state tracking |
| Can't mark ECN in real-time | Hardware ECN marker with threshold logic | Queue monitoring, marking logic |
| Can't reclassify flows dynamically | Flow classification accelerator (TCAM) | Fixed rules; high-priority only; cannot update easily |
This compensating hardware represents substantial additional silicon area, design complexity, and verification effort. More critically, hardware-based solutions are fundamentally limited to fixed thresholds and simple policies that were designed into the chip. They cannot implement sophisticated algorithms like CoDel, PIE, or adaptive per-flow policies that require complex state and frequent updates.
Fi-Wi escapes these constraints through architectural separation:
RRH silicon implements only timing-critical functions (MAC/PHY, synchronization) with zero hardware queues. Packets arrive from the concentrator milliseconds before transmission, stay in simple descriptor rings briefly, then transmit. No autonomous queuing or scheduling logic.
All queues live in concentrator DRAM. Because the concentrator operates at TXOP granularity (~600 µs) rather than SIFS granularity (16 µs), it has time for software scheduling. Queue structures are simple data structures in memory— vastly cheaper than dedicated silicon:
The critical difference: packets remain in concentrator DRAM (software-accessible) until milliseconds before transmission. The scheduler can, for example, compute a packet's queue sojourn directly as now() - pkt->enqueue_time.
The RRH only owns packets for ~1 ms while transmitting a TXOP—too brief to constrain the system.
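A sketch of what such a software queue might look like: a plain ring in concentrator DRAM whose oldest element's age is the AQM input for its airtime domain (sizes, names, and the single-producer/single-consumer discipline are illustrative):

#include <stdint.h>

#define QCAP 4096

typedef struct {
    void     *pkt;             /* pointer into central packet memory */
    uint64_t  enqueue_time;    /* µs timestamp written at enqueue    */
} qslot_t;

typedef struct {
    qslot_t  slot[QCAP];
    uint32_t head, tail;       /* single-producer / single-consumer  */
} group_queue_t;

/* Sojourn of the oldest packet: the per-domain AQM input. */
static inline uint64_t sojourn_us(const group_queue_t *q, uint64_t now_us)
{
    return (q->head == q->tail) ? 0 : now_us - q->slot[q->head].enqueue_time;
}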
| Aspect | Traditional AP | Fi-Wi |
|---|---|---|
| Queue count | N stations × 4–8 (at MAC or L2 level) | 1000+ (dynamically allocated at the per-flow / 5-tuple level) |
| Queue implementation | Dedicated silicon (expensive) | Software data structures (negligible cost) |
| Compensating logic | Substantial silicon for workarounds | None needed |
| Per-flow fairness | Impossible (insufficient queues) | Standard capability |
| Sophisticated AQM | Simple thresholds only (hardware fixed) | Any algorithm (CoDel, PIE, ML-based) |
| Policy updates | Requires new silicon design | Software configuration or code update |
| Operational visibility | Aggregate counters only | Full per-flow statistics and queue contents |
| Algorithm experimentation | Impossible in production | A/B testing, gradual rollout possible |
Beyond the direct silicon cost advantages, Fi-Wi gains strategic advantages that compound over time:
Fi-Wi's approach follows a clear design principle:
This separation is not arbitrary. It's driven by fundamental constraints: hardware is expensive, inflexible, and opaque; software is cheap, updatable, and inspectable. By placing intelligence in software and only timing-critical functions in hardware, Fi-Wi achieves both the performance of hardware-accelerated systems and the flexibility of software-defined networking—advantages that traditional distributed-AP architectures cannot replicate due to their need for autonomous per-AP decision-making at microsecond timescales.
The Fi-Wi architecture's centralized observability enables machine learning to optimize MCS transition dynamics on a per-site basis. Unlike autonomous APs that operate on partial, local state, the Concentrator observes the complete state-transition graph for all RRHs under a single clock. This section describes how Fi-Wi combines physics-based models with adaptive learning to optimize performance.
The MCS state graph from Section 2.7 can be formalized as a probability current network, where each node represents a PHY configuration state (MCS index, spatial stream count) and edges represent transitions between states. The system's behavior follows probability flow dynamics:
What you're seeing: The vector field (arrows) shows the "flow" of PPDUs through the MCS/Spatial Stream space—the "river" of probability current that drives system behavior.
Autonomous AP (Left): Turbulent flow with chaotic arrow directions, sometimes pointing backward when collisions occur. Multiple shallow potential wells create competing forces. This represents High Entropy—the system doesn't know which way is optimal.
Centralized Concentrator (Right): Laminar flow with smooth, coherent streamlines pointing toward the optimum. Steeper gradients and deeper potential wells create strong convergence. This represents Low Entropy (Determinism)—the system has clear direction toward the optimal state.
In Phase 1, the Central Concentrator uses standard MAC-level timing to prevent APs from transmitting simultaneously on the same frequency. Result: this successfully eliminates the "Red" (collisions) seen in the Autonomous model. However, because the Radio Heads (RRHs) are not phase-aligned, they cannot perform Joint Transmission. The channel rank is limited to the physical antennas of a single RRH (Rank 4). Throughput hits a "Glass Ceiling."
In Phase 2, we introduce an FPGA to achieve sub-nanosecond synchronization between RRHs. This allows multiple RRHs to act as a single, distributed antenna array. Result: this unlocks Rank Expansion. The system can resolve 16+ spatial streams (Eigenvectors) simultaneously. The "Glass Ceiling" is removed, and throughput scales linearly with the number of RRHs deployed.
Machine learning in Fi-Wi optimizes the transition rate matrix W based on telemetry that is only observable in a centralized architecture. For each potential transition from state i (MCSi, SSi) to state j (MCSj, SSj), the learned rate depends on:
The learned transition rate function takes the form:
This learned function answers: "Given the current state and observed conditions, what is the optimal next MCS/SS configuration to meet the L4S latency target while maximizing achievable throughput?"
The ML engine operates on the control plane timescale with adaptive update rates: milliseconds for sudden events (interference spike detection requiring rapid response), seconds for typical rate adaptation (matching the timescales demonstrated by minstrel/minstrel_ht schedulers), and minutes for long-term pattern learning (daily traffic patterns, where slower updates are sufficient). This decouples the computational cost of learning from the latency constraints of packet transmission. The scheduler does not run neural network inference per packet—it uses a pre-computed policy matrix updated at rates appropriate to the dynamics being observed.
Fi-Wi uses physics-informed machine learning that combines Shannon capacity theory with learned corrections. This hybrid approach provides explainability, sample efficiency, and principled generalization.
The transition rate decomposes into two components:
Wphysics: The physics baseline uses Shannon capacity to establish theoretical bounds. For each MCS index, the required SNR is known from 802.11 specifications (e.g., MCS 11 requires ~30 dB). The base transition rate is the probability that current SNR exceeds the threshold given measured CSI.
Wlearned: The learned correction factor captures deviations from ideal conditions on a per-station basis, as different spatial stream capabilities and local RF environments require station-specific adaptation:
This approach uses residual learning: the physics model Wphysics provides the coarse steering (the "prior"), while the ML model learns the residual error Δ specific to the site. This guarantees the system never performs worse than a standard physics-based model, even before site-specific training converges. The ML correction is additive (or multiplicative) to a known-good baseline.
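One plausible form of this decomposition (multiplicative; an additive variant replaces the product with a sum):

\[
W_{i \to j} = W^{\mathrm{physics}}_{i \to j} \cdot \left(1 + \Delta^{\mathrm{learned}}_{i \to j}\right)
\]

where \( W^{\mathrm{physics}} \) is the Shannon/SNR baseline above and \( \Delta^{\mathrm{learned}} \) is the per-station, per-site residual.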
This decomposition provides three advantages:
The Concentrator's complete state visibility provides labeled training examples that are impossible to obtain in distributed AP systems. Each scheduling decision creates a training tuple:
Over time, the Concentrator accumulates thousands of these labeled examples across varying conditions. The ML model learns patterns such as:
This supervised learning is only possible with centralized observability. As detailed in Appendix H, autonomous APs lack:
It's worth noting that supervised learning doesn't require perfect ground truth labels to be effective—even relative quality assessments ("better" vs "worse") can drive learning. However, Fi-Wi's complete observability provides significantly richer training signals: precise measurements of queue impact, throughput changes, and latency effects that enable more efficient learning compared to the partial observability available to autonomous systems.
Fi-Wi's ML strategy uses transfer learning to balance generalization across sites with site-specific optimization:
Base Model (Cross-Site Training):
A foundational model is trained across multiple deployment sites to learn universal patterns:
Site-Specific Adaptation:
When deployed to a new site, the base model is augmented with learned corrections:
Continuous Adaptation:
The system continues to adapt using online learning with safety constraints:
Fi-Wi's ML capability creates a feedback loop that improves system performance over time:
This loop is unique to centralized architectures. Autonomous APs cannot generate ground truth labels without queue observability. Coordinated AP systems (where APs share summaries via a controller) see effects (latency, ECN) but not causes (queue growth, retry timing, aggregation depth) due to high inference distance.
Fi-Wi's centralized state graph provides the causal observability that machine learning requires. The probability current framework gives this learning a rigorous mathematical foundation: we are learning the transition rate matrix of a physical system governed by conservation laws.
Machine learning requires complete, structured training examples where actions, states, and outcomes are observable under consistent measurement. Fi-Wi's centralized architecture provides this by design: all state transitions occur under a single clock, all queue dynamics are visible, and all RF outcomes are measurable. This makes the MCS probability current learnable—something that is architecturally impossible in distributed, autonomous systems.
The presence of multiple concurrent Radio Heads (RRHs) serves as the primary multiplier for the Fi-Wi machine learning capability. It transforms the learning problem from optimizing a single isolated link into optimizing a spatially coupled network. While a traditional AP optimizes a local objective function (its own throughput), the Fi-Wi Concentrator utilizes concurrent RRHs to construct a global view of the RF environment.
This multi-RRH architecture impacts the learning model in three critical ways:
In traditional systems, an AP is blind to the interference seen by its neighbors. In Fi-Wi, the Concentrator aggregates real-time telemetry from all RRHs simultaneously.
This creates a Global RF State Matrix composed of:
This state matrix is sparse, time-aliased, and derived from standards-compliant telemetry rather than continuous per-packet baseband capture.
The model learns not just that "Client A has a weak signal," but specifically that "Client A is weak on RRH 1, strong on RRH 2, and creates -80 dBm interference on RRH 3." This global observability enables the prediction of building-wide interference patterns invisible to single-cell learners.
Because Fi-Wi treats multiple RRHs as an active redundant set, the ML engine has a broader action space than a standard rate-control algorithm. It learns not only how to transmit (MCS and scheduling decisions) but which RRHs are eligible transmitters for a given packet.
Note: This capability requires the hardware-synchronized FPGA architecture (Phase 2).
With sub-nanosecond synchronization, the ML engine will be able to resolve the true distributed Eigenstructure of the environment—the "shape" of available RF paths across distributed radios. This allows for Rank Expansion, where the system resolves more spatial streams (Eigenvectors) than a single physical AP could support, scaling capacity approximately with the number of RRHs, subject to channel rank and geometry.
To ensure the physics-informed model converges accurately, Fi-Wi employs a specific operational strategy: Zero-Occupancy Sounding.
As described in Section 15.5, the site-specific transfer function is composed of static building characteristics (Hstatic) and dynamic temporal variations (Δtemporal). To disentangle these variables, the system schedules automated channel sounding during hours of minimum occupancy.
In metrology, "tare" refers to zeroing a scale by removing known weights to isolate what you want to measure. Similarly, Fi-Wi "tares" the RF environment by measuring when human activity (the known variable) is absent.
\( H_{measured}(\mathrm{empty}) \approx H_{static} + \Delta_{building} \)
By sounding when the building is empty, the system effectively removes the noise of human movement and dynamic scatterers. This allows the Concentrator to:
This establishes a stable baseline "Zero State" for the learning model, ensuring that subsequent online learning is optimizing for dynamic changes rather than relearning the static environment. This separation dramatically improves offline RL dataset conditioning by preventing the model from relearning static structure while adapting to temporal dynamics.
While the primary learning mode is offline (using historical data), the centralized Concentrator architecture enables a hybrid approach: opportunistic, bounded model validation during predicted idle periods.
Because the Concentrator has global visibility of queue states across all RRHs in an Airtime Domain, it can predict when the RF channel will be underutilized—a capability fundamentally unavailable to autonomous APs that see only their local queues.
During high-confidence idle predictions, the system can perform controlled validation and calibration—not arbitrary exploration:
These activities refine the offline model without introducing risk to production traffic.
Validation is strictly bounded to prevent interference with real traffic:
This hybrid approach provides the safety of offline learning with the adaptability of continuous refinement, exploiting natural traffic lulls that autonomous APs cannot collectively identify.
Machine learning for MCS optimization is fundamentally enabled by Fi-Wi's centralized architecture and impossible in distributed AP systems:
| Requirement for ML | Autonomous AP | Fi-Wi Concentrator |
|---|---|---|
| Global CSI visibility | ❌ Each AP sees only local channel; no cross-AP interference data | ✅ Concentrator receives CSI from all RRHs; computes spatial correlation matrix |
| Cross-AP coordination state | ❌ Cannot observe other APs' band selection, power levels, or scheduling decisions | ✅ Centralized scheduler has complete visibility of all RRH configurations and decisions |
| Queue observability | ❌ Queue depth hidden in firmware; sojourn time not exposed | ✅ Centralized queuing with microsecond-resolution timestamps |
| Deterministic replay | ❌ Cannot reproduce exact RF conditions; firmware decisions opaque | ✅ Complete event log enables replay of scheduling decisions and outcomes |
| Inference distance | ❌ High (5-10 steps from cause to transport-layer effect) | ✅ Low (1-2 steps; queue → schedule → TX outcome directly linked) |
This observability gap is not a vendor implementation issue—it is an architectural limitation. Autonomous APs cannot generate high-quality training labels without queue observability.
The preceding sections established the architecture of the Fi-Wi concentrator: centralized packet memory (Section 4.4), group queues as the sole AQM bottleneck (Section 4.3), microsecond timestamps written into the Fi-Wi shim header (Section 4.2), and ML-driven MCS selection running continuously against that centralized data (Section 15). This section explains how the concentrator executes that pipeline with the determinism the architecture requires — maintaining a single observable bottleneck per airtime domain, applying ECN marks at the right moment, and keeping the RRH free of scheduling logic.
The Fi-Wi concentrator's latency and determinism targets strongly favor a kernel-bypass data plane. A conventional interrupt-driven kernel path would reintroduce jitter at exactly the point where the architecture is trying to remove it.
L4S requires ECN marks to be applied at the group queue on the same time scale as a single 802.11 TXOP. The Linux kernel's softirq-based packet path introduces interrupt coalescing and scheduler contention that accumulates across bursts. More fundamentally: every packet that transits the kernel stack competes with arbitrary OS activity for CPU time. The queue depth is not directly visible to userspace without a syscall; the marking decision cannot be co-located with the queue measurement in the same cache line.
Fi-Wi's concentrator data plane therefore runs via DPDK (Data Plane Development Kit): tight busy-poll loops on dedicated cores, with no interrupt-driven jitter. All packet operations — receive, classify, AQM mark, forward — execute in a cache-resident loop that preserves the single-bottleneck, fully-observable queue structure that the rest of the architecture depends on.
DPDK allocates all packet buffers (mbufs) from hugepages, eliminating TLB misses during packet processing. Each airtime domain's group queue is a logically contiguous region within this space. The pool is allocated once at startup; no per-packet memory allocation occurs on the fast path.
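A minimal sketch of that one-time pool creation, with illustrative sizing constants and names (the Fi-Wi codebase's actual values are not specified in this document):

#include <rte_mbuf.h>

#define FIWI_NB_MBUFS   (1u << 18)  /* illustrative pool size */
#define FIWI_MBUF_CACHE 256         /* per-lcore cache depth */

struct rte_mempool *fiwi_pktmbuf_pool;

static int
fiwi_create_pktmbuf_pool(unsigned socket_id)
{
    /* Buffers come from hugepages reserved by the EAL; after this call,
     * no allocation ever happens on the fast path. */
    fiwi_pktmbuf_pool = rte_pktmbuf_pool_create("fiwi_mbufs",
            FIWI_NB_MBUFS, FIWI_MBUF_CACHE, 0,
            RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);
    return fiwi_pktmbuf_pool != NULL ? 0 : -1;
}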
Each SFP+ NIC is bound to the vfio-pci driver. The system IOMMU enforces DMA isolation: a card can only reach the memory regions explicitly registered with it at startup. This gives the concentrator two properties simultaneously: hardware-enforced containment (a misbehaving card cannot touch another domain's memory), and zero-copy I/O, in which rx_burst and tx_burst let the NIC DMA engine write received frames directly into pre-registered mbuf space and read transmit frames from the same space, with no per-packet kernel involvement.
DPDK exposes each NIC's hardware receive queues independently. Fi-Wi uses this to achieve a direct, lockless mapping from PCIe port and queue index to airtime domain — the same logical grouping described in Section 6. Each lcore owns a fixed set of (port, queue) pairs. Because ownership is exclusive, there are no locks on the fast path and no shared state between lcores during steady-state forwarding.
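A sketch of this ownership pattern follows; the struct, function names, and four-queue count are illustrative, and classify_and_enqueue stands in for the classification-and-enqueue step shown in Section 16.7:

#include <stdint.h>
#include <rte_ethdev.h>

#define FIWI_BURST 32

struct owned_queue {
    uint16_t port;       /* DPDK port id of the SFP+ NIC */
    uint16_t queue_id;   /* hardware RX queue on that port */
    uint16_t domain_id;  /* airtime domain fed by this queue */
};

/* Filled at startup; read-only on the fast path, so no locks are needed. */
static struct owned_queue owned[4];

static int
lcore_rx_loop(void *arg)
{
    struct rte_mbuf *pkts[FIWI_BURST];
    (void)arg;
    for (;;) {
        for (unsigned q = 0; q < 4; q++) {
            uint16_t n = rte_eth_rx_burst(owned[q].port, owned[q].queue_id,
                                          pkts, FIWI_BURST);
            for (uint16_t i = 0; i < n; i++)
                /* Classification and enqueue as in Section 16.7's snippet. */
                classify_and_enqueue(pkts[i], owned[q].domain_id);
        }
    }
}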
| Fast-Path Property | Kernel Stack | Fi-Wi DPDK Pipeline |
|---|---|---|
| **Receive and Queue Observability** | | |
| Interrupt model | Hardware IRQ → softirq → NAPI poll; coalescing adds jitter | No interrupts; dedicated lcore polls hardware queue register directly |
| Queue depth visibility | Visible inside kernel only; userspace access requires syscall | Directly readable by AQM loop in same CPU cache line as packet pointer |
| Buffer allocation | Per-packet skb allocation from kernel slab | Pre-allocated mbuf pool; zero allocation on fast path |
| **AQM and Forwarding** | | |
| ECN marking timing | Marked in kernel qdisc; subject to scheduling lag | Marked in polling loop body; co-located with queue measurement |
| Forwarding lookup | Routing table + netfilter traversal | (port, queue_id) → group queue index; O(1), cache-hot |
| Packet copy | Typically 1–2 copies through socket buffer chain | Zero copies; mbuf pointer passed through the pipeline |
| **Transmit** | | |
| IOMMU interaction | Kernel maps and unmaps DMA regions per packet | IOMMU mapping established once at pool creation; static thereafter |
The AQM marking step is deliberately minimal. The DPDK data plane does not run a full queue scheduler — that is the outer control loop's responsibility (Section 5). The inner loop does one thing: read sojourn time from the shim header (Section 4.2) and set the ECN CE codepoint if the threshold is exceeded.
// Per-packet in the rx → tx burst loop:
// (tsc_to_ns converts a TSC cycle delta to nanoseconds using the
// calibrated TSC frequency; t_ingress is a TSC value from the shim header.)
uint64_t sojourn_ns = tsc_to_ns(now_tsc() - pkt->t_ingress);
if (sojourn_ns > THRESHOLD_NS) {
    rte_ipv4_l4s_mark(pkt);                      // in-place, no copy
    fiwi_meta(pkt)->ecn_flags |= ECN_CE_APPLIED;
}
rte_eth_tx_burst(out_port, queue_id, &pkt, 1);
Because t_ingress is written by the same lcore at enqueue, no cross-core communication is needed to compute sojourn time at dequeue. The marking decision is local to the polling thread. This is what Section 4.3 means when it says AQM runs "exactly where the integrator lives": the integrator is the group queue, the group queue is an mbuf ring in hugepage memory, and the marking loop touches that ring on every poll cadence with no additional indirection.
In a multi-card concentrator, each SFP+ card sits in its own IOMMU group: each card binds to VFIO independently, and the IOMMU enforces that one card's DMA cannot reach another card's memory regions. This topology provides natural fault isolation at the card boundary: a PCIe error or runaway DMA event from one RRH is contained within its card's group and cannot corrupt the packet memory of an adjacent airtime domain. This is a hardware guarantee, not a software policy.
The kernel-bypass data plane is not a complexity cost — it is the mechanism that justifies the RRH's simplicity. Because the concentrator runs a deterministic, observable pipeline that applies AQM, tracks sojourn time, and manages all descriptor posting without OS intervention, the RRH never needs to make a queuing or scheduling decision. It remains a pure DMA client, exactly as the silicon cost argument in Section 4.4 requires.
Incumbent distributed APs have no equivalent. Because each AP operates autonomously, it must run its own Linux network stack, its own qdisc, and its own firmware scheduler. The CPU carrying that stack is the dominant gate cost per RRH (Section 4.4, silicon cost table). A centralized DPDK pipeline eliminates that requirement across every RRH simultaneously — not by optimizing the AP implementation, but by removing the architectural condition that forces the CPU to exist there in the first place.
That said, DPDK solves a specific problem: it gives the concentrator a deterministic, observable, zero-copy execution path in which queue state, ECN marking, and packet steering remain under unified software control. It does not solve the radio-side interface. Per-packet MCS selection, EDCA parameter control, and TX-outcome metadata from the Wi-Fi silicon remain the next required interface boundary — the point at which concentrator intelligence must reach into the RRH to close the control loop. DPDK is the precondition; radio-side per-packet programmability is what completes it.
Section 16.4 described the minimal ECN marking step — reading queue state and applying a CE mark in the fast path. That sketch is sufficient to illustrate where marking occurs, but it elides the control structure that makes L4S coexistence with legacy traffic work: the dual-queue coupled AQM defined in RFC 9332.
This section defines the baseline DualPI2 control law as it would be realized inside the DPDK polling loop. Fi-Wi preserves this dual-queue topology, coupling mechanism, and PI-based control structure, but Section 17 replaces the underlying congestion signal with Airtime Debt (Di), grounding the controller in predicted wireless service time rather than raw queue occupancy.
Each airtime domain maintains two logically independent mbuf rings in the concentrator's hugepage pool: an L4S queue for scalable congestion-control flows (senders marking with ECT(1)), and a Classic queue for legacy RFC 3168 flows and unmarked traffic. Classification happens at ingress on the fast path, before the packet is enqueued, and costs a single bitfield check on the IP ECN field:
// Ingress classification — per-packet, inline in the rx burst loop
uint8_t ecn = (pkt_ip->type_of_service & 0x03);
bool is_l4s = (ecn == 0x01 || ecn == 0x03); // ECT(1) or CE — scalable sender
fiwi_meta(pkt)->queue_class = is_l4s ? QUEUE_L4S : QUEUE_CLASSIC;
enqueue_to_domain(pkt, domain_id, fiwi_meta(pkt)->queue_class);
Both queues drain toward the same transmit burst for that airtime domain. The scheduler services the L4S queue with a strict low-latency budget and the Classic queue at a rate that saturates the domain's aggregate share, matching the DualPI2 service model from RFC 9332.
The key property of DualPI2 is that the two queues are not independent. The Classic queue's drop probability pc — computed by a PI controller from a congestion signal representing pressure at the shared bottleneck — also governs the L4S queue's ECN marking probability via a coupling factor k (default 2 in the Linux sch_dualpi2 reference implementation).
// Outer control loop — runs on a slow timer cadence (~16 ms), same lcore,
// non-preemptive. Not per-packet.
double signal_classic = ewma_update(&domain->classic_signal,
                                    ring_depth(QUEUE_CLASSIC));
// PI controller: proportional term on the error plus an accumulated
// integral term, clamped to a valid probability.
double err = signal_classic - TARGET_CLASSIC;
domain->pi_integral += K_I * err;
double p_c = clamp(K_P * err + domain->pi_integral, 0.0, 1.0);
double p_l = COUPLING_K * p_c;                   // Coupled L4S marking probability
// Applied per-packet in the L4S dequeue path:
double p_l_step = (sojourn_L4S_ns > THRESHOLD_L4S_NS) ? 1.0 : p_l;
if (rte_rand() < (uint64_t)(p_l_step * (double)UINT64_MAX))
    rte_ipv4_l4s_mark(pkt);                      // Set ECN CE in-place, no copy
In a conventional queue-based implementation, signal_classic would be an EWMA of Classic queue depth. In Fi-Wi, that queue-derived signal is replaced as the PI controller input by Airtime Debt (Di), a forward estimate of wireless service time. The DualPI2 control law, coupling mechanism, and dual-queue topology remain unchanged; only the input signal changes.
Queue depth is a lagging indicator in Wi-Fi because contention, retries, and variable PHY rates consume airtime without necessarily appearing in buffer occupancy. Airtime Debt provides a forward-looking signal that better matches the true wireless bottleneck while preserving the DualPI2 coexistence structure required for L4S and Classic traffic to share the medium.
Each airtime domain carries its own DualPI2 state alongside the fiwi_rrh_state struct (Section 17.5). Because each lcore owns a fixed set of domains exclusively (Section 16.8), this state is never shared across cores — no locks, no atomics, no cache-line bouncing on the fast path.
The telemetry path (Section 17.8) delivers ground-truth airtime measurements back to the lcore via a lockless ring carrying fiwi_update objects. The struct is defined here because it originates in the DPDK fast-path layer and is consumed by it; Section 17.8 populates it from Netlink/vendor telemetry events:
/**
* fiwi_update — telemetry record posted by the Netlink callback,
* consumed by the DPDK lcore during its scheduling loop.
* Allocated from fiwi_update_pool (rte_mempool); returned after use.
*/
struct fiwi_update {
uint8_t type; /* AIRTIME_RECONCILE (only type currently defined) */
uint32_t rrh_id; /* RRH index, validated < FIWI_MAX_RRHS before enqueue */
uint64_t actual_us; /* Hardware-path-to-status interval (ground truth) */
uint64_t expected_us; /* Forward estimate: T_phy + T_agg at enqueue time */
uint32_t retry_us; /* Observed retry airtime from telemetry metadata */
};
The update ring is created with RING_F_MP_HTS_ENQ because the Netlink callback runs on a non-EAL thread; the lcore-side dequeue uses RING_F_SC_DEQ (single consumer).
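A sketch of the corresponding ring setup at startup; the FIWI_MAX_RRHS value of 24 is an assumption matching the lcore map in the next section, and the ring name and size are illustrative:

#include <stdio.h>
#include <rte_ring.h>

#define FIWI_MAX_RRHS 24   /* assumed from the RRH/lcore map below */

struct rte_ring *rrh_update_rings[FIWI_MAX_RRHS];

static int
fiwi_create_update_rings(int socket_id)
{
    char name[RTE_RING_NAMESIZE];
    for (unsigned i = 0; i < FIWI_MAX_RRHS; i++) {
        snprintf(name, sizeof(name), "fiwi_upd_%u", i);
        /* MP_HTS enqueue tolerates the non-EAL Netlink thread; SC dequeue
         * matches the single owning lcore. */
        rrh_update_rings[i] = rte_ring_create(name, 1024, socket_id,
                RING_F_MP_HTS_ENQ | RING_F_SC_DEQ);
        if (rrh_update_rings[i] == NULL)
            return -1;
    }
    return 0;
}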
The Umber concentrator runs on a workstation-class host with a Threadripper PRO processor and multiple PCIe-connected RRHs. This section describes how DPDK lcore assignments map onto that hardware topology to preserve cache locality, single-writer semantics, and deterministic fast-path execution.
Each lcore owns both the DualPI2 control state (Section 16.7) and the Airtime Debt estimator (Section 17) for its assigned RRHs. This ensures that congestion estimation, scheduling, and ECN marking operate within a single execution context.
| RRH Range | Assigned lcore | Airtime Domains |
|---|---|---|
| 0–3 | lcore 2 | domains 0–3 |
| 4–7 | lcore 4 | domains 4–7 |
| 8–11 | lcore 6 | domains 8–11 |
| 12–15 | lcore 8 | domains 12–15 |
| 16–19 | lcore 10 | domains 16–19 |
| 20–23 | lcore 12 | domains 20–23 |
Each RRH lcore applies its per-domain DualPI2 loop as described in Section 16.7, with Airtime Debt (Di) serving as the PI controller input in place of queue depth. This presents a single, airtime-grounded congestion signal per domain to the L4S control loop.
Downlink traffic is classified at ingress and directed to the appropriate airtime domain. The owning lcore performs scheduling, ECN marking, and transmission. Uplink traffic follows the reverse path toward the WAN interface.
Because each lcore exclusively owns its RRHs and associated Airtime Debt state, congestion estimation, scheduling, and ECN marking operate without cross-core coordination. This preserves deterministic fast-path behavior.
Fi-Wi does not infer congestion from queue depth alone. The bottleneck is the wireless medium, and the relevant state variable is the time required to successfully transmit packets over that medium. The system replaces the queue sojourn-time inputs of traditional PI2 controllers with Airtime Debt (Di), converting a stochastic medium into a controlled service process.
In traditional L4S systems, ECN marking is derived from queue sojourn time, which assumes a stationary service rate. These assumptions fail in Wi-Fi because service time varies per client based on PHY rates, contention, and retries. Fi-Wi replaces backward-looking buffer metrics with a forward model of wireless service time. The Concentrator maintains this model continuously and makes scheduling decisions on predicted service outcomes, not observed queue growth. This approach provides the AQM with a signal that has a more stationary distribution than raw queue depth over a variable-rate medium, improving marking coherence and L4S stability.
For each RRH i, the Concentrator maintains a real-time Airtime Debt:

Di = Ai + Ci + Ri

where Ai is the total scheduled airtime (queued plus in-flight), Ci is the estimated contention delay, and Ri is the estimated retry penalty, all in microseconds.
The "Ground Truth" for airtime consumption is measured as the interval
from
descriptor posting into the hardware transmit path to
TX Status (hardware completion signal via
driver/vendor-specific telemetry events such as mt76 TX
status reports). This interval captures the full service duration,
including the full wait for TXOP eligibility (AIFS + backoff),
aggregation delay, and all hardware-level retransmission attempts.
For any packet, the Predicted Sojourn Time (Si) is a forward estimate of delivery time: Si = Di + Tservice, the debt already ahead of the packet plus the packet's own predicted service time.
The Tservice calculation is decomposed into Tagg (aggregation hold time) + Tphy (modulation time at current MCS) + Tretry (statistical retry overhead). This estimate is packet- and client-specific; it is not a constant service quantum.
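A minimal sketch of such an estimator follows, assuming a per-STA state struct with a percent-valued PER average; the field names, units, and retry model are illustrative assumptions:

#include <stdint.h>

struct sta_state {
    uint32_t phy_rate_mbps;   /* drain rate at current MCS (assumed field) */
    uint32_t moving_avg_per;  /* recent PER, percent (assumed field) */
    uint32_t t_agg_hold_us;   /* expected aggregation hold (assumed field) */
};

static inline uint64_t
estimate_tservice_us(const struct sta_state *sta, uint32_t frame_bytes)
{
    if (sta->phy_rate_mbps == 0)
        return UINT64_MAX;                        /* link unusable: max debt */
    /* bits / Mbps yields microseconds directly. */
    uint64_t t_phy = ((uint64_t)frame_bytes * 8) / sta->phy_rate_mbps;
    /* Expected retries inflate airtime: extra = t_phy * p / (1 - p). */
    uint32_t per = sta->moving_avg_per < 99 ? sta->moving_avg_per : 99;
    uint64_t t_retry = (t_phy * per) / (100 - per);
    return sta->t_agg_hold_us + t_phy + t_retry;
}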
The Concentrator tracks RRH state in hugepage-backed memory. The DPDK lcore is the sole writer of fiwi_rrh_state; telemetry updates are applied via per-RRH lockless ring buffers to preserve single-writer semantics and microsecond-level determinism.
struct __rte_cache_aligned fiwi_rrh_state {
uint32_t rrh_id;
uint64_t D_i; /* Total airtime debt (A+C+R) */
/* Component Estimates (microseconds) */
uint64_t A_i; /* Total scheduled airtime (queued + in-flight) */
uint32_t C_i; /* Estimated contention delay */
uint32_t R_i; /* Estimated retry penalty */
/* Feedback & Synchronization */
uint64_t last_update_us; /* Timestamp of last lcore application */
uint64_t last_tx_status_us; /* TSC of last hardware completion */
uint32_t moving_avg_per; /* Recent PER (Section 15.4) */
};
Di is recomputed in the DPDK fast path after each update to Ai, Ci, or Ri. The loop increments Ai when packets are assigned to an RRH and decrements it upon TX completion using telemetry feedback.
Airtime Debt replaces physical queue depth as the authoritative input for the Dual-Queue AQM, providing a single, authoritative congestion signal across all RRHs without relying on a shared physical buffer.
An ECN CE mark is applied in the fast path whenever the predicted sojourn exceeds the L4S threshold (Si > Tlow). This bypasses traditional sojourn measurements to signal congestion at the true wireless bottleneck. Airtime Debt (Di) replaces queue depth as the input to the PI controller defined in Section 16.7. This preserves the Dual-Queue AQM structure while grounding the control signal in predicted wireless service time rather than buffer occupancy.
While Di provides fast-path control, the system monitors Airtime Utilization (Uair = ΔTX_DURATION / Δt) as a slow-path observability metric. This metric is used to identify external interference patterns and long-term capacity shifts in the airtime domain, calibrating the confidence weights applied to the Ci and Ri estimators.
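A sketch of the slow-path computation, assuming TX_DURATION telemetry is accumulated per observation window (the accumulation itself is not shown):

/* U_air = ΔTX_DURATION / Δt over one observation window. */
static inline double
airtime_utilization(uint64_t tx_duration_us, uint64_t window_us)
{
    return (window_us != 0) ? (double)tx_duration_us / (double)window_us : 0.0;
}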
The following logic processes TX_STATUS events from the mt76 driver. Completion data is retrieved from a pre-allocated mempool and posted to a per-RRH lockless ring to reconcile state without lcore contention.
/* Telemetry Path (Netlink Callback) */
static int fiwi_handle_mt76_telemetry(struct nl_msg *msg, void *arg) {
    struct nlattr *attrs[MT76_ATTR_MAX + 1];
    nla_parse(attrs, MT76_ATTR_MAX, genlmsg_attrdata(nlmsg_data(nlmsg_hdr(msg)), 0),
              genlmsg_attrlen(nlmsg_data(nlmsg_hdr(msg)), 0), NULL);
    if (!attrs[MT76_ATTR_TX_DURATION] || !attrs[MT76_ATTR_RRH_ID])
        return NL_SKIP;
    uint32_t rrh_id = nla_get_u32(attrs[MT76_ATTR_RRH_ID]);
    if (rrh_id >= FIWI_MAX_RRHS) return NL_SKIP;
    struct fiwi_update *update;
    if (rte_mempool_get(fiwi_update_pool, (void**)&update) < 0) return NL_SKIP;
    update->type = AIRTIME_RECONCILE;
    update->rrh_id = rrh_id;
    update->actual_us = nla_get_u64(attrs[MT76_ATTR_TX_DURATION]);
    /* Retry duration is optional in the telemetry event; default to zero. */
    update->retry_us = attrs[MT76_ATTR_RETRY_DURATION] ?
                       nla_get_u32(attrs[MT76_ATTR_RETRY_DURATION]) : 0;
    update->expected_us = estimate_service_time(msg);
    /* On a full ring, return the buffer to the pool rather than leak it. */
    if (rte_ring_enqueue(rrh_update_rings[rrh_id], update) < 0)
        rte_mempool_put(fiwi_update_pool, update);
    return NL_OK;
}
The DPDK lcore closes the control loop by draining the update ring. It decrements the backlog and calibrates penalties to ensure the Airtime Debt remains an accurate representation of physical medium pressure.
/* DPDK lcore: apply telemetry updates */
static inline void
fiwi_apply_updates(struct fiwi_rrh_state *rrh, struct rte_ring *ring)
{
struct fiwi_update *upd;
while (rte_ring_dequeue(ring, (void**)&upd) == 0) {
/* 1. Discharge processed backlog */
rrh->A_i = (rrh->A_i > upd->actual_us) ? (rrh->A_i - upd->actual_us) : 0;
/* 2. Update contention estimate (drift from expected modulation time) */
uint32_t drift = (upd->actual_us > (upd->expected_us + upd->retry_us)) ?
(upd->actual_us - upd->expected_us - upd->retry_us) : 0;
rrh->C_i = (rrh->C_i * 7 + drift) >> 3;
/* 3. Update retry penalty */
rrh->R_i = (rrh->R_i * 7 + upd->retry_us) >> 3;
/* 4. Recompute total Airtime Debt (D_i) */
rrh->D_i = rrh->A_i + rrh->C_i + rrh->R_i;
rrh->last_tx_status_us = rte_get_tsc_cycles();
rte_mempool_put(fiwi_update_pool, upd);
}
}
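Putting the pieces together, a plausible shape for the owning lcore's main loop is sketched below; lcore_ctx, rrh_state, and fiwi_schedule_and_mark are illustrative names standing in for this document's per-lcore context, state array, and DualPI2 pass:

struct lcore_ctx { unsigned first_rrh, last_rrh; };

static int
fiwi_lcore_main(void *arg)
{
    struct lcore_ctx *ctx = arg;
    for (;;) {
        /* Drain telemetry first so D_i reflects the latest completions. */
        for (unsigned r = ctx->first_rrh; r <= ctx->last_rrh; r++)
            fiwi_apply_updates(&rrh_state[r], rrh_update_rings[r]);
        fiwi_schedule_and_mark(ctx);  /* DualPI2 pass with D_i as input */
    }
}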
Figure 17-1: The Fi-Wi recursive control loop for stabilizing stochastic wireless service.
Figure 17-1 synthesizes the technical components of the Airtime Debt model into a continuous functional loop. The architecture separates the Speculative Forward Path (Fast Path) from the Calibrated Feedback Path (Telemetry Path).
- Prediction: for each packet, the scheduler computes Tservice. This is not a global constant; it is a client-specific sum of aggregation hold time (Tagg), PHY modulation time (Tphy), and predicted retry overhead (Tretry) based on that STA's specific RF context.
- Speculative marking: Tservice is added to the RRH's Ai (Backlog). If the resulting Predicted Sojourn Time (Si) exceeds Tlow, an ECN CE mark is applied immediately in the DPDK fast path. This provides the "Virtual Backpressure" that stabilizes L4S senders.
- Calibration: telemetry feedback updates Ci (Contention) and Ri (Retries). This ensures that subsequent predictions for the same STA or RRH domain are corrected for changing medium pressure, effectively regularizing the stochastic nature of the 802.11 medium.
The core idea of Umber’s Fi-Wi architecture is to make a building full of Wi-Fi radios behave like a large number of predictable, low-latency, cellularized bottlenecks (often cell-per-room) that integrate cleanly with L4S, and to avoid Wi-Fi collapse in the regime that matters most for users: tail latency.
We do that by centralizing queueing, AQM marking, and scheduling in the concentrator; grounding congestion signals in Airtime Debt rather than queue depth; and dynamically grouping RRHs into bounded airtime domains.
Compared to a building filled with independent APs, Fi-Wi provides bounded contention domains, a single observable bottleneck per domain, coherent L4S marking, and uplink reception diversity for all client generations.
This appendix explains the precise behavior of the 802.11 CSMA/CA backoff algorithm, why the freeze/resume mechanics create strong nonlinearities under load, and how this drives the collapse behavior discussed in Sections 2 and 6. We also include reference diagrams, accurate pseudocode, and probability scaling that shows why birthday-paradox collisions appear long before PHY saturation.
The 802.11 MAC is built around two core mechanisms: carrier sensing (physical CCA plus the virtual NAV) and randomized backoff.
When a station has a frame to send, it chooses a random integer B ← Uniform[0, CW], where CW is the contention window. The counter decrements only when the medium is idle, the NAV is zero, and both conditions hold for an entire SlotTime. If any of these conditions break during a SlotTime boundary, backoff does not decrement.
Time → ───────────────────────────────────────────────────────────────────────→
Channel: Busy TXOP Idle slot Idle slot Busy TXOP Idle ...
────────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐
│ │ slot OK │ │ slot OK │ │collision │
└──┘ └───┘ └──────────────┘
Backoff B: [frozen] B:=B-1 B:=B-2 [frozen] B:=B-3
This "idle-slot-only" decrement rule is the source of nonlinear timing behavior.
The backoff counter freezes immediately under either condition: physical carrier sense reporting a busy medium (CCA/energy detect), or virtual carrier sense (a nonzero NAV).
NAV counts down in microseconds, not slot units, so a NAV may span dozens or hundreds of SlotTimes, creating long frozen periods.
Frame overheard with Duration=480µs
NAV := 480 µs ─────────────────────────────────────────────▶ 0 µs
Backoff:
Frozen until NAV==0
Then: AIFS idle interval → first idle SlotTime → resume B countdown
The following pseudocode describes the real 802.11 backoff and retry machine:
# Variables
B = random integer in [0, CW]
CW = CWmin initially, doubled on failures
NAV = virtual carrier sense (µs timer)
Slot = 9 microseconds (typical)
AIFS = access category-specific inter-frame space
while True:
wait_until( medium_idle() and NAV == 0 )
wait(AIFS) # must see idle for entire AIFS
# Backoff countdown
while B > 0:
if medium_idle() and NAV == 0:
wait(Slot)
if medium_idle() and NAV == 0:
B -= 1 # decrement only if entire slot was idle
else:
# Freeze B until another idle AIFS appears
wait_until( medium_idle() and NAV == 0 )
wait(AIFS)
# Backoff fully expired, attempt TX
transmit()
if ack_received():
CW = CWmin
B = random(0, CW)
else:
CW = min(2 * CW, CWmax)
B = random(0, CW)
The critical detail: multiple stations freeze and resume their counters in lock-step after every long TXOP or NAV, making collisions statistically inevitable as station count grows.
Each station independently picks a backoff slot in [0, CW].
The probability that no two stations choose the same slot is:
P(no collision) = (CW+1)! / [(CW+1 - n)! · (CW+1)^n]
where n = number of active contenders. Therefore:
| Stations (n) | 4 | 6 | 8 | 10 | 12 | 16 |
|---|---|---|---|---|---|---|
| P(collision), CWmin = 15 | ~12% | 30% | 48% | 65% | 78% | >90% |
This is the MAC-level reason collapse begins long before PHY capacity is reached.
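For readers who want to evaluate the closed-form expression, the short program below computes the uniform-draw collision probability for CWmin = 15. It is a sketch of the pure slot-coincidence model only; exact figures depend on the contention model used, and the freeze/resume synchronization described next pushes operational collision rates higher still.

/* Evaluate P(no collision) = (CW+1)!/((CW+1-n)!·(CW+1)^n) by multiplying
 * per-station terms, then report P(collision). */
#include <stdio.h>

static double p_collision(int cw, int n)
{
    int slots = cw + 1;
    double p_clear = 1.0;
    for (int i = 0; i < n; i++)
        p_clear *= (double)(slots - i) / slots;  /* i-th STA avoids prior picks */
    return 1.0 - p_clear;
}

int main(void)
{
    const int ns[] = {4, 6, 8, 10, 12, 16};
    for (unsigned i = 0; i < sizeof ns / sizeof ns[0]; i++)
        printf("n=%2d  P(collision)=%4.0f%%\n",
               ns[i], 100.0 * p_collision(15, ns[i]));
    return 0;
}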
Once collisions become frequent, CW doubling, retries, and lengthening TXOPs compound one another, and tail latency degrades in stages:
Healthy:   T50 ≈ 200–500 µs, T95 < 0.8 ms, T99 < 1.2 ms
Degraded:  T95 = 1–2 ms, T99 = 2–3 ms
Collapsed: T95 > 2 ms AND T99 ≥ 3 ms (dominant channel monopolization)
A single 3 ms TXOP already violates the bottleneck-delay budget required by L4S (≈250–300 µs). With multiple stations taking such TXOPs, service gaps can reach 10–50 ms for unlucky flows.
The following diagram illustrates how multiple stations become phase-aligned:
Time → ────────────────────────────────────────────────────────────────→
TXOP1 by STA-A:    ────────────────
NAV for others:    ──────────────── (all B frozen)
After NAV expires: all stations wait AIFS → begin countdown
  Slot 1: B_A=2, B_B=4, B_C=2
  Slot 2: B_A=1, B_B=3, B_C=1
  Slot 3: B_A=0, B_B=2, B_C=0 → simultaneous transmit → collision
This synchronization is why the birthday paradox applies so strongly in Wi-Fi.
Fi-Wi removes the “every station fends for itself” randomness by centralizing transmit decisions at the concentrator, bounding contention domains through dynamic RF grouping, and using trigger-based uplink scheduling where clients support it.
This appendix describes how Fi-Wi can use Channel State Information (CSI) from each RRH, together with learning models (e.g. LSTM or TCN), to improve grouping, scheduling, redundancy, and control beyond what is possible with queue-based feedback alone.
Modern 802.11 chipsets can export CSI per subcarrier or per resource unit: complex-valued estimates of the channel between an RRH and a station (STA). In a Fi-Wi deployment, each RRH periodically reports this channel state along with the MAC outcomes (PER, retries) and radio configuration summarized in the feature list below.
Thanks to centralized time synchronization and packet memory, the concentrator can align CSI reports with queue events, scheduling decisions, and transmission outcomes on a single timebase.
This gives Fi-Wi a rich per-domain, per-STA time series of channel state and service behavior.
Using this data, Fi-Wi can learn models to help answer questions such as how effective capacity will evolve, which domains are at risk of collapse, and when regrouping would help.
These predictions can feed directly into grouping, scheduling, redundancy, and beacon/channel control decisions.
One reasonable approach is to use a sequence model such as an LSTM or Temporal Convolutional Network (TCN) per airtime domain:
Input features (per timestep):
- queue depth q_k
- marking probability p_k
- throughput, PER, retries
- per-RRH CSI summary (e.g. dominant eigenvalues/eigenvectors)
- beacon power settings, channel, bandwidth
Outputs:
- predicted effective capacity C_eff,k+1
- predicted collapse risk score
- recommended group reconfiguration / beacon adjustments (optional)
A higher-level policy layer then uses these predictions to adjust RF grouping, channel and width assignments, and scheduling weights before degradation becomes user-visible.
The key point is that Fi-Wi has access to the joint state across all RRHs—queues, CSI, MAC outcomes, and beacon configuration—so learning can be done on a true building-scale view rather than a per-AP snippet.
While the PI² controller (Section 5.2) provides a robust baseline using linear control theory, the wireless medium is inherently non-linear. A small drop in SNR can cause a discrete, non-linear step-down in MCS, cutting capacity by half in microseconds. A linear controller often reacts too slowly to these step-changes.
Because the Concentrator terminates both the MAC (Inner Loop) and L4S (Outer Loop), it possesses a complete, global view of the system state. This allows Fi-Wi to implement a Non-Linear Marking Signal derived from a rich real-time feature vector:
Feature Vector x(t) = [
    MCS_t,         // Current Modulation (capacity potential)
    PHY_Rate_t,    // Raw drain rate
    RTT_outer,     // End-to-end latency (Sojourn + Flight)
    Q_depth_t,     // Current backlog
    d_arrival/dt   // Arrival rate gradient (ARM Policer)
]
Optimization Objective: Efficiency vs. Latency
The system uses this vector to solve the fundamental Wi-Fi trade-off between Aggregation Efficiency and Serialized Latency. This creates a Non-Linear Marking Signal that optimizes Throughput per Microsecond of Latency, rather than simply targeting a fixed queue depth.
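As a sketch of how such a signal could be computed per domain, the fragment below boosts the coupled DualPI2 probability when the feature vector reports a discrete MCS step-down or a rising arrival gradient. All names and the boost heuristic are illustrative assumptions, not the production control law:

struct domain_features {
    uint8_t mcs_now, mcs_prev;  /* current vs previous MCS index */
    double  p_l;                /* coupled DualPI2 marking probability */
    double  arrival_gradient;   /* d(arrival)/dt, normalized */
};

static inline double
marking_probability(const struct domain_features *f)
{
    double p = f->p_l;
    /* Each MCS step roughly halves capacity: react immediately and
     * proportionally to the size of the step-down. */
    if (f->mcs_now < f->mcs_prev)
        p *= (double)(1u << (f->mcs_prev - f->mcs_now));
    /* Demand rising into a capacity drop warrants earlier marking. */
    if (f->arrival_gradient > 0.0)
        p *= 1.0 + f->arrival_gradient;
    return p < 1.0 ? p : 1.0;
}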
Early architectural models of C-RAN often assumed a "Store-and-Forward" approach, where full packets must be buffered at the edge to meet timing. Fi-Wi eliminates this inefficiency by leveraging the natural physics of the 802.11 air interface. We utilize a Scatter-Gather DMA engine with Preamble Hiding to enable a "Thin RRH" design with minimal local SRAM.
The critical timing constraint in Wi-Fi is the transition from "Decision to Transmit" to "Energy on Air." However, the 802.11 PHY does not transmit user data immediately. Every transmission begins with a PHY Preamble (PLCP) and MAC Headers.
The Insight: The transmission of the Preamble and Headers takes roughly 20–40 µs (depending on PHY generation). The round-trip time to fetch payload data over 100m of PCIe-over-Fiber is roughly 2–5 µs.
Consequently, the fetch latency is completely "hidden" behind the transmission of the headers. The payload data arrives at the RRH's small FIFO well before the PHY is ready to modulate it.
Instead of a large packet buffer, the Fi-Wi RRH implements a Scatter-Gather DMA engine that composes frames on the fly from two distinct memory regions: local RRH RAM, which holds the PHY preamble and MAC header templates, and remote Concentrator DRAM, which holds the payload.
A common objection to C-RAN is the SIFS deadline (16 µs) required for retries. If a transmission fails, the station must retransmit immediately.
With Scatter-Gather, the RRH does not need to buffer the packet for retries. If a NACK occurs, the MAC simply resets the Scatter-Gather engine. It re-transmits the Preamble (from Local RAM) while re-issuing the DMA fetch (from Remote RAM). Because the fiber latency (5 µs) is significantly shorter than the SIFS + Preamble duration, the data again arrives in time.
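To make the split concrete, one possible descriptor layout is sketched below. The field names and packing are illustrative assumptions; the only anchored fact is the 16-byte descriptor size cited in the packet-walkthrough appendix, and this is not a hardware specification:

#include <stdint.h>

struct fiwi_sg_desc {
    uint64_t payload_iova;  /* IOMMU address of payload in Concentrator DRAM */
    uint16_t payload_len;   /* bytes to fetch over PCIe-over-fiber */
    uint16_t header_tmpl;   /* preamble/MAC header template index in local RAM */
    uint16_t sta_id;        /* destination station (selects MCS context) */
    uint16_t flags;         /* e.g. retry policy, ack policy */
};
_Static_assert(sizeof(struct fiwi_sg_desc) == 16, "descriptor must stay 16 bytes");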
Modern Wi-Fi standards — particularly 802.11ax (Wi-Fi 6/6E) and 802.11be (Wi-Fi 7) — introduce features that appear to address some of the same problems as Fi-Wi: uplink scheduling, spatial reuse, and multi-AP coordination. This appendix clarifies how these features relate to Fi-Wi's architecture, where they're complementary, and why they don't eliminate the need for Fi-Wi's centralized data-plane approach.
Key takeaway: 802.11ax/be features like trigger frames and multi-AP coordination are valuable enhancements that Fi-Wi can leverage when client support is available, but they operate at a different architectural level (per-AP MAC features vs. building-scale data-plane unification) and cannot replace Fi-Wi's core innovations: centralized queues, shared state, L4S marking coordination, and dynamic RF grouping across the entire building.
802.11ax introduced trigger frames (TF) to enable centralized uplink scheduling. Instead of clients contending for the channel using stochastic EDCA backoff, the AP sends a trigger frame that grants specific clients permission to transmit on specific OFDMA resource units (RUs) or spatial streams at a specific time.
Trigger frames provide contention-free, AP-scheduled uplink access: the AP chooses which clients transmit, on which RUs or spatial streams, and when.
How trigger frames align with Fi-Wi:
Trigger frames match Fi-Wi's philosophy of centralized scheduling rather than distributed contention. In a Fi-Wi deployment where RRHs support 802.11ax and clients support uplink OFDMA/MU-MIMO, the concentrator can issue trigger-based uplink grants through its RRHs, aligning uplink scheduling with its global view of queues across the building.
Reality check — client support in 2025:
While 802.11ax was ratified in 2019, uplink OFDMA support remains inconsistent. Crucially, trigger frames only control 802.11ax/be clients; legacy devices (iPhone 11, older IoT) are invisible to this schedule. These legacy clients cannot parse the trigger, so they continue to contend via random EDCA, acting as unmanaged interference sources. In contrast, Fi-Wi's reception diversity (Section 8.1) enhances uplink reliability for all clients, regardless of generation, by combining signals from multiple RRHs.
A natural question: "If 802.11ax APs can use trigger frames for uplink scheduling, why do we need Fi-Wi's centralized architecture?"
Answer: Trigger frames address only a small subset of the problems Fi-Wi solves, and even for uplink scheduling, they provide per-AP control, not building-scale coordination.
What trigger frames do NOT provide: building-scale coordination across APs, visibility into neighboring queues, unified downlink queuing and L4S marking, or any control over legacy clients that cannot parse the trigger.
802.11ax OFDMA subdivides a channel into resource units (RUs). In Fi-Wi, an airtime domain is a logical entity representing a shared RF resource. OFDMA RUs provide finer-grained subdivision of that airtime resource.
Conceptually, OFDMA RUs subdivide airtime within a single domain, while Fi-Wi's airtime domains partition RF resources across the building; the two mechanisms compose.
This does not change the fact that all RRHs in that airtime domain share a single group queue and marking point. It simply allows the service process to be more efficient.
802.11ax BSS coloring allows STAs to distinguish between intra-BSS frames (same color) and inter-BSS frames (different color), enabling more aggressive spatial reuse.
Relationship to Fi-Wi RF grouping: Fi-Wi's dynamic RF grouping (Section 6) serves a similar but more sophisticated purpose. Fi-Wi uses richer information (CSI, retry statistics, airtime) to decide grouping, not just RSSI thresholds. In a Fi-Wi deployment, the concentrator can assign BSS colors to RRHs strategically: RRHs in the same airtime domain get the same color, while isolated domains get different colors.
802.11be (Wi-Fi 7) introduces multi-AP coordination features that appear to move in Fi-Wi's direction, such as coordinated spatial reuse and AP-to-AP scheduling exchanges.
How these relate to Fi-Wi: These features acknowledge the problem of autonomous APs but approach it incrementally. 802.11be uses distributed AP-to-AP messaging, which limits scale and speed. Fi-Wi centralizes the data plane, enabling deeper coordination than distributed messaging can achieve.
A key advantage of Fi-Wi's architecture is that it degrades gracefully with mixed client populations and doesn't require forklift client upgrades.
Client capability tiers in a 2025 deployment range from legacy 802.11n/ac devices and older IoT, through Wi-Fi 6/6E clients with inconsistent uplink OFDMA support, to early Wi-Fi 7 devices.
Deployment strategy: use trigger-based scheduling where clients support it, and rely on Fi-Wi's reception diversity and centralized downlink scheduling for everything else; no forklift client upgrade is required.
802.11ax and 802.11be introduce valuable features — trigger frames, OFDMA, BSS coloring, multi-AP coordination — that align with Fi-Wi's centralized control philosophy and can enhance Fi-Wi deployments when clients support them. However, they remain per-AP MAC features, gated on client capability, and cannot unify queues, marking, or RF grouping across a building.
In short: 802.11ax/be features make Fi-Wi better, but Fi-Wi solves problems these standards cannot address within the constraints of the distributed-AP model. Fi-Wi is not "better APs" — it's a different architecture that happens to integrate well with modern Wi-Fi standards as they evolve.
Unlike software, ASICs cannot easily “refactor away” unused features. Removing blocks typically requires re-verifying entire subsystems, while adding blocks often requires verifying only the new logic. This asymmetry encourages accumulation.
Over many product generations, this leads to RTL codebases that only grow. Legacy modulation modes, preambles, power-save FSMs, calibration paths, and debug hooks persist long after their practical value has disappeared.
This accumulated complexity has tangible costs: larger die area, higher power, longer verification cycles, and slower time to market.
Fi-Wi’s architecture separates the system into thin, deterministic RRHs and a centralized software concentrator.
This separation dictates where complexity must live. RRHs implement only what must be fast and deterministic: RF front end, PHY processing, minimal MAC TX/RX, DMA, PTP synchronization, and PCIe-over-fiber transport. All high-level behavior (queueing, L4S policy, aggregation strategy) lives in the concentrator.
For a modern Wi-Fi chip at an advanced node, even a modest reduction in unnecessary logic can translate into significant savings: smaller die, lower power, simpler verification, and faster time to market.
The guiding principle for Fi-Wi RRH design is:
Complexity belongs in the concentrator; only latency-critical functions belong in RRH silicon.
Concretely, this means: no autonomous AP queueing/scheduling logic, no legacy PHY/MAC support beyond what Fi-Wi needs, and no embedded firmware CPU managing per-station behavior at the edge.
To truly understand Fi-Wi, we must follow a single packet through the system at the microsecond scale. This narrative illustrates how the Workstation Concentrator (Section 13) and the Scatter-Gather RRH (Appendix C) collaborate to trick the physics of latency.
T = 0 µs (Arrival): The video packet arrives at the Concentrator's NIC. The CPU timestamps it immediately.
T = 2 µs (The Decision): The Concentrator's software scheduler inspects the packet.
T = 10 µs (The Setup): The scheduler posts a DMA Descriptor to RRH-A via PCIe. Note: the payload data (1500 bytes) stays in the Concentrator; only a 16-byte pointer moves to the edge.
T = 50 µs (The Trigger): RRH-A's LBT logic sees the airtime is clear and begins the transmission sequence. This is where the magic happens: the PHY starts modulating the preamble from local RAM while the Scatter-Gather engine issues a PCIe Read Request for the payload.
T = 52 µs (The Fetch): The Read Request hits the Concentrator's PCIe controller. Because of the 92-lane non-blocking fabric (Section 13), there is zero switching delay.
T = 55 µs (The Return): The payload data flies back down the fiber.
T = 58 µs (The Handover): The payload data arrives at RRH-A's FIFO. The PHY is just finishing the last symbol of the Preamble.
T = 59 µs (Seamless Serialization): The PHY seamlessly switches from transmitting the Preamble to transmitting the payload. To the air, it looks like one continuous stream. The 200-meter fiber latency effectively vanished because it was hidden behind the mandatory PHY training sequence.
T = 200 µs: Alice sends a TCP ACK.
T = 204 µs (The Multi-Stat): Both RRH-A and RRH-B hear the ACK.
T = 210 µs (The Race Up): Both RRHs push the packet + CSI metadata to the Concentrator.
T = 215 µs (The Deduplication): The Concentrator sees two copies of Sequence #104. It discards the weak one from RRH-B but keeps the CSI data to update the "Sensing Model" (detecting that someone is standing near RRH-B, blocking the line of sight).
If this were a traditional AP: the payload would have to be buffered at the edge before transmission, the ACK would be heard by a single radio, and a blocked path would surface as retries and drops rather than seamless diversity.
RRH Failure: If RRH-A fails during the prefetch (e.g., power loss), the concentrator detects the link loss immediately via PCIe link state. Because the packet payload never left Concentrator DRAM, the scheduler simply re-posts the descriptor to RRH-B. No packet is lost, and TCP does not see a drop.
Congestion: The scatter-gather pipeline depth allows the Concentrator to queue up the next descriptor while the current one is transmitting. This allows back-to-back TXOPs (SIFS spacing) without idle gaps on the air, even with the fiber latency.
Coordinated Transmission: The Concentrator can schedule RRH-A and RRH-B to transmit concurrently to spatially separated clients. It analyzes the CSI matrix to determine if spatial isolation is sufficient (>25 dB cross-coupling attenuation). If yes, both RRHs transmit simultaneously using standard 802.11 frames. If interference is detected, the Concentrator schedules sequential TXOPs. This dynamic decision happens per-packet based on real-time CSI.
From the packet's view, Fi-Wi provides uplink diversity, per-flow fair queuing, accurate ECN marking, and speculative DMA that hides PCIe latency. The packet experiences the network as a transparent, zero-wait pipe.
Fi-Wi separates timing (RRH hardware) from intelligence (Concentrator software), bridged by the speculative DMA prefetch pipeline. This allows the hardware to meet strict microsecond deadlines while the software retains the flexibility to run complex scheduling, L4S, and spatial multiplexing logic.
The upfront cost of installing fiber is often the primary friction point for C-RAN adoption ("The Fiber Tax"). However, this framing ignores the physics of modern signaling and the macroeconomics of construction. Fi-Wi's reliance on fiber is not a tax; it is a strategic asset conversion.
We are hitting a hard physical limit with copper cabling. At modern data center speeds (100Gb/s), signal loss in copper is so high it is characterized in dB per inch.
In low-voltage construction, the cost of cabling is dominated by labor (often 70-80%), not material.
Unlike HDMI or Copper Ethernet—which are purpose-built cables engineered for a single generation—fiber is a raw transport medium. It is a "pipe for light" that supports Ethernet, DWDM, and PCIe-over-Fiber simultaneously.
While cable standards have cycled (Cat5e → Cat6 → Cat6A), they remain tethered to the legacy RJ45 connector. This physical interface is rapidly becoming obsolete. Fi-Wi recognizes that the connection is what matters, not the physical port. In this architecture, the 802.11 wireless interface becomes the new connector. By installing fiber once as a permanent asset and treating Wi-Fi as the universal 'plug' inside the room, the building infrastructure is 'one and done'. This finally breaks the cycle of physical obsolescence.
Fi-Wi's centralized architecture provides observability that is difficult or impractical to achieve in distributed AP systems. This appendix presents the Observability Matrix—a systematic comparison of what telemetry is directly observable, partially observable, or hidden across different measurement approaches. This complete visibility is the prerequisite for effective machine learning (Section 15) and deterministic L4S control.
Traditional Wi-Fi deployments rely on tools that provide only partial visibility into system state. Operators attempt to infer problems from symptoms (latency spikes, ECN marks, throughput degradation) without directly observing root causes (queue growth, retry timing, MCS selection under interference). This inference distance—the number of steps between observable effects and hidden causes—makes control systems less stable and limits the effectiveness of machine learning.
The table below compares observability across six measurement approaches. The legend indicates whether each metric is directly observable (✅), partially observable or inferable (⚠️), or hidden (❌) for a given approach.

| Telemetry / Metric | ESP32-C5 RF sensor | RPi 5 monitor mode | RPi 5 L4S node | tcpdump packet capture | iperf2 L4S | Fi-Wi Concentrator |
|---|---|---|---|---|---|---|
| Energy detect / CCA | ||||||
| Channel busy time | ||||||
| NAV / medium reservation | ||||||
| CSI / channel matrix | ||||||
| MCS / GI / NSS | ||||||
| PER / retry counts | ||||||
| RSSI / SNR | ||||||
| Queue depth | ||||||
| Sojourn time | ||||||
| ECN marks | ||||||
| One-way delay (OWD) | ||||||
| Responsiveness | ||||||
| Throughput / goodput | ||||||
| Deterministic playback |
Queue Depth and Sojourn Time (highlighted rows):
These metrics are essential for L4S congestion control and machine learning. Traditional tools (tcpdump, Wi-Fi packet capture) cannot directly observe queue state because it exists inside firmware or kernel layers. While synchronized ingress and egress packet captures could theoretically infer queue depth through timing correlation, this approach requires nanosecond-precise time synchronization across physically separated capture points, perfect packet correlation despite potential losses, and still cannot observe firmware-internal retry queues, aggregation buffer states, or PHY scheduling decisions. External sniffers see the explosion (the packet hitting the air), but they cannot see the fuse burning (the packet sitting in the driver queue). Only centralized queueing architectures expose these values with direct microsecond-resolution timestamps.
MCS / GI / NSS (PHY Configuration):
Monitor-mode packet capture can partially infer MCS from radiotap headers, but this only shows what was transmitted—not the decision process, CSI data, or PER history that informed the choice. The Fi-Wi Concentrator has direct access to the complete decision state.
Deterministic Playback (bottom row):
This capability enables machine learning. Deterministic playback means the Concentrator can reproduce its own decision sequence from a log file: packet arrivals, queue transitions, scheduling decisions, MCS selections, and RRH transmission commands. While actual RF outcomes depend on station behavior and channel conditions that may vary, the Concentrator can replay its control decisions under the logged RF environment to evaluate alternative strategies offline and verify whether different MCS/scheduling choices would have improved performance. This is only possible when all Concentrator-controlled components operate under a single clock with complete state visibility. Distributed systems cannot reconstruct this causal chain from partial packet traces because they lack visibility into queue state, retry logic, and the decision-making process itself.
Section 15 describes how Fi-Wi uses machine learning to optimize MCS transition rates. The observability matrix demonstrates the practical advantages that Fi-Wi's centralized architecture provides for ML training. The Concentrator's event log becomes a high-quality training dataset where every state transition is labeled with measured outcomes under consistent instrumentation. While autonomous AP systems could attempt ML-based rate adaptation using the partial observability available to them, Fi-Wi's richer telemetry—particularly queue visibility, global CSI, and deterministic replay—enables significantly more effective learning and optimization.
Coordinated AP systems can share summaries (throughput, ECN marks, interference reports) but cannot share hidden internal state (queue depth, firmware retry logic, aggregation decisions). This creates inference distance—the controller sees effects but not causes. Fi-Wi eliminates inference distance by removing autonomous decision-making from the edge. Queues, scheduling, and PHY selection are centralized under a single clock, producing an observable state graph where causes are explicit, replayable, and directly controllable. This architectural difference translates to measurably better ML training data quality.
The Fi-Wi architecture treats channel width as a dynamic control parameter managed by the Concentrator. While 802.11be (Wi-Fi 7) emphasizes 320 MHz peak PHY rates, Fi-Wi's orchestration engine strategically selects 40 MHz channel widths in high-density environments to ensure Service Time Stationarity and the stability of the L4S control loop.
In shared-spectrum MDUs (Multi-Dwelling Units), the theoretical gain of wider channels is often negated by contention-domain collapse. In a CSMA/CA environment, a transmission opportunity (TXOP) requires the entire bonded channel to be idle. In a 6-AP overlapping scenario with 50% aggregate airtime occupancy, the probability of finding all sub-bands simultaneously idle drops exponentially with bandwidth.
Under a simplified independent-sub-band occupancy assumption, a basic model suggests P(160 MHz idle) ≈ (P(40 MHz idle))⁴, resulting in 4–16× fewer transmission opportunities. In practice, partial correlation between sub-bands moderates the exponent but does not eliminate the super-linear decline in idle probability. This leads to fragmented TXOPs, heavy-tailed service times, and collapsing effective airtime utilization.
From an M/G/1 queueing perspective, the performance of the L4S control loop depends on the stability of the service rate (μ). L4S stability requires frequent service opportunities and low variance in service time to prevent the decoupling of the sender's congestion window from the actual queue state.
Narrower channels reduce the probability that partial-band interference (e.g., unmanaged IoT bursts) forces a full MCS downgrade across the entire bonded width. This allows the Concentrator to maintain stable link adaptation and a predictable drain rate, avoiding the chaotic rate-shifting common in 160 MHz deployments.
Fi-Wi is not anti-wideband; channel width is an orchestrated variable. The system expands width opportunistically when contention is low to leverage PHY gains and contracts it to 40 MHz when deterministic latency is required. This prioritizes spatial reuse and airtime isolation over maximum burst rate—the fundamental technical unlock for Fi-Wi’s cell-per-room model.
Fi-Wi optimizes Capacity Density under a Latency SLO, rather than peak PHY on a single link. In dense OBSS environments, wide channels reduce spatial reuse; narrower channels increase the number of bounded contention domains. Consequently, aggregate goodput per area increases even if per-link PHY decreases.
ρ_LL [Mbps / 1,000 sq ft] = (Σ Goodput_i) / Area | subject to p95 OWD ≤ 20ms
Where Goodput_i is the application-layer payload throughput delivered while maintaining the p95 one-way delay (OWD) constraint. The 20ms threshold reflects the target for interactive L4S applications.
Example Calculation (1,000 sq ft section of a 10,000 sq ft floor):
Assumptions: 50% aggregate offered load per BSS, default EDCA parameters, and no explicit inter-AP coordination in the autonomous case.
To align with a Gigabit-class WAN service, the wireless architecture must match the aggregate wireline supply to orchestrated spatial demand. In a dense MDU, Contention Delay is 10–100× larger than serialization time. A single 160 MHz AP attempting to serve a Gigabit load creates a "fast but flaky" link that collapses under co-channel interference, delivering only a fraction of the ISP's provided capacity to real-time applications.
Fi-Wi resolves this by using 40 MHz orchestration to spread the Gigabit load across N coordinated spatial domains. This ensures that the building-wide wireless fabric can actually saturate a 1 Gbps WAN link with deterministic, multi-user goodput, rather than relying on single-device peak bursts that starve other users and destabilize shared airtime.
L4S signals congestion at Layer 3 (IP ECN), but wideband Wi-Fi operates via massive Layer 2 A-MPDU aggregation to maintain PHY efficiency. This creates a fundamental control-loop mismatch: L4S expects per-packet, sub-millisecond marking feedback, while aggregation batches many packets into a single multi-millisecond TXOP, coarsening both the marking signal and the service process.
The Fi-Wi architecture addresses these challenges through its DualQ implementation (Section 5.2), which maintains separate queues for L4S and Classic traffic and performs per-packet sojourn time measurements at the Concentrator before entering the A-MPDU aggregation pipeline.
Scenario: 2x2 MIMO, 6+ overlapping BSSIDs, shared unlicensed spectrum (5/6 GHz), 50% aggregate offered load, autonomous EDCA parameters. See Appendix J for full simulation parameters.
| Metric | 160 MHz (Autonomous CSMA) | 40 MHz (Fi-Wi Orchestrated) |
|---|---|---|
| Peak PHY Rate (2x2, MCS 11) | ~1.2 Gbps | ~300-400 Mbps |
| Effective Airtime Utilization | <10% (Fragmented TXOPs) | 30–50% (Planned reuse / Bounded domain) |
| Service Time Variance (σ²) | High (Heavy-tailed) | Low (Near-stationary) |
| Queue Service Interval (median) | Tens to >100 ms | 5–15 ms (Stationary) |
| DualQ ECN Feedback Coherence | Sparse / Burst-marked | Continuous / Stable marking |
| Goodput Density (ρ_LL) (Mbps per 1,000 sq ft) | ~12 Mbps (Overlapping contention domains) | ~128 Mbps (8 RRHs, orthogonal 40 MHz channels) |
Economic Conclusion: Under realistic dense MDU conditions, Fi-Wi's orchestrated 40 MHz architecture delivers ~10× higher usable goodput density compared to autonomous wide-channel deployments. This is the fundamental advantage of Fi-Wi: capacity scales with RRH density and spatial reuse, not channel width alone.
See Appendix J for detailed contention modeling and simulation methodology.
This appendix details the Monte Carlo simulation and analytical models used to derive the Low-Latency Goodput Density (ρ_LL) metrics. The framework evaluates Fi-Wi's spatial capacity gains under realistic Multi-Dwelling Unit (MDU) contention scenarios.
The simulation contrasts traditional wide-area coverage with Fi-Wi's localized orchestration.
Path loss follows the log-distance model PL(d) = PL(d₀) + 10·n·log₁₀(d/d₀) + Xσ, with path-loss exponent n = 2.8.
The simulation models 20 active stations (STAs) distributed across the 8-unit floor (average 2.5 STAs per unit).
Service Time Variance (σ²) is calculated by observing the delay between TX_START and ACK_END across 10⁶ simulated TXOPs.
Autonomous 160 MHz case: P(TX) = [1 − p_occ]⁴, where p_occ is the aggregate occupancy from overlapping neighbors. Fi-Wi 40 MHz case: P(TX) = 1 − p_local_occ, restricted to immediate room-level neighbors.
The Goodput Density is derived by filtering raw throughput through the 20ms p95 OWD constraint.
// Derivation for ρ_LL Calculation
for each packet i:
delay_i = contention_delay + serialization_delay + retry_overhead
if delay_i <= 20ms:
accepted_payload += size_i
else:
dropped_from_goodput_metric++
ρ_LL = (accepted_payload) / (total_time * area)
The simulation produces the following goodput derivation for a 1,000 sq ft section:
Autonomous 160 MHz: 120 Mbps effective goodput × 10% within the OWD constraint = 12 Mbps ρ_LL.
Fi-Wi 40 MHz: 1,280 Mbps aggregate (8 RRHs) × 99.8% within the OWD constraint, normalized per 1,000 sq ft ≈ 128 Mbps ρ_LL.
| Traffic Type | % of Load | Constraint |
|---|---|---|
| Interactive (L4S/Gaming) | 20% | Strict SLO subject |
| Streaming (4K Video) | 50% | Freeze sensitive |
| Bulk (Background) | 30% | Throughput focused |