# AI Strategy: Wi-Fi Collapse Detection

**Target Audience:** Development Team

**Objective:** Build a real-time detection engine using Weak Supervision.

## 🏗️ Architecture Overview

The system uses a two-stage AI approach:

1. **Teacher (Offline):** Gemini (LLM) analyzes historical logs to create "Ground Truth" labels.
2. **Student (Real-Time):** A lightweight Random Forest model runs on the Linux gateway for sub-second inference.

[Image of AI Pipeline Diagram]

---

## Phase 1: Firmware Data Engineering

The ESP32 firmware is responsible for **Feature Extraction**. It must aggregate raw packet events into 1-second statistical snapshots.
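The aggregation step can be prototyped off-device before it is ported to firmware. The sketch below is a Python reference model, not the ESP32 code: `PacketEvent`, its fields, and `aggregate_window` are hypothetical names chosen for illustration. It covers only the statistics that follow directly from per-frame retry, NAV, and PHY-rate values, and leaves out the inferred counters because their thresholds are not specified here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PacketEvent:
    """One sniffed frame (hypothetical shape, for illustration only)."""
    retry: bool     # Retry bit set in the frame header
    nav_us: int     # NAV / Duration value in microseconds
    phy_mbps: int   # PHY rate in Mbps


def aggregate_window(timestamp: int, events: list[PacketEvent]) -> dict:
    """Collapse the events seen in one 1-second window into a snapshot."""
    if not events:
        # Nothing was heard in this window: emit an all-zero snapshot.
        return {"timestamp": timestamp, "retry_rate": 0.0,
                "avg_nav": 0, "max_nav": 0, "avg_phy": 0}
    navs = [e.nav_us for e in events]
    return {
        "timestamp": timestamp,
        "retry_rate": 100.0 * sum(e.retry for e in events) / len(events),
        "avg_nav": round(mean(navs)),
        "max_nav": max(navs),
        "avg_phy": round(mean(e.phy_mbps for e in events)),
    }


# Four made-up frames standing in for one second of traffic.
window = [PacketEvent(False, 44, 54), PacketEvent(True, 300, 24),
          PacketEvent(True, 120, 24), PacketEvent(False, 60, 54)]
snapshot = aggregate_window(1000, window)
print(snapshot)
```

A model like this can also serve as an oracle when checking firmware output against a captured trace.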

### The Feature Vector

The firmware writes the following struct to flash/UDP every 1000 ms:

| Feature | Type | Description |
| :--- | :--- | :--- |
| `timestamp` | `uint32` | Epoch or uptime. |
| `retry_rate` | `float` | Percentage of frames with the Retry bit set. |
| `avg_nav` | `uint16` | Average Network Allocation Vector (microseconds). |
| `max_nav` | `uint16` | Maximum Network Allocation Vector observed (microseconds). |
| `collisions` | `uint8` | Count of inferred collisions (High NAV + Retry). |
| `avg_phy` | `uint16` | Average PHY rate (Mbps). |
| `mismatches` | `uint8` | Count of duration anomalies (Spoofing/Bugs). |

**Storage:**

* Do **NOT** use NVS. Use a custom partition table with **LittleFS** or **SPIFFS**.
* Capacity: ~1.1 days (8MB chip) to ~3 days (16MB chip) at 1Hz sampling.
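The storage budget comes down to simple arithmetic: partition size divided by bytes written per day. The sketch below shows that calculation under explicit assumptions. It assumes a packed little-endian binary record in the table's field order, which may not match the actual firmware layout, and the 6 MiB partition size is a hypothetical placeholder. A text (CSV) encoding uses more bytes per row and shortens retention accordingly, so the result only illustrates the method.

```python
import struct

# Assumed packed little-endian layout, in the table's field order:
# uint32 timestamp, float retry_rate, uint16 avg_nav, uint16 max_nav,
# uint8 collisions, uint16 avg_phy, uint8 mismatches
RECORD = struct.Struct("<IfHHBHB")


def retention_days(partition_bytes: int, bytes_per_row: int, hz: float = 1.0) -> float:
    """Days of history that fit, ignoring filesystem overhead."""
    return partition_bytes / (bytes_per_row * hz * 86_400)


# Round-trip one made-up record to confirm the layout packs and unpacks.
packed = RECORD.pack(1000, 12.5, 131, 300, 3, 39, 0)
print(RECORD.size)            # bytes per binary record
print(RECORD.unpack(packed))

# Hypothetical 6 MiB data partition holding binary records at 1 Hz.
print(round(retention_days(6 * 1024 * 1024, RECORD.size), 2))
```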

---

## Phase 2: The "Weak Supervision" Pipeline

We lack labeled data, and manually reviewing 100,000 log rows to mark each collapse is not practical. Instead, we use Gemini to generate the labels.

### 1. Data Collection (Contextual)

Technicians run `async_mass_deploy` and collect logs in specific, controlled environments:

* **Clean:** Basement, Faraday cage.
* **Noisy:** Microwave running, Baby monitor active.
* **Hostile:** Hidden Node simulation (2 ESPs blasting UDP, hidden from each other).

### 2. The Labeling Loop (Python + Gemini API)

We will write a script (`label_data.py`) that:

1. Reads the raw CSVs.
2. Injects a **System Prompt** based on the filename (Context).
3. Asks Gemini to output a classification column: `0` (Normal), `1` (Interference), `2` (Collapse).

> **Prompt Logic:** "In a hidden node scenario, we expect high Retries and low Throughput, but standard NAV values might look normal because the nodes can't hear each other. Label rows matching this pattern as 'Collapse'."
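Steps 2 and 3 of the labeling loop can be outlined as follows. This is only a sketch under stated assumptions: the filename prefixes, the prompt wording for the clean and noisy contexts, and the `ask_llm` callable are all hypothetical. The actual Gemini API call is deliberately left out and passed in as a function, so the example runs offline with a stand-in.

```python
from pathlib import Path
from typing import Callable

# Hypothetical filename prefix -> context hint for the system prompt.
CONTEXT_HINTS = {
    "clean": "Recorded in a shielded environment; expect mostly Normal rows.",
    "noisy": "Recorded near a microwave or baby monitor; expect Interference.",
    "hostile": ("Hidden node scenario: expect high retries and low throughput "
                "while NAV values may look normal. Label such rows as Collapse."),
}
LABELS = {0: "Normal", 1: "Interference", 2: "Collapse"}


def build_system_prompt(csv_path: str) -> str:
    """Pick the context hint from the filename and describe the label scheme."""
    prefix = Path(csv_path).stem.split("_")[0].lower()
    hint = CONTEXT_HINTS.get(prefix, "No environment context is available.")
    scheme = ", ".join(f"{k} ({v})" for k, v in LABELS.items())
    return f"{hint}\nReply with one integer label per row: {scheme}."


def label_rows(csv_path: str, rows: list[str],
               ask_llm: Callable[[str, str], str]) -> list[int]:
    """Send the rows with their context prompt and validate the reply."""
    reply = ask_llm(build_system_prompt(csv_path), "\n".join(rows))
    labels = [int(tok) for tok in reply.split()]
    if len(labels) != len(rows) or any(lab not in LABELS for lab in labels):
        raise ValueError("LLM reply does not match the expected label format")
    return labels


# Stand-in for the real API call (assumption: it returns space-separated labels).
fake_llm = lambda system, user: "2 2 0"
print(label_rows("hostile_lab1.csv", ["r1", "r2", "r3"], fake_llm))
```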

---

## Phase 3: Runtime Inference (Linux)

We do not run the LLM live. We run a pre-trained scikit-learn model that is serialized to disk and loaded by the gateway service.

### Training

* **Input:** The Gemini-labeled CSVs.
* **Model:** Random Forest Classifier (Robust, interpretable feature importance).
* **Artifact:** `wifi_collapse_model.pkl`
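A minimal training sketch, assuming the labeled CSVs carry the feature-table columns plus a `label` column. The column name `label`, the choice to exclude `timestamp` from the inputs, the hyperparameters, and the synthetic stand-in data are all assumptions for illustration. In practice the frame would be loaded from the Gemini-labeled files.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed model inputs: every feature-table column except the timestamp.
FEATURES = ["retry_rate", "avg_nav", "max_nav", "collisions", "avg_phy", "mismatches"]

# Synthetic stand-in rows so the sketch runs; NOT real measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((300, len(FEATURES))), columns=FEATURES)
df["label"] = rng.integers(0, 3, size=len(df))  # 0 Normal, 1 Interference, 2 Collapse

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["label"], test_size=0.2, random_state=0, stratify=df["label"]
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Held-out accuracy and per-feature importance for a quick sanity check.
print("accuracy:", model.score(X_test, y_test))
print(dict(zip(FEATURES, model.feature_importances_.round(3))))

# Persist the trained model under the artifact name used by the runtime.
joblib.dump(model, "wifi_collapse_model.pkl")
```

Because the labels come from an LLM rather than human review, held-out accuracy measures agreement with the teacher, not with true network state.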

### The Real-Time Loop

The Linux monitoring service performs the following loop:
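
```python
import socket

import joblib
import pandas as pd

# Assumption: each UDP packet is one CSV row in the Feature Vector column order.
COLUMNS = ["timestamp", "retry_rate", "avg_nav", "max_nav",
           "collisions", "avg_phy", "mismatches"]
COLLAPSE = 2  # Label scheme from Phase 2: 0 Normal, 1 Interference, 2 Collapse


def parse_packet(data: bytes) -> pd.DataFrame:
    """Parse one CSV row into a single-row DataFrame of model inputs."""
    values = [float(v) for v in data.decode("ascii").strip().split(",")]
    row = dict(zip(COLUMNS, values))
    row.pop("timestamp")  # Assumption: the model was trained without it
    return pd.DataFrame([row])


def trigger_alert(addr, message: str) -> None:
    """Placeholder alert hook; replace with the real notification path."""
    print(f"[ALERT] {addr[0]}: {message}")


# Load model
model = joblib.load("wifi_collapse_model.pkl")

# Listen for ESP32 data
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5000))

while True:
    data, addr = sock.recvfrom(1024)
    features = parse_packet(data)  # CSV -> DataFrame

    # predict() returns an array with one label per input row
    prediction = model.predict(features)

    if prediction[0] == COLLAPSE:
        trigger_alert(addr, "Network Collapse Detected")
```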