Technical Brief for Development Team
This document outlines the machine learning pipeline for detecting Wi-Fi "Collapse" events (Hidden Node, Saturation, Interference) using ESP32 sensors. We utilize an LLM-Assisted Weak Supervision approach to overcome the lack of labeled training data.
We cannot hard-code detection thresholds because RF environments vary wildly. Instead, we use Gemini (an LLM) as a "Senior Network Engineer" to label raw logs, then train a fast, lightweight model (Random Forest/XGBoost) to mimic that decision logic in real time.
The ESP32 firmware is the data generator. It must output time-series features suitable for ML, not just human-readable logs.
Once per second, the firmware appends a CSV line to the internal storage partition. This is our feature set (a parsing sketch follows the list below):
- Timestamp: Epoch time.
- RetryRate: (Float, 0-100) Percentage of frames requiring retransmission.
- AvgNAV: (UInt16) Average Network Allocation Vector duration (microseconds).
- MaxNAV: (UInt16) Peak contention window seen in that second.
- Collisions: (UInt8) Count of inferred collision events.
- AvgPHY: (UInt16) Average data rate (Mbps). Low PHY + high NAV = bad.
- Mismatches: (UInt8) Count of packets where duration > expected airtime.

We use a custom partition table to allocate ~5-13 MB for LittleFS/SPIFFS; NVS is strictly for config. This allows 24-72 hours of continuous logging.
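The sketch below shows how the gateway side might load one of these logs with pandas. It assumes the firmware writes a header row matching the field names above; the file name and the dropna guard are illustrative.

```python
# Minimal sketch: load a 1 Hz ESP32 feature log into a DataFrame on the gateway.
# Assumes a header row with the exact field names listed above; the file name
# is illustrative.
import pandas as pd

FEATURE_COLUMNS = ["RetryRate", "AvgNAV", "MaxNAV", "Collisions", "AvgPHY", "Mismatches"]

def load_sensor_log(path: str) -> pd.DataFrame:
    """Read one sensor CSV and return feature rows indexed by timestamp."""
    df = pd.read_csv(path)
    # Epoch seconds -> datetime index keeps the 1-second cadence explicit.
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
    df = df.set_index("Timestamp").sort_index()
    # Drop rows truncated by a reset or power loss mid-write on the ESP32.
    return df.dropna(subset=FEATURE_COLUMNS)

if __name__ == "__main__":
    features = load_sensor_log("clean_baseline.csv")
    print(features[FEATURE_COLUMNS].describe())
```

The same loader can be reused for both the labeling and training steps below.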
We solve the "Cold Start Problem" (having data but no labels) by using Generative AI.
Technicians capture logs in known scenarios. The filename conveys the context:
- microwave_interference.csv
- hidden_node_scenario_A.csv
- clean_baseline.csv

We feed the raw CSV chunks to Gemini via the API with a prompt that injects domain knowledge; a sketch of that labeling call is shown below.
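A minimal sketch of the labeling call, assuming the google-generativeai Python SDK. The model name, prompt wording, chunk size, and label set are illustrative assumptions, not the production prompt; the capture context is derived from the filename as described above.

```python
# Sketch of LLM-assisted weak supervision: stream raw CSV chunks to Gemini and
# ask for one label per row. Model name, prompt wording, chunk size and the
# label set are illustrative assumptions.
from pathlib import Path

import google.generativeai as genai
import pandas as pd

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
gemini = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name

LABELS = ["CLEAN", "HIDDEN_NODE", "SATURATION", "INTERFERENCE"]

PROMPT_TEMPLATE = """You are a senior Wi-Fi network engineer.
Each CSV row is one second of airtime statistics from an ESP32 sniffer:
Timestamp, RetryRate, AvgNAV, MaxNAV, Collisions, AvgPHY, Mismatches.
Capture context (from the filename): {context}.
Label every row with exactly one of {labels}.
Return only CSV with the columns Timestamp,Label.

{chunk}
"""

def label_file(path: str, rows_per_chunk: int = 60) -> list[str]:
    """Label one raw capture; the filename conveys the scenario context."""
    context = Path(path).stem.replace("_", " ")  # e.g. "hidden node scenario A"
    df = pd.read_csv(path)
    responses = []
    for start in range(0, len(df), rows_per_chunk):
        chunk = df.iloc[start:start + rows_per_chunk].to_csv(index=False)
        prompt = PROMPT_TEMPLATE.format(context=context, labels=LABELS, chunk=chunk)
        responses.append(gemini.generate_content(prompt).text)
    # The returned CSV text should be parsed and validated before it becomes
    # training data.
    return responses
```

Merging the returned Timestamp,Label pairs back onto the raw feature rows produces the labeled dataset described next.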
Result: A "Silver Standard" labeled dataset ready for supervised learning.
Gemini is too slow for real-time analysis of the live feature stream. We train a classical model to handle the actual inference.
Algorithm: Random Forest Classifier or XGBoost.
Deployment Target: Linux Gateway / Edge Server.
Output Artifact: model.pkl.
Inference Call: model.predict().
Alert Logic: Collapse_Prob > 0.8 for 3 consecutive seconds -> TRIGGER ALERT.
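A minimal sketch of the offline training step and the gateway-side alert loop, assuming scikit-learn and joblib. The silver_labelled.csv filename, the Label column, the CLEAN class name, and the mapping from class probabilities to Collapse_Prob are illustrative assumptions; the 0.8 threshold over 3 consecutive seconds is the alert rule above.

```python
# Sketch: train on the Gemini-labelled dataset, export model.pkl, and run the
# 3-consecutive-second alert rule on the gateway. File, column, and class names
# are illustrative assumptions.
from collections import deque

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURE_COLUMNS = ["RetryRate", "AvgNAV", "MaxNAV", "Collisions", "AvgPHY", "Mismatches"]

def train(silver_csv: str = "silver_labelled.csv", out_path: str = "model.pkl") -> None:
    """Offline: fit a Random Forest on the Silver Standard dataset and export it."""
    df = pd.read_csv(silver_csv)
    X, y = df[FEATURE_COLUMNS], df["Label"]  # "Label" column assumed from the labelling step
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_tr, y_tr)
    print("hold-out accuracy:", clf.score(X_te, y_te))
    joblib.dump(clf, out_path)

class CollapseAlerter:
    """Online: fire when Collapse_Prob > threshold for `streak` consecutive 1-second rows."""

    def __init__(self, model_path: str = "model.pkl", threshold: float = 0.8, streak: int = 3):
        self.clf = joblib.load(model_path)
        self.threshold = threshold
        self.window = deque(maxlen=streak)
        # Assumption: every class except the clean baseline counts as "collapse".
        self.collapse_idx = [i for i, c in enumerate(self.clf.classes_) if c != "CLEAN"]

    def update(self, row: dict) -> bool:
        """Feed one 1-second feature row; returns True when the alert should fire."""
        proba = self.clf.predict_proba(pd.DataFrame([row])[FEATURE_COLUMNS])[0]
        collapse_prob = proba[self.collapse_idx].sum()
        self.window.append(collapse_prob > self.threshold)
        return len(self.window) == self.window.maxlen and all(self.window)
```

On the gateway, update() would be called once per incoming CSV row; a True return corresponds to TRIGGER ALERT.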