- Add flash_all_parallel.py using multiprocessing
- Support two strategies: build-and-flash and build-then-flash
- Configurable parallelism for builds and flashing
- Reduce 32-device deployment time from 60-90 min to 15-20 min
- Add comprehensive PARALLEL_FLASH.md documentation
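
For reviewers, a minimal sketch of how the two strategies could differ. This is an illustration under assumptions, not the code in flash_all_parallel.py: `build` and `flash` are hypothetical stand-ins for the real build and flash commands, the reading of the strategy names (per-device pipeline versus two separate phases) is inferred from the names alone, and `multiprocessing.pool.ThreadPool` is used so the snippet runs without a `__main__` guard, whereas the actual script may use process pools.

```python
from multiprocessing.pool import ThreadPool


# Hypothetical stand-ins: a real script would invoke the build and
# flash tools here (for example via subprocess).
def build(device):
    return f"firmware-{device}"


def flash(device, firmware):
    return (device, firmware)


def build_and_flash(devices, jobs):
    """Assumed meaning: each worker builds, then immediately flashes, one device."""
    def work(device):
        return flash(device, build(device))

    with ThreadPool(jobs) as pool:
        return pool.map(work, devices)


def build_then_flash(devices, build_jobs, flash_jobs):
    """Assumed meaning: finish all builds first, then flash, with separate limits."""
    with ThreadPool(build_jobs) as pool:
        firmware = pool.map(build, devices)
    with ThreadPool(flash_jobs) as pool:
        return pool.starmap(flash, zip(devices, firmware))
```

Splitting the phases lets the build limit follow CPU count while the flash limit follows how many devices the host can program at once, which is one plausible reason to make the two values independently configurable.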