Apples-to-apples comparison of APEX Phase 1 detectors (YOLO11n, RF-DETR, U-Net, Color) on the same SimDrone course. Outputs gate-pass count, lap time, per-frame latency, detection rate, and false-positive rate. All results are recorded to benchmark_results.json for trend tracking.
Script: benchmark_models.py · Output: benchmark_results.json

```bash
python benchmark_models.py                          # full benchmark (2 laps)
python benchmark_models.py --quick                  # quick benchmark (1 lap)
python benchmark_models.py --modes color yolo unet  # specific detectors only
python benchmark_models.py --export html            # also write model-eval-dashboard.html
```
| Metric | Meaning | Good value |
|---|---|---|
| Gates passed | Gates cleared in order during the run | All |
| Total time | Wall-clock seconds to finish | <60 s competitive |
| Avg gate time | Time between consecutive gate passages | <3 s |
| Detection rate | % of frames with ≥1 gate detected | >80% |
| Vision latency | Per-frame inference ms (median) | <10 ms VQ1 · <5 ms VQ2 |
| Vision FPS | 1 / vision_latency | >100 Hz |
| False-positive rate | % of non-gate frames with a detection | <1% |
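For reference, these metrics reduce to a few lines of arithmetic over per-frame records. A minimal sketch, assuming each record carries detected, gate_visible, and latency_ms fields (hypothetical names, not the actual benchmark_models.py internals):

```python
from statistics import median

def summarize(frames):
    """Reduce per-frame records to the table's metrics.
    Assumed (hypothetical) schema: detected (bool), gate_visible (bool), latency_ms (float)."""
    det_pct = 100 * sum(f["detected"] for f in frames) / len(frames)   # detection rate
    lat_ms = median(f["latency_ms"] for f in frames)                   # median inference latency
    fps = 1000.0 / lat_ms                                              # Vision FPS = 1 / latency
    non_gate = [f for f in frames if not f["gate_visible"]]
    fp_pct = 100 * sum(f["detected"] for f in non_gate) / max(len(non_gate), 1)
    return {"det_pct": det_pct, "latency_ms": lat_ms, "fps": fps, "fp_pct": fp_pct}
```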
```
Mode           Gates   Time    Avg Gate   Vision   Det %   FPS    FP %
──────────────────────────────────────────────────────────────────────
color           22     45.2s    2.05s     0.3ms     92%    3333    2.1
unet            22     42.8s    1.95s     4.8ms     96%     208    0.4
yolo11n         22     44.1s    2.01s     5.2ms     99%     192    0.2
rfdetr-nano     22     41.6s    1.89s     2.3ms    100%     435    0.1

RECOMMENDED: rfdetr-nano
Set vision.mode: "rfdetr_nano" in race_config.py
```
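That switch might look like the following in race_config.py; the surrounding structure is an assumption, since only the vision.mode key is documented above:

```python
# race_config.py -- sketch only; the actual config structure may differ
from dataclasses import dataclass

@dataclass
class VisionConfig:
    # Detector backend: "color" | "unet" | "yolo11n" | "rfdetr_nano"
    mode: str = "rfdetr_nano"

vision = VisionConfig()
```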
Every run appends its results to benchmark_results.json so we can track detector regressions over time.
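A quick way to eyeball that trend, assuming benchmark_results.json holds a list of run records with timestamp, mode, and metric fields (the schema is an assumption, not documented here):

```python
import json

# Assumed schema: [{"timestamp": ..., "mode": ..., "det_pct": ..., "latency_ms": ...}, ...]
with open("benchmark_results.json") as f:
    runs = json.load(f)

for run in sorted(runs, key=lambda r: r["timestamp"]):
    print(f'{run["timestamp"]}  {run["mode"]:<12}  '
          f'det={run["det_pct"]:.0f}%  lat={run["latency_ms"]:.1f}ms')
```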
| Situation | What to run |
|---|---|
| After training a new detector | benchmark_models.py --modes <new_detector> <current_champion> (head-to-head) |
| Before every submission | Full benchmark, capture benchmark_results.json |
| Dataset growth milestone (every +50K frames) | Full benchmark, check for regression (see the gate sketch after this table) |
| After a code change outside vision | Quick benchmark, confirm no stealth breakage |
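As a sketch of that regression gate, one way to compare the last two recorded runs; the schema and tolerance values are assumptions, not project policy:

```python
import json
import sys

def check_regression(path="benchmark_results.json", tol_det=2.0, tol_lat=1.0):
    """Exit nonzero if detection rate dropped more than tol_det points or
    median latency grew more than tol_lat ms between the last two runs
    (assumed schema: list of run records, oldest first)."""
    runs = json.load(open(path))
    if len(runs) < 2:
        print("Need at least two recorded runs to compare.")
        return
    prev, curr = runs[-2], runs[-1]
    if curr["det_pct"] < prev["det_pct"] - tol_det:
        sys.exit(f'REGRESSION: det rate {prev["det_pct"]}% -> {curr["det_pct"]}%')
    if curr["latency_ms"] > prev["latency_ms"] + tol_lat:
        sys.exit(f'REGRESSION: latency {prev["latency_ms"]}ms -> {curr["latency_ms"]}ms')
    print("OK: no detector regression")

if __name__ == "__main__":
    check_regression()
```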
The benchmark runs on SimDrone today. Once the AIGP sim drops (May 2026), point the same harness at VQ1-sim proxy frames. Every captured run becomes a benchmark point:
```bash
python benchmark_models.py \
  --frames recordings/vq1_captured/frames \
  --telemetry recordings/vq1_captured/telemetry.jsonl
```
This closes the loop: detector improves → benchmark confirms → capture more frames → detector improves. See playbook §03.
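For offline runs like the one above, pairing frames with telemetry is essentially a timestamp join. A minimal sketch, assuming frames are named by capture time and each telemetry.jsonl line carries a t timestamp field (both assumptions; the real layout may differ):

```python
import json
from pathlib import Path

def load_run(frames_dir, telemetry_path):
    """Yield (frame_path, telemetry_record) pairs matched by nearest timestamp.
    Assumes frame files are named <timestamp>.png and telemetry lines have a "t" field."""
    telemetry = [json.loads(line) for line in open(telemetry_path)]
    telemetry.sort(key=lambda r: r["t"])
    for frame in sorted(Path(frames_dir).glob("*.png")):
        ts = float(frame.stem)                                   # capture time from filename
        record = min(telemetry, key=lambda r: abs(r["t"] - ts))  # nearest-neighbor join
        yield frame, record

# Usage:
# for frame, rec in load_run("recordings/vq1_captured/frames",
#                            "recordings/vq1_captured/telemetry.jsonl"):
#     ...
```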