The opinionated consolidated strategy. Every other document flows from the claims made here. Written for a team with finite hours, optimizing for a finalist slot at Ohio (November). The win isn't VQ2 lap times. The win is reliable sim-to-real transfer and a data pipeline the other 1,000 teams don't build.
The AI Grand Prix looks like a four-stage race: VQ1 → VQ2 → Physical → Final. It isn't. It's a filter with a prize at the end:
| Stage | Function | Decides | Effort budget |
|---|---|---|---|
| VQ1 | Filter · completion pass/fail | Who moves to VQ2 | 10% |
| VQ2 | Filter · fastest valid time | Who moves to Physical | 25% |
| Physical (Sep, CA) | Filter · controlled real-world | Who moves to Final | 30% |
| Final (Nov, Ohio) | Race · real drones + audience | Prize pool + job offers | 20% |
| Reserve | Unexpected · rule changes · bugs | Survives the year | 15% |
Each VQ2 crash costs roughly 15 minutes: the failed 8-minute run + restart + warm-up + anti-cheat rehandshake. In expected-time terms:
E[attempt_time] = p(clear) · lap_time + p(crash) · 15 min

A 45 s pipeline with a 5% crash rate → E = 0.95·45 + 0.05·900 = 87.75 s equivalent
A 50 s pipeline with a 0.5% crash rate → E = 0.995·50 + 0.005·900 = 54.25 s equivalent
The slower pipeline wins by ~33.5 s per attempt.
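The expected-time arithmetic can be sanity-checked with a short helper (function and variable names are ours, not from any team tooling):

```python
def expected_attempt_seconds(lap_s: float, crash_rate: float,
                             crash_cost_s: float = 900.0) -> float:
    """Expected wall-clock cost of one attempt: a clean lap with
    probability (1 - crash_rate), else the ~15-minute crash penalty
    (failed run + restart + warm-up + anti-cheat rehandshake)."""
    return (1.0 - crash_rate) * lap_s + crash_rate * crash_cost_s

fast_fragile = expected_attempt_seconds(45.0, 0.05)    # 87.75 s
slow_robust = expected_attempt_seconds(50.0, 0.005)    # 54.25 s
print(f"slower-but-safer saves {fast_fragile - slow_robust:.2f} s/attempt")
```

The crash penalty dominates: at a 900 s cost, every percentage point of crash rate adds 9 s of expected time per attempt, which is why a 5 s slower lap can still win.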
How to operationalize this:
r_crash = -500, not -50. Make the policy genuinely crash-averse.

Unlimited attempts × 8-minute runs × 10 fps camera = 4,800 frames per attempt. By the VQ2 deadline, a team that captures and auto-labels every attempt will have 500K+ in-distribution training frames. A team that doesn't will still be training on its initial 2,000 synthetic frames.
The plumbing that unlocks this is a per-attempt capture harness: log every frame and detection during the run, auto-label offline, and fold the hard cases into the next training set.
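One minimal shape for the capture-and-auto-label plumbing, assuming a JSONL-per-attempt layout (paths and the record schema are ours, not from the rules):

```python
import json
import time
from pathlib import Path

def log_frame(attempt_dir: Path, frame_idx: int, detection: dict) -> None:
    """Append one frame's detector output to the attempt log. Offline,
    these records become auto-labels for the next detector fine-tune."""
    attempt_dir.mkdir(parents=True, exist_ok=True)
    record = {"t": time.time(), "frame": frame_idx, "detection": detection}
    with (attempt_dir / "frames.jsonl").open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example: one gate detection from a hypothetical attempt.
log_frame(Path("runs/attempt_0001"), 0,
          {"gate_bbox": [412, 198, 530, 310], "conf": 0.97})
```

JSONL keeps writes append-only and crash-safe: a run that dies mid-attempt still leaves every frame logged up to the failure, which is exactly the data worth keeping.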
Between VQ2 close (~late July) and Physical Qualifier (early September) is a ~6-week window. Most teams will relax here — bask in qualifier rankings, maybe tune one or two params. This is when the Physical is actually won.
| Week | Task | Deliverable |
|---|---|---|
| W1–2 | Capture ~30 min of real-flight telemetry on the DIY 5-inch rig | Real-drone dataset · indoor netted space |
| W2–3 | Fit residual-dynamics model (Swift method): delta between sim predictions and real outcomes → small MLP correction | sim2real_residual.pt |
| W3–4 | Validate PPO policy with residual model inserted. Re-tune PID for real IMU latency. | VQ2 policy on DIY drone passes 90%+ gate clears |
| W4–5 | Adversarial lighting tests (outdoor, dusk, shadow). Augment detector training set with real hard cases. | Detector mAP>95% on real-world eval slice |
| W5–6 | Crew procedure drills: 10-min-window drone restart, battery swap, crash recovery, re-upload weights. | <90s from crash to re-arm |
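The W2–3 residual-dynamics step can be sketched in pure Python (layer sizes and names are illustrative; the real model would be trained on the W1–2 telemetry and exported as sim2real_residual.pt):

```python
import math
import random

class ResidualMLP:
    """One-hidden-layer MLP predicting the delta between the simulator's
    predicted next state and the measured real-drone next state."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def __call__(self, x: list) -> list:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]

def corrected_step(sim_step, residual, state: list, action: list) -> list:
    """Sim prediction plus learned residual = corrected next state."""
    pred = sim_step(state, action)
    delta = residual(state + action)  # residual conditioned on (state, action)
    return [p + d for p, d in zip(pred, delta)]
```

The point of the residual formulation is that the correction only has to learn what the simulator gets wrong, so ~30 min of flight data can be enough; learning full dynamics from scratch could not be.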
Our stack is detect → keypoints → PnP → controller. Every boundary is swappable. At the sim-to-real step we keep the detector + keypoints (they transfer well because game-engine texture is already in-distribution for real-world-trained backbones), we update PnP intrinsics to the real camera, and we insert the residual-dynamics model in front of the controller. That's three files touched, not a full retrain. End-to-end pixel-to-motor policies don't have these seams — they retrain or they lose.
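The seams can be made explicit in code. A hedged sketch (stage and attribute names are ours) of why sim-to-real touches three pieces and nothing else:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Pipeline:
    """detect → keypoints → PnP → controller, each boundary a plain
    callable so a stage can be swapped without touching the others."""
    detect: Callable      # transfers as-is
    keypoints: Callable   # transfers as-is
    pnp: Callable         # swap: real camera intrinsics
    dynamics: Callable    # swap: identity in sim, residual model on hardware
    control: Callable     # swap: PID re-tuned for real IMU latency

    def step(self, frame):
        box = self.detect(frame)
        kps = self.keypoints(frame, box)
        pose = self.pnp(kps)
        return self.control(self.dynamics(pose))
```

An end-to-end pixel-to-motor policy is a single opaque callable here; there is no `pnp` or `dynamics` attribute to reassign, which is the whole argument for the modular stack.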
Most teams will treat submissions as hero runs — maybe 5 total, chasing a personal best. That wastes the single biggest experimental asset we have. Our plan splits attempts into three run types:

- Experiment runs: identical conditions × 10 repeats per parameter change. Statistical confidence before declaring a win. Log every frame.
- Scoring runs: best-known configuration, clean environment, submitted for scoring. Run sparingly; scoreboard entries invite other teams to study us.
- Chaos runs: deliberate edge cases — lights off, max-speed cruise, induced detection drops. Catch failure modes before VQ2 does.
Physical Qualifier uses the Neros Archer platform. The compute spec will be published closer to the event, but based on the class (100 TOPS at ~15 W), assume a Jetson Orin NX-class module. Design for it now:
| Stage | Budget (ms) | Notes |
|---|---|---|
| Detector (YOLO11n INT8 TRT) | 5 | Already ~5 ms in PyTorch; INT8 TensorRT halves it. |
| Keypoints (YOLO11n-pose) | 3 | Same backbone, pose head |
| PnP (SOLVEPNP_IPPE_SQUARE) | 0.5 | CPU, 4 points, closed-form |
| Target-gate tracker | 0.1 | Python dict lookup |
| Controller (PID or PPO ONNX) | 0.2 | Tiny MLP, CPU |
| Transport (sim/MAVLink) | 1 | UDP local or serial |
| Overhead (logging, telemetry) | 0.2 | Async background |
| Total | 10 ms | 100 FPS control loop |
Targeting 10 ms gives us 2× headroom inside the 20 ms per-frame budget (matching the 50 FPS camera). Anything faster is wasted on this hardware class.
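The budget is worth encoding as a trivial CI guard so a stage regression shows up before a submission does (numbers mirror the table; names are ours):

```python
# Per-stage latency budget in milliseconds, mirroring the table above.
LATENCY_BUDGET_MS = {
    "detector": 5.0,
    "keypoints": 3.0,
    "pnp": 0.5,
    "tracker": 0.1,
    "controller": 0.2,
    "transport": 1.0,
    "overhead": 0.2,
}
FRAME_BUDGET_MS = 20.0  # 50 FPS camera

total = sum(LATENCY_BUDGET_MS.values())
# Enforce the 2x headroom claim: total budget must fit in half a frame.
assert total <= FRAME_BUDGET_MS / 2, f"lost 2x headroom: {total:.2f} ms"
```

In practice the same dict can be fed measured timings per stage, so the assert fires on the offending stage's name rather than on a slow lap.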
SkyDreamer is beautiful research. It does not transfer sim-to-real with 30 min of flight data. We'd need thousands of real flight hours, which we won't have. Stay modular.
Gates change between VQ1 and VQ2. Start / finish gates may differ. No persistent global map. Anything that over-fits the VQ1 layout breaks in VQ2 or at Physical.
LingBot-Map / VGGT / DUSt3R can't track features at racing velocities (38°/frame rotation). Our trials measured 17 m ATE. Archived, not revisited.
"No human interaction during runs." Reviewer-visible code that calls human-recorded trajectories is likely flagged. Pure RL or deterministic only.
Saving attempts for "the perfect run" throws away the experimental signal. Use unlimited attempts the way they're meant: statistical testing, continuous improvement.
PPO trained against NED gate positions won't run on the real sim (no absolute positioning). Train with --observation-mode=detector_telemetry. See apex-pipeline.
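A hedged sketch of what a detector-telemetry observation might contain (the field choices here are ours for illustration; the authoritative definition lives in apex-pipeline):

```python
def build_observation(gate_pose_cam: list, velocity_body: list,
                      last_action: list) -> list:
    """Observation built only from onboard-derivable signals: relative
    gate pose from PnP (camera frame), body-frame velocity, and the
    previous action. No NED / absolute positions anywhere, so the same
    policy runs wherever absolute positioning is unavailable."""
    return list(gate_pose_cam) + list(velocity_body) + list(last_action)

obs = build_observation(
    [1.2, -0.1, 3.5, 0.0, 0.05, 0.0],  # xyz + rpy to next gate (camera frame)
    [4.0, 0.1, -0.2],                  # vx, vy, vz (body frame)
    [0.5, 0.0, 0.0, 0.6],              # previous action
)
```

The key property is that every entry is computable from the drone's own sensors plus the detect → keypoints → PnP stack, which is what makes the policy portable between sim and hardware.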
| Window | Primary focus | Deliverable |
|---|---|---|
| NOW → VQ1 open | VQ1 completion pilot ready · telemetry adapter stubbed · sim package installed · data capture harness | Pass VQ1 on first sim-day run |
| VQ1 → VQ2 open | Fine-tune detector on VQ1-sim frames · observation-swap PPO training starts · SubprocVecEnv fan-out | PPO baseline beats PID on VQ1 course |
| VQ2 open → cutoff | PPO policy tuning · adversarial detector · chaos testing · final submission | Top-30 seed into Physical |
| Cutoff → Physical | Sim-to-real. Real flight data capture · residual dynamics · DIY rig validation · crew drills | DIY drone clears gates autonomously |
| Physical Qualifier | On-site tuning · hardware-specific PID · crash recovery · 10-min window discipline | Top-10 into Final |
| Physical → Final | Adversarial lighting · audience-noise perception stress · backup policy versions · restart procedures | Win the Final |
Every reported number must be traceable: a train_apex.py phase, a captured config, logged results. No "I just tweaked it in a notebook." Ship a reproduce.sh and structure the repo so a reviewer can trace any output back to deterministic inputs.

| Signal | Green | Amber | Red |
|---|---|---|---|
| Detector mAP@50 on VQ1-sim frames | >98% | 95–98% | <95% |
| VQ1 completion rate (attempts) | 100% | 95–99% | <95% |
| VQ2 crash rate | <1% | 1–5% | >5% |
| End-to-end latency on Orin NX | <15 ms | 15–30 ms | >30 ms |
| Captured-frame dataset size by VQ2 | >200K | 50K–200K | <50K |
| Real-flight data captured by Aug | >20 min | 5–20 min | <5 min |
| DIY rig autonomous gate-clear rate | >90% | 70–90% | <70% |