Playbook · Master Strategy

How we win the AI Grand Prix.

The opinionated, consolidated strategy. Every other document flows from the claims made here. Written for a team with finite hours, optimizing for a finalist slot at Ohio (November). The win isn't VQ2 lap times. The win is reliable sim-to-real transfer and a data pipeline the other 1,000 teams won't build.

Goal: Finalist slot in Ohio (Nov) · $500K + Anduril offer
Theme: Reliability × sim-to-real > raw speed · not a lap-time race
Moat: Data pipeline + observation shape · nobody else will build this
Horizon: Apr 2026 → Nov 2026 Final · 7 months

§ 01 What actually wins

The AI Grand Prix looks like a four-stage race: VQ1 → VQ2 → Physical → Final. It isn't. It's a filter with a prize at the end:

| Stage | Function | Decides | Effort budget |
|---|---|---|---|
| VQ1 | Filter · completion pass/fail | Who moves to VQ2 | 10% |
| VQ2 | Filter · fastest valid time | Who moves to Physical | 25% |
| Physical (Sep, CA) | Filter · controlled real-world | Who moves to Final | 30% |
| Final (Nov, Ohio) | Race · real drones + audience | Prize pool + job offers | 20% |
| Reserve | Unexpected · rule changes · bugs | Survives the year | 15% |
Unintuitive claim: winning VQ2 is worth less than consistently passing VQ2 with a pipeline that transfers to real hardware. Teams that over-optimize VQ2 laptimes will walk into Physical with a stack they can't re-tune in a week. We don't.

§ 02 Reliability math > speed math

Each VQ2 crash costs roughly 15 minutes: the failed 8-minute run + restart + warm-up + anti-cheat rehandshake. In expected-time terms:

E[attempt_time] = p(clear) · best_time + p(crash) · 15 min

A 45 s pipeline with a 5% crash rate → E = 0.95·45 + 0.05·900 = 87.75 s equivalent
A 50 s pipeline with a 0.5% crash rate → E = 0.995·50 + 0.005·900 = 54.25 s equivalent

The slower pipeline wins by ~33.5 s per attempt.
Target: <1% crash rate at ~80% of peak feasible speed. Chase reliability until the curve plateaus, then chase speed.

To operationalize this: track crash rate per configuration and compare configurations by expected attempt time, never by best lap.
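The expected-time comparison is a one-liner worth keeping in the analysis scripts. A minimal sketch; the function and parameter names are ours, not from the pipeline:

```python
def expected_attempt_seconds(best_time_s: float, crash_rate: float,
                             crash_cost_s: float = 900.0) -> float:
    """E[attempt] = p(clear) * best_time + p(crash) * crash cost (15 min = 900 s)."""
    return (1.0 - crash_rate) * best_time_s + crash_rate * crash_cost_s

# The two pipelines from the example above:
fast_but_fragile = expected_attempt_seconds(45.0, 0.05)    # ~87.75 s
slow_but_solid = expected_attempt_seconds(50.0, 0.005)     # ~54.25 s
```

Run this for every candidate configuration before deciding which one is actually faster.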

§ 03 The data pipeline is the moat

Unlimited attempts × 8-min runs × 10 fps camera = 4,800 frames per attempt. By the VQ2 deadline, a team that captures and auto-labels every attempt will have 500K+ in-distribution training frames. A team that doesn't will still be training on the initial 2,000 synthetic frames.

The plumbing that unlocks this:

  1. Record everything. Frame + telemetry + detection + our controller output + outcome, every run, every attempt. Zstd-compressed, ~500 MB per attempt.
  2. Auto-label with the current detector. Run Phase-1 YOLO on every captured frame overnight. Keep detections with conf > 0.7 as labels.
  3. Human-correct the 5% edge cases. Hard negatives (false positives), low-confidence hits, frames where the pipeline crashed. 10 min a day, someone on the team.
  4. Retrain weekly. Fine-tune detector on the growing dataset. Track mAP@50 on a held-out validation slice every week.
  5. Feed the improved detector back into Phase 3 PPO training. Observation quality improves with the detector; policy gets stronger in lockstep.
This only works if Step 1 is done on day one. Capturing frames from a VQ1 attempt costs nothing. Going back and re-recording later costs you the attempt budget. Instrument the pilot before the first submission, not after.
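Steps 2 and 3 reduce to a triage over raw detector output. A sketch under assumptions: the record layout, field names, and the 0.7 threshold mirror the list above but the code itself is illustrative, not the actual pipeline:

```python
def triage_detections(detections, conf_keep=0.7):
    """Split raw detector hits into auto-labels (Step 2) and a human-review
    queue (Step 3). High-confidence hits become training labels as-is;
    everything else waits for the 10-minute daily correction pass."""
    auto_labels, review_queue = [], []
    for det in detections:
        if det["conf"] > conf_keep:
            auto_labels.append(det)
        else:
            review_queue.append(det)
    return auto_labels, review_queue
```

The overnight job runs this over every captured frame, so the human only ever sees the ambiguous slice.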

§ 04 Sim-to-real transfer is the hidden gate

Between VQ2 close (~late July) and Physical Qualifier (early September) is a ~6-week window. Most teams will relax here — bask in qualifier rankings, maybe tune one or two params. This is when the Physical is actually won.

What happens in the September gap

| Week | Task | Deliverable |
|---|---|---|
| W1–2 | Capture ~30 min of real-flight telemetry on the DIY 5-inch rig | Real-drone dataset · indoor netted space |
| W2–3 | Fit residual-dynamics model (Swift method): delta between sim predictions and real outcomes → small MLP correction | sim2real_residual.pt |
| W3–4 | Validate PPO policy with residual model inserted; re-tune PID for real IMU latency | VQ2 policy on DIY drone passes 90%+ gate clears |
| W4–5 | Adversarial lighting tests (outdoor, dusk, shadow); augment detector training set with real hard cases | Detector mAP >95% on real-world eval slice |
| W5–6 | Crew procedure drills: 10-min-window drone restart, battery swap, crash recovery, re-upload weights | <90 s from crash to re-arm |
Budget: 30 minutes of real flight data — not 30 hours. Swift beat human champions with 50 seconds of real data; MonoRace used 5 minutes. The quantity matters less than the residual-dynamics model that consumes it.

Why modular architecture was the right bet for this

Our stack is detect → keypoints → PnP → controller. Every boundary is swappable. At the sim-to-real step we keep the detector + keypoints (they transfer well because game-engine texture is already in-distribution for real-world-trained backbones), we update PnP intrinsics to the real camera, and we insert the residual-dynamics model in front of the controller. That's three files touched, not a full retrain. End-to-end pixel-to-motor policies don't have these seams — they retrain or they lose.
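The residual-dynamics seam can be sketched as a thin wrapper: the nominal simulator predicts the next state, the learned correction (fit on real-minus-sim deltas) is added on top. The callables and class name here are placeholders, not our actual interfaces:

```python
class ResidualDynamics:
    """Wrap the nominal simulator with a learned correction fit on
    (real outcome - sim prediction) deltas from real-flight telemetry."""

    def __init__(self, sim_step, residual):
        self.sim_step = sim_step    # (state, action) -> predicted next state
        self.residual = residual    # (state, action) -> correction vector

    def step(self, state, action):
        pred = self.sim_step(state, action)
        corr = self.residual(state, action)
        return [p + c for p, c in zip(pred, corr)]
```

The controller never sees the seam: it consumes `step()` exactly as it consumed the raw simulator, which is why transfer touches the dynamics file and not the whole stack.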

§ 05 Using the unlimited-attempt budget

Most teams will treat submissions as hero runs — maybe 5 total, chasing a personal best. That wastes the single biggest experimental asset we have. Our plan:

| Attempt type | Share | Purpose |
|---|---|---|
| Experimental | ~80% | Identical conditions × 10+ repeats per parameter change. Statistical confidence before declaring a win. Log every frame. |
| Ranking | ~15% | Best-known configuration, clean environment, submitted for scoring. Run sparingly; scoreboard entries invite other teams to study us. |
| Chaos | ~5% | Deliberate edge cases: lights off, max-speed cruise, induced detection drop. Catch failure modes before VQ2 does. |
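"Statistical confidence before declaring a win" can be mechanized for the experimental bucket. This is a crude noise-margin check, not a proper significance test, and the names are ours:

```python
import statistics

def is_improvement(baseline_s, candidate_s, min_margin_s=1.0):
    """Declare a parameter change a win only if the mean attempt time drops
    by more than both a fixed margin and the combined run-to-run noise."""
    gap = statistics.mean(baseline_s) - statistics.mean(candidate_s)
    noise = statistics.stdev(baseline_s) + statistics.stdev(candidate_s)
    return gap > max(min_margin_s, noise)
```

With 10+ repeats per configuration, this rejects the one-lucky-run "improvement" that a hero-run strategy would have shipped.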
Scoreboard discipline. Don't submit your best run until it has been validated at least 20 times offline. The leaderboard reveals your strategy to competitors, so the first scored attempt should be conservative; save the killer configuration for the last submission window.

§ 06 Compute envelope (on-drone)

The Physical Qualifier uses the Neros Archer platform. The compute spec will be published closer to the event, but based on the class (100 TOPS @ ~15 W), assume Jetson Orin NX-class. Design for it now:

| Stage | Budget (ms) | Notes |
|---|---|---|
| Detector (YOLO11n INT8 TRT) | 5 | Already ~5 ms in PyTorch; INT8 TRT halves it |
| Keypoints (YOLO11n-pose) | 3 | Same backbone, pose head |
| PnP (SOLVEPNP_IPPE_SQUARE) | 0.5 | CPU, 4 points, closed-form |
| Target-gate tracker | 0.1 | Python dict lookup |
| Controller (PID or PPO ONNX) | 0.2 | Tiny MLP, CPU |
| Transport (sim/MAVLink) | 1 | UDP local or serial |
| Overhead (logging, telemetry) | 0.2 | Async background |
| **Total** | **10** | 100 FPS control loop |

Target 10 ms gives us 2× headroom before the 20 ms per-frame budget (matches 50 FPS camera). Anything more is wasted on this hardware class.
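The budget arithmetic is worth keeping next to the loop code as a sanity check. Stage names mirror the table; the values are targets, not measurements:

```python
STAGE_BUDGET_MS = {
    "detector": 5.0,      # YOLO11n INT8 TRT
    "keypoints": 3.0,     # pose head on the same backbone
    "pnp": 0.5,           # closed-form, 4 points, CPU
    "gate_tracker": 0.1,
    "controller": 0.2,    # tiny MLP or PID
    "transport": 1.0,
    "overhead": 0.2,
}

FRAME_BUDGET_MS = 20.0                      # 50 FPS camera
total_ms = sum(STAGE_BUDGET_MS.values())    # 10.0 ms -> 100 FPS control loop
headroom = FRAME_BUDGET_MS / total_ms       # 2x before we miss a frame
```

Any per-stage regression shows up as `total_ms` creeping toward the frame budget before it costs a dropped frame on hardware.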

§ 07 Explicit anti-patterns (don't build these)

End-to-end pixel→motor

WON'T TRANSFER

SkyDreamer is beautiful research. It does not transfer sim-to-real with 30 min of flight data. We'd need thousands of real flight hours, which we won't have. Stay modular.

Course-specific memorization

GATES CHANGE

Gates change between VQ1 and VQ2. Start / finish gates may differ. No persistent global map. Anything that over-fits the VQ1 layout breaks in VQ2 or at Physical.

Global vision SLAM

RETIRED 2026-04-17

LingBot-Map / VGGT / DUSt3R can't track features at racing velocities (38°/frame rotation). We proved this at 17m ATE. Archived, not revisited.

Imitation learning from humans

RULE VIOLATION

"No human interaction during runs." Reviewer-visible code that replays human-recorded trajectories will likely be flagged. Pure RL or deterministic control only.

Hero-run attempt strategy

WASTED BUDGET

Saving attempts for "the perfect run" throws away the experimental signal. Use unlimited attempts the way they're meant: statistical testing, continuous improvement.

Privileged-obs PPO → submission

LANDMINE

PPO trained against NED gate positions won't run on the real sim (no absolute positioning). Train with --observation-mode=detector_telemetry. See apex-pipeline.

§ 08 Calendar (from now → Ohio)

- NOW (Apr 21): Detector · pilot ready
- VQ1 OPEN (May): submit completion
- VQ2 OPEN (Jun): train on real sim frames
- VQ2 CUTOFF (~late Jul): final submissions
- PHYSICAL (Sep, CA): 2 wk on-site
- FINAL (Nov, Ohio): $500K
| Window | Primary focus | Deliverable |
|---|---|---|
| NOW → VQ1 open | VQ1 completion pilot ready · telemetry adapter stubbed · sim package installed · data-capture harness | Pass VQ1 on first sim-day run |
| VQ1 → VQ2 open | Fine-tune detector on VQ1-sim frames · observation-swap PPO training starts · SubprocVecEnv fan-out | PPO baseline beats PID on VQ1 course |
| VQ2 open → cutoff | PPO policy tuning · adversarial detector · chaos testing · final submission | Top-30 seed into Physical |
| Cutoff → Physical | Sim-to-real: real flight data capture · residual dynamics · DIY rig validation · crew drills | DIY drone clears gates autonomously |
| Physical Qualifier | On-site tuning · hardware-specific PID · crash recovery · 10-min window discipline | Top-10 into Final |
| Physical → Final | Adversarial lighting · audience-noise perception stress · backup policy versions · restart procedures | Win the Final |

§ 09 Team discipline — what we commit to

§ 10 Scorecard (how we know we're winning)

| Signal | Green | Amber | Red |
|---|---|---|---|
| Detector mAP@50 on VQ1-sim frames | >98% | 95–98% | <95% |
| VQ1 completion rate (attempts) | 100% | 95–99% | <95% |
| VQ2 crash rate | <1% | 1–5% | >5% |
| End-to-end latency on Orin NX | <15 ms | 15–30 ms | >30 ms |
| Captured-frame dataset size by VQ2 | >200K | 50K–200K | <50K |
| Real-flight data captured by Aug | >20 min | 5–20 min | <5 min |
| DIY rig autonomous gate-clear rate | >90% | 70–90% | <70% |
Green across the board by mid-August = we are positioned to win the Physical Qualifier and show up to Ohio as one of the final 10. Amber on more than two = reprioritize. Red on any = stop current work and triage.
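The weekly scorecard check can be mechanized. This sketch handles both directions in the table (higher-is-better like mAP, lower-is-better like crash rate); the function name and calling convention are ours:

```python
def signal_status(value, green_at, red_at):
    """Traffic-light one scorecard signal. Pass green_at > red_at for
    higher-is-better signals (mAP, completion rate); green_at < red_at
    for lower-is-better ones (crash rate, latency)."""
    if green_at > red_at:                       # higher is better
        if value > green_at:
            return "green"
        return "red" if value < red_at else "amber"
    if value < green_at:                        # lower is better
        return "green"
    return "red" if value > red_at else "amber"
```

One call per row of the table each week makes the "amber on more than two" trigger impossible to rationalize away.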
WINNING-PLAYBOOK · v1.0 · 2026-04-21