Strategy · Updated 2026-04-19

Winning strategy for the AI Grand Prix.

Anduril × DCL · $500,000 prize pool · two virtual qualifiers, one physical qualifier, one final. The honest plan: a deterministic VQ1 completion stack first to lock in a submission, then APEX perception-aware PPO for VQ2 speed. Rebuilt after the 2026-04-19 spec confirmed no GPS, no absolute positioning, FPV + telemetry in, Throttle / Roll / Pitch / Yaw out.

Series · AI Grand Prix 2026 · Anduril × DCL

| Stage | Format | When |
|---|---|---|
| VQ1 | Completion · <10 gates | May 2026 |
| VQ2 | Fastest time · <20 gates | Jun–Jul 2026 |
| Physical | Controlled env, no audience | Sep 2026 · California |
| Final | With audience + distractions | Nov 2026 · Ohio |
Read the Winning Playbook first. It consolidates the master strategy: effort budget across VQ1/VQ2/Physical/Final, reliability-math vs speed-math, the data pipeline moat, sim-to-real bridge, attempt strategy, compute envelope, and explicit anti-patterns. This doc is the tactical expansion of the playbook.

§ 01 The insight everyone will miss

VQ1 is a completion test, not a race. Teams who over-engineer end-to-end RL policies for VQ1 are burning weeks on a problem that doesn't exist yet. Ship a deterministic YOLO + PnP + PID stack for VQ1. Lock the submission. Train perception-aware PPO on the real VQ1 sim for VQ2 — where speed actually counts.
The virtual qualifiers are filters, not the prize. $500K + Anduril job offers are decided at the November Final in Ohio. VQ1 and VQ2 cut 1,000+ teams to ~30 for Physical in California. Winning VQ2 by a second means nothing if your pipeline can't retrain for real hardware in the 6-week September gap. Optimize for finalist slot, not leaderboard photos.

What most teams build for VQ1

ANTI-PATTERN

End-to-end PPO trained against privileged NED gate positions. Fails at submission because the real sim gives no absolute positioning.

no NED anywhere

What we build for VQ1

SHIP FIRST

YOLO11n detector → 4-corner keypoints → PnP → PID with telemetry. Zero learning on the critical path. Ships before VQ1 opens.

vq1_completion_pilot.py
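The geometry behind the PnP stage can be sketched in a few lines. This is a hedged approximation, not the shipped code: a real implementation would run cv2.solvePnP on the same four corners, and the gate width, focal length, and principal point below are illustrative placeholders (the AIGP camera spec is still TBD).

```python
import numpy as np

def gate_pose_from_corners(corners_px, gate_width_m=1.5, focal_px=320.0,
                           cx=320.0, cy=240.0):
    """Rough range/bearing/elevation from 4 detected gate corners.

    Pinhole similar-triangles approximation of what cv2.solvePnP recovers
    from the same points. corners_px is (top-left, top-right, bottom-right,
    bottom-left) in pixels; all intrinsics here are illustrative.
    """
    c = np.asarray(corners_px, dtype=float)        # (4, 2) pixel coords
    width_px = np.linalg.norm(c[1] - c[0])         # top edge in pixels
    range_m = focal_px * gate_width_m / width_px   # similar triangles
    u, v = c.mean(axis=0)                          # gate centre in image
    bearing = np.arctan2(u - cx, focal_px)         # + = gate to the right
    elevation = np.arctan2(cy - v, focal_px)       # + = gate above centre
    return range_m, bearing, elevation
```

A centred gate whose 1.5 m top edge spans 160 px under a 320 px focal length sits about 3 m out, dead ahead — exactly the error signal the PID stage consumes.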

§ 02 VQ1 vs VQ2 — different problems, different stacks

VQ1 Completion Stack — minimum viable, ships first.
VQ2 APEX PPO — earns its place here.
Physical (Sep) & Final (Nov) — same architecture + sim-to-real bridge. CA controlled env first, then Ohio with audience and environmental distractions. Swift-style residual dynamics from a small amount of real-flight data (MonoRace used <5 min). Camera details for the physical stage released later.

§ 03 Pipeline architecture

Same input surface and same output surface for both tracks. The VQ1 and VQ2 stacks diverge only at the controller stage.

FPV + Telemetry (sim input) → YOLO11n detector → 4-corner keypoints → PnP gate pose → PID / PPO (VQ1 / VQ2) → Throttle / Roll / Pitch / Yaw (sim output)

§ 04 Training pipeline

python train_apex.py — one script, three phases.

| Phase | Model | Device | Dataset | Time | Used by |
|---|---|---|---|---|---|
| 1: Detector | YOLO11n · 2.6M params | GPU | dataset_gates_mega | ~2 hr | VQ1 + VQ2 |
| 2: Keypoints | YOLO11n-pose · 4 corners | GPU | dataset_gates_mega_pose | ~1.5 hr | VQ1 + VQ2 |
| 3: APEX Policy (updated) | Perception-aware PPO · MLP [256,256,256] | CPU + parallel envs | SimDrone → VQ1 sim fine-tune | ~4 hr | VQ2 only |
Phase 3 observation-swap landmine. The current Phase-3 PPO observation reads privileged gate NED from course_map_test.json. That will not transfer to the real AIGP sim — no absolute positioning. Before any VQ2 submission, the observation must be swapped to detector output (bbox + keypoints + conf) + telemetry (attitude + body rates + accel) + last action. The --privileged-obs flag keeps the old path for dev runs.
```shell
python train_apex.py                                      # all three phases
python train_apex.py detector --epochs 5                  # smoke
python train_apex.py policy --observation-mode=detector_telemetry
```
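The swapped observation can be sketched as a flat vector of detector output, telemetry, and last action. This is a minimal sketch assuming illustrative field names — the real AIGP telemetry payload format is still TBD, and the keypoint terms would be appended the same way as the bbox terms.

```python
import numpy as np

def build_observation(det, telemetry, last_action):
    """Non-privileged PPO observation: detector output + telemetry + last
    action. All dict keys are placeholders, not the real AIGP payload."""
    if det is None:                                       # gate not detected
        det_vec = np.zeros(5)
    else:
        det_vec = np.array([*det["bbox"], det["conf"]])   # cx, cy, w, h, conf
    tel_vec = np.array([*telemetry["attitude"],           # roll, pitch, yaw
                        *telemetry["body_rates"],         # p, q, r
                        *telemetry["accel"]])             # ax, ay, az
    return np.concatenate([det_vec, tel_vec, np.asarray(last_action, float)])
```

The zero-fill on missed detections keeps the observation shape fixed, which the MLP policy requires; a short-horizon gate tracker can do better than zeros during occlusion.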

§ 05 Detector choice

| Metric | YOLO11n | RF-DETR-Nano | RF-DETR-Small |
|---|---|---|---|
| mAP50:95 | 39.5 | 48.4 | 53.0 |
| Latency (T4 TRT) | 5ms+ | 2.32ms | 3.52ms |
| Aerial dataset gap | baseline | +5 mAP | +8 mAP |
| Backbone | CNN | DINOv2 | DINOv2 |
| Jetson Orin speed | baseline | +20–30% | comparable |
| License | AGPL-3.0 | Apache 2.0 | Apache 2.0 |

We ship YOLO11n first (already trained) for VQ1. RF-DETR-Nano retrain is a P2 upgrade if VQ1-sim frames reveal YOLO recall is the bottleneck. Both feed the same keypoints + PnP + controller stack downstream.

§ 06 Novel training data — what no one else will have

Sim-day dataset capture

P0 · DAY 1

Once VQ1 sim credentials land: slow manual laps capturing (frame, telemetry, target-gate) triples. ~10 min on the actual VQ1 sim gives ~6000 in-distribution training frames. Fine-tune YOLO11n + YOLO11n-pose on these.

closes domain gap
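The capture loop can be sketched as a small recorder. Hypothetical interface only: the sim API is still TBD, so `add` just takes whatever the telemetry adapter yields per frame. At 10 fps, a 10-minute session yields the ~6000 frames cited above.

```python
import numpy as np

class SimDayRecorder:
    """Buffer (frame, telemetry, target_gate) triples during slow manual
    laps, then dump one compressed archive for detector fine-tuning.
    Sketch only; the real sim payload format is unknown."""
    def __init__(self):
        self.frames, self.telemetry, self.targets = [], [], []

    def add(self, frame, telemetry, target_gate):
        self.frames.append(np.asarray(frame))
        self.telemetry.append(np.asarray(telemetry))
        self.targets.append(int(target_gate))

    def save(self, path):
        np.savez_compressed(path, frames=np.stack(self.frames),
                            telemetry=np.stack(self.telemetry),
                            targets=np.asarray(self.targets))
```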

Telemetry-conditioned PnP

P1

Project PnP results into a stable body frame using telemetry attitude; average over the last K frames. Reduces PnP jitter during hard banking. Zero labeling cost.

no labels
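The body-frame projection above can be sketched as follows, assuming PnP returns the gate position in body axes and telemetry attitude is (roll, pitch, yaw) in radians — both assumptions, since the payload format is TBD.

```python
import numpy as np
from collections import deque

def rpy_to_matrix(roll, pitch, yaw):
    """Z-Y-X (yaw, pitch, roll) rotation: body frame -> level frame."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

class StabilisedGatePose:
    """Average the last K PnP fixes in a roll/pitch-compensated frame,
    so hard banking doesn't smear the gate estimate."""
    def __init__(self, k=5):
        self.buf = deque(maxlen=k)

    def update(self, pnp_xyz_body, attitude):
        roll, pitch, _ = attitude
        R = rpy_to_matrix(roll, pitch, 0.0)   # de-rotate roll/pitch only
        self.buf.append(R @ np.asarray(pnp_xyz_body, float))
        return np.mean(self.buf, axis=0)      # jitter-reduced estimate
```

Because the de-rotation removes the attitude-induced motion before averaging, the K-frame mean suppresses PnP jitter without lagging behind genuine gate-relative motion as much as a raw average would.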

Adversarial robustness

P1 · VQ2

VQ2 adds lighting changes, 3D objects, obstacles. Hard-negative mining: brightness shifts, lens flare, motion blur, partial occlusion, gate-like false positives. dataset_gates_hardneg is the starter.

hard negatives
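Two of the cheapest augmentations — brightness shift and horizontal motion blur — can be sketched in plain NumPy. This is a starter in the spirit of dataset_gates_hardneg, not the full hard-negative pipeline (lens flare, occlusion, and gate-like false positives need real compositing).

```python
import numpy as np

def augment(frame, rng, blur_len=7, gain_range=(0.4, 1.8)):
    """Random brightness gain + horizontal motion blur on a HxWx3 uint8
    frame. Parameters are illustrative starting points."""
    img = frame.astype(np.float32)
    img *= rng.uniform(*gain_range)                       # lighting shift
    k = np.ones(blur_len) / blur_len                      # box blur kernel
    for c in range(img.shape[2]):                         # horizontal smear
        img[..., c] = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), 1, img[..., c])
    return np.clip(img, 0, 255).astype(np.uint8)
```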

Parallel-instance RL fan-out

P1

AIGP sim explicitly supports multiple parallel instances on one box. Wire SubprocVecEnv for 8+ envs on the RTX 5080 host. Training wall-clock drops 6–8×.

8× envs

Reviewable submission package

P1

Rules require code access for review. Pin dep versions, one-command reproduction, traceable outputs. Reviewer can re-run any artifact from deterministic inputs.

reproducible

Telemetry-anchored gate-order tracker

P2

No absolute positioning means no course map. Telemetry-integrated heading + last-seen gate bearing gives a short-horizon "gate after next" predictor for occluded frames. Cheap, no one else will build it.

short-horizon

§ 07 Retired (NED-dependent)

The 2026-04-19 update confirmed there is no absolute positioning. These were built on the NED assumption and are retired — kept in-repo for reference, not on the critical path.

course_mapper.py

RETIRED

Kalman filter fusing ODOMETRY + visual detections into a global NED gate map. No GPS / no absolute positioning — the input signal doesn't exist. Replaced by frame-local PnP + target-gate tracker.

mpc_tracker.py

RETIRED

Track a precomputed NED trajectory at 120Hz using ODOMETRY state. Can't precompute an NED trajectory without NED. Replaced by PID (VQ1) + PPO (VQ2), both frame-local.

§ 08 VQ1 sim day-one checklist

| Priority | Task | Why | Time |
|---|---|---|---|
| P0 | Wire telemetry adapter | Map sim payload → our observation format | ~1 hr |
| P0 | Run VQ1 completion stack | YOLO + keypoints + PnP + PID. Target 100% gate-clear. | ~2 hr tune |
| P0 | Capture sim-frame dataset | 10 min telemetry-labeled frames for detector fine-tune | ~15 min + 2 hr retrain |
| P0 | Submit VQ1 completion entry | Lock in a passing submission before iterating | ~30 min |
| P1 | SubprocVecEnv wrapper | Stand up 8 instances for PPO training | ~3 hr |
| P1 | Swap PPO observation | Required for VQ2 submission to transfer | ~4 hr |
| P2 | RF-DETR retrain on sim frames | Only if YOLO recall is the VQ2 bottleneck | ~2 hr |

§ 08B Reliability math > speed math

Each VQ2 crash costs ~15 minutes of wall clock: the failed 8-minute run + restart + warm-up + anti-cheat rehandshake. In expected-time terms:

E[lap_time] = p(clear) · best_time + p(crash) · 15 min

45s pipeline, 5% crash:    0.95·45 + 0.05·900 = 87.75s equivalent
50s pipeline, 0.5% crash:  0.995·50 + 0.005·900 = 54.25s equivalent

The slower, more reliable pipeline wins by ~33.5 seconds of expected time per attempt.
Target: <1% crash rate at ~80% of peak feasible speed. Chase reliability until the curve plateaus, then chase speed.
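The expected-time arithmetic above as a checkable helper:

```python
def expected_attempt_time(best_time_s, crash_rate, crash_cost_s=900.0):
    """E[lap] = p(clear)·best_time + p(crash)·crash_cost (section 08B).
    crash_cost_s defaults to the ~15 min of wall clock a crash burns."""
    return (1.0 - crash_rate) * best_time_s + crash_rate * crash_cost_s
```

Sweeping crash_rate with this function shows why reliability dominates: shaving 5 s off best_time moves E[lap] by 5 s at most, while each percentage point of crash rate moves it by ~8.5 s.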

§ 08C Data pipeline is the moat

Unlimited attempts × 8-min runs × 10 fps = 4,800 frames per attempt. By VQ2 deadline, a team that auto-labels every attempt will have 500K+ in-distribution training frames. A team that doesn't is still training on the initial 2,000 synthetic frames.

  1. Record everything: frame + telemetry + detection + controller output + outcome. Zstd-compressed, ~500 MB per attempt.
  2. Auto-label with the current detector overnight. Confidence > 0.7 = label.
  3. Human-correct 5% edge cases. 10 min a day, someone on the team.
  4. Retrain weekly. mAP@50 tracked on held-out validation slice.
  5. Improved detector → improved Phase 3 PPO observations → stronger policy.
Only works if instrumentation is in place before the first VQ1 attempt. Captured attempts cost nothing; re-recording costs the attempt budget. Instrument on Day 1.
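Step 2's confidence gate can be sketched as a one-function triage. The 0.7 threshold comes from the list above; the detection tuple layout is an assumption for illustration.

```python
def auto_label(detections, conf_threshold=0.7):
    """Keep high-confidence detections as pseudo-labels; route the rest to
    the human-correction queue (the ~5% edge cases). `detections` is a
    list of (frame_id, bbox, conf) tuples from the current detector."""
    labels, review_queue = [], []
    for frame_id, bbox, conf in detections:
        (labels if conf > conf_threshold else review_queue).append((frame_id, bbox))
    return labels, review_queue
```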

§ 08D Sim-to-real is the hidden gate

Between VQ2 close (~late July) and Physical (early September) is a ~6-week window. Most teams will relax. That's when Physical is actually won.

| Week | Task | Deliverable |
|---|---|---|
| W1–2 | Capture ~30 min of real-flight telemetry on the DIY 5-inch rig | Real-drone dataset |
| W2–3 | Fit residual-dynamics model (Swift method) | sim2real_residual.pt |
| W3–4 | Validate PPO policy with residual. Re-tune PID for real IMU latency | DIY drone clears 90%+ gates |
| W4–5 | Adversarial lighting (outdoor, dusk, shadow) | Detector mAP >95% on real-world slice |
| W5–6 | Crew procedure drills: 10-min-window restart, battery swap, crash recovery | <90s crash-to-rearm |
Swift used 50 seconds of real-flight data. MonoRace used 5 minutes. We budget ~30 min — the residual-dynamics model that consumes it matters more than the quantity.
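The shape of the residual fit can be sketched with a linear stand-in: collect (state+action, sim-predicted next state, real next state) triples, fit g(s, a) ≈ real − sim, and add g back onto sim rollouts. Swift fits a small learned model rather than this least-squares version; the pipeline shape is the same.

```python
import numpy as np

def fit_residual(states_actions, sim_next, real_next):
    """Affine least-squares residual g(s,a) ≈ real_next - sim_next,
    standing in for the small learned residual-dynamics model."""
    X = np.hstack([np.asarray(states_actions, float),
                   np.ones((len(states_actions), 1))])    # affine features
    Y = np.asarray(real_next, float) - np.asarray(sim_next, float)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def corrected_step(sim_pred, state_action, W):
    """Sim prediction + fitted residual = real-world prediction."""
    x1 = np.append(np.asarray(state_action, float), 1.0)
    return np.asarray(sim_pred, float) + x1 @ W
```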

§ 08E Effort budget across the year

The filter-then-prize structure means effort allocation looks nothing like the stage cadence suggests:

| Stage | Function | Effort % | Rationale |
|---|---|---|---|
| VQ1 | Completion filter | 10% | Deterministic pilot is enough; don't over-invest |
| VQ2 | Speed filter | 25% | Need good PPO, but ship "good enough", not "world-record" |
| Physical (Sep) | Real-drone filter | 30% | Sim-to-real + hardware + crew · where finalists are made |
| Final (Nov) | Prize race | 20% | Adversarial robustness, audience noise, restart procedures |
| Reserve | Unexpected · rule changes · bugs | 15% | Year-long projects always need it |

§ 09 The key bet

Most teams will treat VQ1 as a dry run for VQ2 and show up with a half-trained PPO policy that took weeks and doesn't reliably complete anything.

We show up to VQ1 with a deterministic, zero-learning completion stack that clears the course on day one. That locks in a passing submission.

Then we spend the month between VQ1 and VQ2 training perception-aware PPO on actual VQ1-sim frames — not synthetic proxies — with the right observation surface (detector output + telemetry, no privileged data). The team that ships a boring solution for VQ1 and uses the window to train a good one for VQ2 wins.

§ 10 Competition intel

Confirmed 2026-04-19

KNOWN
  • Windows-only sim (Linux not working)
  • Active internet during runs (anti-cheat)
  • Multiple parallel sim instances supported
  • Inputs: FPV visual stream + telemetry
  • No GPS, no absolute positioning, no depth, no sensor shortcuts
  • Outputs: Throttle, Roll, Pitch, Yaw
  • VQ1: <10 gates, clear visibility, completion-scored
  • VQ2: <20 gates, lighting changes + 3D objects + obstacles, fastest valid time
  • Same course per round; gates change between VQ1 and VQ2
  • Full 3D environment with elevation changes
  • 8-min max per run; unlimited attempts
  • No human interaction during runs; code must be reviewable

Still TBD

TBD
  • Exact Python API / telemetry payload format
  • Camera resolution, framerate, FOV for virtual stage
  • Camera specs for physical stage
  • Exact scoring formula beyond "completion / fastest time"
  • Distractor-object class inventory
  • Compute limit on the drone for physical qualifier

§ 11 Key references

| System | Achievement | Technique | Our use |
|---|---|---|---|
| MonoRace | A2RL 2025 champion, beat 3 human world champs | U-Net GateSeg + PnP + EKF + PPO G&CNet 500Hz | Detector chain + PnP + observation layout |
| Swift | Nature 2023, first AI to beat humans at FPV | Perception-aware PPO reward | Phase 3 reward + sim-to-real residual |
| RF-DETR | First real-time model >60 mAP (ICLR 2026) | DINOv2 backbone + deformable decoder | P2 upgrade from YOLO11n |
| SkyDreamer | ICLR 2025, fastest end-to-end inference (4.3ms) | DreamerV3 pixels→motors, VQ-VAE world model | Reference only (modular beats e2e) |
WINNING-STRATEGY · v2.0 · 2026-04-19