Anduril × DCL · $500,000 prize pool · two virtual qualifiers, one physical qualifier, one final. The honest plan: a deterministic VQ1 completion stack first to lock in a submission, then APEX perception-aware PPO for VQ2 speed. Rebuilt after the 2026-04-19 spec confirmed no GPS, no absolute positioning, FPV + telemetry in, Throttle / Roll / Pitch / Yaw out.
End-to-end PPO trained against privileged NED gate positions. Fails at submission because the real sim gives no absolute positioning.
YOLO11n detector → 4-corner keypoints → PnP → PID with telemetry. Zero learning on the critical path. Ships before VQ1 opens.
The perception-aware reward term, cos(camera_boresight, next_gate), teaches seek-attack behavior. Same input surface and same output surface for both tracks: the VQ1 and VQ2 stacks diverge only at the controller stage.
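A minimal sketch of that reward term. The vector representation (3-vectors for the camera boresight and the bearing to the next gate) and the function name are assumptions for illustration, not the training code's actual API:

```python
import math

def perception_reward(boresight, gate_dir):
    """cos(angle) between the camera boresight and the direction to the
    next gate: +1 when the camera points straight at the gate, 0 at 90
    degrees, negative when the gate is behind the camera."""
    dot = sum(b * g for b, g in zip(boresight, gate_dir))
    nb = math.sqrt(sum(b * b for b in boresight))
    ng = math.sqrt(sum(g * g for g in gate_dir))
    return dot / (nb * ng)

# Camera looking along +x, gate dead ahead on +x:
print(perception_reward((1, 0, 0), (5, 0, 0)))  # 1.0
```

Because the term is maximized only when the gate is centered in view, the policy is paid for keeping the detector fed, which is what makes the detector-only observation viable.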
python train_apex.py — one script, three phases.
| Phase | Model | Device | Dataset | Time | Used by |
|---|---|---|---|---|---|
| 1: Detector | YOLO11n · 2.6M params | GPU | dataset_gates_mega | ~2 hr | VQ1 + VQ2 |
| 2: Keypoints | YOLO11n-pose · 4 corners | GPU | dataset_gates_mega_pose | ~1.5 hr | VQ1 + VQ2 |
| 3: APEX Policy UPDATED | Perception-aware PPO · MLP [256,256,256] | CPU + parallel envs | SimDrone → VQ1 sim fine-tune | ~4 hr | VQ2 only |
Phase 3 currently trains against privileged gate positions read from course_map_test.json. That will not transfer to the real AIGP sim — no absolute positioning. Before any VQ2 submission, the observation must be swapped to detector output (bbox + keypoints + conf) + telemetry (attitude + body rates + accel) + last action. The --privileged-obs flag keeps the old path for dev runs.
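A sketch of that swapped observation layout. Field names and dimensions here are illustrative assumptions, not the sim's actual payload schema:

```python
def build_observation(det, telemetry, last_action):
    """Flatten detector output + telemetry + last action into one
    observation vector with no privileged data. All field names and
    sizes below are hypothetical."""
    obs = []
    obs += det["bbox"]              # 4: cx, cy, w, h (normalized)
    obs += det["keypoints"]         # 8: four gate corners, x/y each
    obs += [det["conf"]]            # 1: detection confidence
    obs += telemetry["attitude"]    # 3: roll, pitch, yaw
    obs += telemetry["body_rates"]  # 3: p, q, r
    obs += telemetry["accel"]       # 3: body-frame accelerometer
    obs += last_action              # 4: throttle, roll, pitch, yaw
    return obs                      # 26-dim

obs = build_observation(
    {"bbox": [0.5, 0.5, 0.2, 0.3], "keypoints": [0.0] * 8, "conf": 0.9},
    {"attitude": [0.0] * 3, "body_rates": [0.0] * 3, "accel": [0.0] * 3},
    [0.5, 0.0, 0.0, 0.0],
)
```

Keeping the layout in one function makes the --privileged-obs dev path a drop-in alternative: same downstream policy, different builder.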
```bash
python train_apex.py                                               # all three phases
python train_apex.py detector --epochs 5                           # smoke
python train_apex.py policy --observation-mode=detector_telemetry
```
| Metric | YOLO11n | RF-DETR-Nano | RF-DETR-Small |
|---|---|---|---|
| mAP50:95 | 39.5 | 48.4 | 53.0 |
| Latency (T4 TRT) | 5ms+ | 2.32ms | 3.52ms |
| Aerial dataset gap | baseline | +5 mAP | +8 mAP |
| Backbone | CNN | DINOv2 | DINOv2 |
| Jetson Orin speed | baseline | +20–30% | comparable |
| License | AGPL-3.0 | Apache 2.0 | Apache 2.0 |
We ship YOLO11n first (already trained) for VQ1. RF-DETR-Nano retrain is a P2 upgrade if VQ1-sim frames reveal YOLO recall is the bottleneck. Both feed the same keypoints + PnP + controller stack downstream.
Once VQ1 sim credentials land: slow manual laps capturing (frame, telemetry, target-gate) triples. ~10 min on the actual VQ1 sim gives ~6000 in-distribution training frames. Fine-tune YOLO11n + YOLO11n-pose on these.
Project PnP results into a stable body frame using telemetry attitude; average over the last K frames. Reduces PnP jitter during hard banking. Zero labeling cost.
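One way to implement that smoothing, sketched with a yaw-only de-rotation and a fixed-length window. The class and its signatures are hypothetical; the real version would apply the full roll/pitch/yaw rotation from telemetry attitude:

```python
import math
from collections import deque

class GateSmoother:
    """Rotate frame-local PnP gate positions into a yaw-stabilized body
    frame using telemetry yaw, then average the last K estimates to
    suppress per-frame PnP jitter."""

    def __init__(self, k=5):
        self.window = deque(maxlen=k)

    def update(self, pnp_xyz, yaw):
        x, y, z = pnp_xyz
        c, s = math.cos(yaw), math.sin(yaw)
        # De-rotate by current yaw so the average is taken in a frame
        # that doesn't spin with the drone during hard banking.
        stable = (c * x - s * y, s * x + c * y, z)
        self.window.append(stable)
        n = len(self.window)
        return tuple(sum(v[i] for v in self.window) / n for i in range(3))
```

The deque gives the K-frame window for free, and because the labels come from telemetry the whole thing costs nothing to annotate, as noted above.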
VQ2 adds lighting changes, 3D objects, obstacles. Hard-negative mining: brightness shifts, lens flare, motion blur, partial occlusion, gate-like false positives. dataset_gates_hardneg is the starter.
AIGP sim explicitly supports multiple parallel instances on one box. Wire SubprocVecEnv for 8+ envs on the RTX 5080 host. Training wall-clock drops 6–8×.
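The batched interface that wiring targets, sketched with a sequential stand-in so the API shape is visible without stable-baselines3 installed. In the real stack, SubprocVecEnv takes the same list of env factory functions but runs each env in its own process; `ToyEnv` here is purely illustrative:

```python
class VecEnvFacade:
    """Sequential stand-in for SubprocVecEnv: identical batched
    reset/step surface, but envs run in-line instead of in worker
    processes. Good enough to develop the training loop against."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

class ToyEnv:
    """Hypothetical minimal env: obs is the step count."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3

vec = VecEnvFacade([ToyEnv for _ in range(8)])  # 8 parallel instances
first_obs = vec.reset()
obs, rewards, dones = vec.step([0.0] * 8)
```

Swapping the facade for SubprocVecEnv is then a one-line change, which is where the 6–8× wall-clock win comes from.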
Rules require code access for review. Pin dep versions, one-command reproduction, traceable outputs. Reviewer can re-run any artifact from deterministic inputs.
No absolute positioning means no course map. Telemetry-integrated heading + last-seen gate bearing gives a short-horizon "gate after next" predictor for occluded frames. Cheap, no one else will build it.
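A dead-reckoning sketch of that predictor: store the bearing at the last detection, integrate yaw rate from telemetry while the gate is occluded, and rotate the stored bearing by the accumulated heading change. The 2D (yaw-only) simplification and all names are assumptions:

```python
import math

class GateBearingPredictor:
    """Predict body-frame bearing to the target gate during occluded
    frames by dead-reckoning heading from gyro yaw rate."""

    def __init__(self):
        self.last_bearing = None    # rad, body frame, at last detection
        self.heading_delta = 0.0    # yaw integrated since last detection

    def on_detection(self, bearing_rad):
        self.last_bearing = bearing_rad
        self.heading_delta = 0.0

    def on_telemetry(self, yaw_rate_rad_s, dt_s):
        self.heading_delta += yaw_rate_rad_s * dt_s

    def predicted_bearing(self):
        if self.last_bearing is None:
            return None
        # Yawing toward the gate reduces its body-frame bearing.
        b = self.last_bearing - self.heading_delta
        return math.atan2(math.sin(b), math.cos(b))  # wrap to [-pi, pi]
```

Drift is irrelevant over the sub-second occlusion horizon, which is what makes this cheap.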
The 2026-04-19 update confirmed there is no absolute positioning. These were built on the NED assumption and are retired — kept in-repo for reference, not on the critical path.
Kalman filter fusing ODOMETRY + visual detections into a global NED gate map. No GPS / no absolute positioning — the input signal doesn't exist. Replaced by frame-local PnP + target-gate tracker.
Track a precomputed NED trajectory at 120Hz using ODOMETRY state. Can't precompute an NED trajectory without NED. Replaced by PID (VQ1) + PPO (VQ2), both frame-local.
| Priority | Task | Why | Time |
|---|---|---|---|
| P0 | Wire telemetry adapter | Map sim payload → our observation format | ~1 hr |
| P0 | Run VQ1 completion stack | YOLO + keypoints + PnP + PID. Target 100% gate-clear. | ~2 hr tune |
| P0 | Capture sim-frame dataset | 10 min telemetry-labeled frames for detector fine-tune | ~15 min + 2 hr retrain |
| P0 | Submit VQ1 completion entry | Lock in a passing submission before iterating | ~30 min |
| P1 | SubprocVecEnv wrapper | Stand up 8 instances for PPO training | ~3 hr |
| P1 | Swap PPO observation | Required for VQ2 submission to transfer | ~4 hr |
| P2 | RF-DETR retrain on sim frames | Only if YOLO recall is the VQ2 bottleneck | ~2 hr |
Each VQ2 crash costs ~15 minutes of wall clock: the failed 8-minute run + restart + warm-up + anti-cheat rehandshake. In expected-time terms:
E[attempt_time] = p(clear) · lap_time + p(crash) · crash_cost, with crash_cost ≈ 15 min = 900 s
45 s pipeline, 5% crash: 0.95·45 + 0.05·900 = 87.75 s equivalent
50 s pipeline, 0.5% crash: 0.995·50 + 0.005·900 = 54.25 s equivalent
The slower, more reliable pipeline wins by 33.5 seconds per attempt in expectation.
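The tradeoff is worth keeping as a helper so crash-rate and lap-time changes can be compared directly (function name is ours):

```python
def expected_attempt_seconds(lap_s, p_crash, crash_cost_s=900.0):
    """Expected wall-clock cost of one attempt: clear at lap_s with
    probability (1 - p_crash), else pay the ~15-minute crash penalty
    (failed run + restart + warm-up + anti-cheat rehandshake)."""
    return (1.0 - p_crash) * lap_s + p_crash * crash_cost_s

fast_risky    = expected_attempt_seconds(45, 0.05)    # ~87.75 s
slow_reliable = expected_attempt_seconds(50, 0.005)   # ~54.25 s
```

Dropping crash probability by 10× buys more than shaving 5 s of lap time ever could, which is the argument for the reliability-first VQ2 tuning.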
Attempts are unlimited, and each 8-minute run at 10 fps yields 4,800 frames. By the VQ2 deadline, a team that auto-labels every attempt will have 500K+ in-distribution training frames. A team that doesn't is still training on the initial 2,000 synthetic frames.
Between VQ2 close (~late July) and Physical (early September) is a ~6-week window. Most teams will relax. That's when Physical is actually won.
| Week | Task | Deliverable |
|---|---|---|
| W1–2 | Capture ~30 min of real-flight telemetry on the DIY 5-inch rig | Real-drone dataset |
| W2–3 | Fit residual-dynamics model (Swift method) | sim2real_residual.pt |
| W3–4 | Validate PPO policy with residual. Re-tune PID for real IMU latency | DIY drone clears 90%+ of gates |
| W4–5 | Adversarial lighting (outdoor, dusk, shadow) | Detector mAP>95% on real-world slice |
| W5–6 | Crew procedure drills: 10-min-window restart, battery swap, crash recovery | <90s crash-to-rearm |
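The W2–3 residual-dynamics step can be sketched in miniature. The Swift approach fits a learned model of the sim-vs-real dynamics gap; here a closed-form linear fit on a single acceleration axis stands in, and all data names are hypothetical:

```python
def fit_linear_residual(sim_accel, real_accel):
    """Least-squares fit of real = a * sim + b on one axis. The (a, b)
    pair is the residual correction layered on top of the simulator's
    dynamics when validating the policy."""
    n = len(sim_accel)
    mx = sum(sim_accel) / n
    my = sum(real_accel) / n
    sxx = sum((x - mx) ** 2 for x in sim_accel)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sim_accel, real_accel))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical logs: the real rig produces 10% less accel plus an offset.
sim = [0.0, 1.0, 2.0, 3.0, 4.0]
real = [0.9 * x - 0.2 for x in sim]
a, b = fit_linear_residual(sim, real)
```

The real sim2real_residual.pt would be a learned multi-axis model over the ~30 min of captured flight telemetry, but the workflow (log pairs, fit residual, re-validate) is the same.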
The filter-then-prize structure means effort allocation looks nothing like the stage cadence suggests:
| Stage | Function | Effort % | Rationale |
|---|---|---|---|
| VQ1 | Completion filter | 10% | Deterministic pilot is enough; don't over-invest |
| VQ2 | Speed filter | 25% | Need good PPO, but ship "good enough" not "world-record" |
| Physical (Sep) | Real-drone filter | 30% | Sim-to-real + hardware + crew · where finalists are made |
| Final (Nov) | Prize race | 20% | Adversarial robustness, audience noise, restart procedures |
| Reserve | Unexpected · rule changes · bugs | 15% | Year-long projects always need it |
Most teams will treat VQ1 as a dry run for VQ2 and show up with a half-trained PPO policy that took weeks and doesn't reliably complete anything.
We show up to VQ1 with a deterministic, zero-learning completion stack that clears the course on day one. That locks in a passing submission.
Then we spend the month between VQ1 and VQ2 training perception-aware PPO on actual VQ1-sim frames — not synthetic proxies — with the right observation surface (detector output + telemetry, no privileged data). The team that ships a boring solution for VQ1 and uses the window to train a good one for VQ2 wins.
| System | Achievement | Technique | Our use |
|---|---|---|---|
| MonoRace | A2RL 2025 champion, beat 3 human world champs | U-Net GateSeg + PnP + EKF + PPO G&CNet 500Hz | Detector chain + PnP + observation layout |
| Swift | Nature 2023, first AI to beat humans at FPV | Perception-aware PPO reward | Phase 3 reward + sim-to-real residual |
| RF-DETR | First real-time model >60 mAP (ICLR 2026) | DINOv2 backbone + deformable decoder | P2 upgrade from YOLO11n |
| SkyDreamer | ICLR 2025, end-to-end fastest inference (4.3ms) | DreamerV3 pixels→motors, VQ-VAE world model | Reference only (modular beats e2e) |