Anduril × DCL · $500,000 prize pool · two virtual qualifiers, one physical qualifier, one final. The honest plan: a deterministic VQ1 completion stack first to lock in a submission, then APEX perception-aware PPO for VQ2 speed. Rebuilt after the 2026-04-19 spec confirmed no GPS, no absolute positioning, FPV + telemetry in, Throttle / Roll / Pitch / Yaw out.
End-to-end PPO trained against privileged NED gate positions. Fails at submission because the real sim gives no absolute positioning.
YOLO11n detector → 4-corner keypoints → PnP → PID with telemetry. Zero learning on the critical path. Ships before VQ1 opens.
The perception-aware reward term, cos(camera_boresight, next_gate), teaches seek-attack behavior. Same input surface and same output surface for both tracks: the VQ1 and VQ2 stacks diverge only at the controller stage.
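A minimal sketch of that reward term. The vector representation (3-vectors for the camera boresight and the bearing to the next gate) and the function name are assumptions for illustration, not the training code's actual API:

```python
import math

def perception_reward(boresight, gate_dir):
    """cos(angle) between the camera boresight and the direction to the
    next gate: +1 when the camera points straight at the gate, 0 at 90
    degrees, negative when the gate is behind the camera."""
    dot = sum(b * g for b, g in zip(boresight, gate_dir))
    nb = math.sqrt(sum(b * b for b in boresight))
    ng = math.sqrt(sum(g * g for g in gate_dir))
    return dot / (nb * ng)

# Camera looking along +x, gate dead ahead on +x:
print(perception_reward((1, 0, 0), (5, 0, 0)))  # 1.0
```

Because the term is maximized only when the gate is centered in view, the policy is paid for keeping the detector fed, which is what makes the detector-only observation viable.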
python train_apex.py — one script, three phases.
| Phase | Model | Device | Dataset | Time | Used by |
|---|---|---|---|---|---|
| 1: Detector | YOLO11n · 2.6M params | GPU | dataset_gates_mega | ~2 hr | VQ1 + VQ2 |
| 2: Keypoints | YOLO11n-pose · 4 corners | GPU | dataset_gates_mega_pose | ~1.5 hr | VQ1 + VQ2 |
| 3: APEX Policy UPDATED | Perception-aware PPO · MLP [256,256,256] | CPU + parallel envs | SimDrone → VQ1 sim fine-tune | ~4 hr | VQ2 only |
Phase 3 currently trains against privileged gate positions read from course_map_test.json. That will not transfer to the real AIGP sim — no absolute positioning. Before any VQ2 submission, the observation must be swapped to detector output (bbox + keypoints + conf) + telemetry (attitude + body rates + accel) + last action. The --privileged-obs flag keeps the old path for dev runs.
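A sketch of that swapped observation layout. Field names and dimensions here are illustrative assumptions, not the sim's actual payload schema:

```python
def build_observation(det, telemetry, last_action):
    """Flatten detector output + telemetry + last action into one
    observation vector with no privileged data. All field names and
    sizes below are hypothetical."""
    obs = []
    obs += det["bbox"]              # 4: cx, cy, w, h (normalized)
    obs += det["keypoints"]         # 8: four gate corners, x/y each
    obs += [det["conf"]]            # 1: detection confidence
    obs += telemetry["attitude"]    # 3: roll, pitch, yaw
    obs += telemetry["body_rates"]  # 3: p, q, r
    obs += telemetry["accel"]       # 3: body-frame accelerometer
    obs += last_action              # 4: throttle, roll, pitch, yaw
    return obs                      # 26-dim

obs = build_observation(
    {"bbox": [0.5, 0.5, 0.2, 0.3], "keypoints": [0.0] * 8, "conf": 0.9},
    {"attitude": [0.0] * 3, "body_rates": [0.0] * 3, "accel": [0.0] * 3},
    [0.5, 0.0, 0.0, 0.0],
)
```

Keeping the layout in one function makes the --privileged-obs dev path a drop-in alternative: same downstream policy, different builder.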
```bash
python train_apex.py                                               # all three phases
python train_apex.py detector --epochs 5                           # smoke
python train_apex.py policy --observation-mode=detector_telemetry
```
| Metric | YOLO11n | RF-DETR-Nano | RF-DETR-Small |
|---|---|---|---|
| mAP50:95 | 39.5 | 48.4 | 53.0 |
| Latency (T4 TRT) | 5ms+ | 2.32ms | 3.52ms |
| Aerial dataset gap | baseline | +5 mAP | +8 mAP |
| Backbone | CNN | DINOv2 | DINOv2 |
| Jetson Orin speed | baseline | +20–30% | comparable |
| License | AGPL-3.0 | Apache 2.0 | Apache 2.0 |
We ship YOLO11n first (already trained) for VQ1. RF-DETR-Nano retrain is a P2 upgrade if VQ1-sim frames reveal YOLO recall is the bottleneck. Both feed the same keypoints + PnP + controller stack downstream.
Once VQ1 sim credentials land: slow manual laps capturing (frame, telemetry, target-gate) triples. ~10 min on the actual VQ1 sim gives ~6000 in-distribution training frames. Fine-tune YOLO11n + YOLO11n-pose on these.
Project PnP results into a stable body frame using telemetry attitude; average over the last K frames. Reduces PnP jitter during hard banking. Zero labeling cost.
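One way to implement that smoothing, sketched with a yaw-only de-rotation and a fixed-length window. The class and its signatures are hypothetical; the real version would apply the full roll/pitch/yaw rotation from telemetry attitude:

```python
import math
from collections import deque

class GateSmoother:
    """Rotate frame-local PnP gate positions into a yaw-stabilized body
    frame using telemetry yaw, then average the last K estimates to
    suppress per-frame PnP jitter."""

    def __init__(self, k=5):
        self.window = deque(maxlen=k)

    def update(self, pnp_xyz, yaw):
        x, y, z = pnp_xyz
        c, s = math.cos(yaw), math.sin(yaw)
        # De-rotate by current yaw so the average is taken in a frame
        # that doesn't spin with the drone during hard banking.
        stable = (c * x - s * y, s * x + c * y, z)
        self.window.append(stable)
        n = len(self.window)
        return tuple(sum(v[i] for v in self.window) / n for i in range(3))
```

The deque gives the K-frame window for free, and because the labels come from telemetry the whole thing costs nothing to annotate, as noted above.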
VQ2 adds lighting changes, 3D objects, obstacles. Hard-negative mining: brightness shifts, lens flare, motion blur, partial occlusion, gate-like false positives. dataset_gates_hardneg is the starter.
AIGP sim explicitly supports multiple parallel instances on one box. Wire SubprocVecEnv for 8+ envs on the RTX 5080 host. Training wall-clock drops 6–8×.
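The batched interface that wiring targets, sketched with a sequential stand-in so the API shape is visible without stable-baselines3 installed. In the real stack, SubprocVecEnv takes the same list of env factory functions but runs each env in its own process; `ToyEnv` here is purely illustrative:

```python
class VecEnvFacade:
    """Sequential stand-in for SubprocVecEnv: identical batched
    reset/step surface, but envs run in-line instead of in worker
    processes. Good enough to develop the training loop against."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

class ToyEnv:
    """Hypothetical minimal env: obs is the step count."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3

vec = VecEnvFacade([ToyEnv for _ in range(8)])  # 8 parallel instances
first_obs = vec.reset()
obs, rewards, dones = vec.step([0.0] * 8)
```

Swapping the facade for SubprocVecEnv is then a one-line change, which is where the 6–8× wall-clock win comes from.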
Rules require code access for review. Pin dep versions, one-command reproduction, traceable outputs. Reviewer can re-run any artifact from deterministic inputs.
No absolute positioning means no course map. Telemetry-integrated heading + last-seen gate bearing gives a short-horizon "gate after next" predictor for occluded frames. Cheap, no one else will build it.
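A dead-reckoning sketch of that predictor: store the bearing at the last detection, integrate yaw rate from telemetry while the gate is occluded, and rotate the stored bearing by the accumulated heading change. The 2D (yaw-only) simplification and all names are assumptions:

```python
import math

class GateBearingPredictor:
    """Predict body-frame bearing to the target gate during occluded
    frames by dead-reckoning heading from gyro yaw rate."""

    def __init__(self):
        self.last_bearing = None    # rad, body frame, at last detection
        self.heading_delta = 0.0    # yaw integrated since last detection

    def on_detection(self, bearing_rad):
        self.last_bearing = bearing_rad
        self.heading_delta = 0.0

    def on_telemetry(self, yaw_rate_rad_s, dt_s):
        self.heading_delta += yaw_rate_rad_s * dt_s

    def predicted_bearing(self):
        if self.last_bearing is None:
            return None
        # Yawing toward the gate reduces its body-frame bearing.
        b = self.last_bearing - self.heading_delta
        return math.atan2(math.sin(b), math.cos(b))  # wrap to [-pi, pi]
```

Drift is irrelevant over the sub-second occlusion horizon, which is what makes this cheap.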
The 2026-04-19 update confirmed there is no absolute positioning. These were built on the NED assumption and are retired — kept in-repo for reference, not on the critical path.
Kalman filter fusing ODOMETRY + visual detections into a global NED gate map. No GPS / no absolute positioning — the input signal doesn't exist. Replaced by frame-local PnP + target-gate tracker.
Track a precomputed NED trajectory at 120Hz using ODOMETRY state. Can't precompute an NED trajectory without NED. Replaced by PID (VQ1) + PPO (VQ2), both frame-local.
| Priority | Task | Why | Time |
|---|---|---|---|
| P0 | Wire telemetry adapter | Map sim payload → our observation format | ~1 hr |
| P0 | Run VQ1 completion stack | YOLO + keypoints + PnP + PID. Target 100% gate-clear. | ~2 hr tune |
| P0 | Capture sim-frame dataset | 10 min telemetry-labeled frames for detector fine-tune | ~15 min + 2 hr retrain |
| P0 | Submit VQ1 completion entry | Lock in a passing submission before iterating | ~30 min |
| P1 | SubprocVecEnv wrapper | Stand up 8 instances for PPO training | ~3 hr |
| P1 | Swap PPO observation | Required for VQ2 submission to transfer | ~4 hr |
| P2 | RF-DETR retrain on sim frames | Only if YOLO recall is the VQ2 bottleneck | ~2 hr |
Each VQ2 crash costs ~15 minutes of wall clock: the failed 8-minute run + restart + warm-up + anti-cheat rehandshake. In expected-time terms:
E[attempt_time] = p(clear) · lap_time + p(crash) · crash_cost, with crash_cost ≈ 15 min = 900 s
45 s pipeline, 5% crash: 0.95·45 + 0.05·900 = 87.75 s equivalent
50 s pipeline, 0.5% crash: 0.995·50 + 0.005·900 = 54.25 s equivalent
The slower, more reliable pipeline wins by 33.5 seconds per attempt in expectation.
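The tradeoff is worth keeping as a helper so crash-rate and lap-time changes can be compared directly (function name is ours):

```python
def expected_attempt_seconds(lap_s, p_crash, crash_cost_s=900.0):
    """Expected wall-clock cost of one attempt: clear at lap_s with
    probability (1 - p_crash), else pay the ~15-minute crash penalty
    (failed run + restart + warm-up + anti-cheat rehandshake)."""
    return (1.0 - p_crash) * lap_s + p_crash * crash_cost_s

fast_risky    = expected_attempt_seconds(45, 0.05)    # ~87.75 s
slow_reliable = expected_attempt_seconds(50, 0.005)   # ~54.25 s
```

Dropping crash probability by 10× buys more than shaving 5 s of lap time ever could, which is the argument for the reliability-first VQ2 tuning.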
Attempts are unlimited, and each 8-minute run at 10 fps yields 4,800 frames. By the VQ2 deadline, a team that auto-labels every attempt will have 500K+ in-distribution training frames. A team that doesn't is still training on the initial 2,000 synthetic frames.
Between VQ2 close (~late July) and Physical (early September) is a ~6-week window. Most teams will relax. That's when Physical is actually won.
| Week | Task | Deliverable |
|---|---|---|
| W1–2 | Capture ~30 min of real-flight telemetry on the DIY 5-inch rig | Real-drone dataset |
| W2–3 | Fit residual-dynamics model (Swift method) | sim2real_residual.pt |
| W3–4 | Validate PPO policy with residual. Re-tune PID for real IMU latency | DIY drone clears 90%+ of gates |
| W4–5 | Adversarial lighting (outdoor, dusk, shadow) | Detector mAP>95% on real-world slice |
| W5–6 | Crew procedure drills: 10-min-window restart, battery swap, crash recovery | <90s crash-to-rearm |
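The W2–3 residual-dynamics step can be sketched in miniature. The Swift approach fits a learned model of the sim-vs-real dynamics gap; here a closed-form linear fit on a single acceleration axis stands in, and all data names are hypothetical:

```python
def fit_linear_residual(sim_accel, real_accel):
    """Least-squares fit of real = a * sim + b on one axis. The (a, b)
    pair is the residual correction layered on top of the simulator's
    dynamics when validating the policy."""
    n = len(sim_accel)
    mx = sum(sim_accel) / n
    my = sum(real_accel) / n
    sxx = sum((x - mx) ** 2 for x in sim_accel)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sim_accel, real_accel))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Hypothetical logs: the real rig produces 10% less accel plus an offset.
sim = [0.0, 1.0, 2.0, 3.0, 4.0]
real = [0.9 * x - 0.2 for x in sim]
a, b = fit_linear_residual(sim, real)
```

The real sim2real_residual.pt would be a learned multi-axis model over the ~30 min of captured flight telemetry, but the workflow (log pairs, fit residual, re-validate) is the same.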
The filter-then-prize structure means effort allocation looks nothing like the stage cadence suggests:
| Stage | Function | Effort % | Rationale |
|---|---|---|---|
| VQ1 | Completion filter | 10% | Deterministic pilot is enough; don't over-invest |
| VQ2 | Speed filter | 25% | Need good PPO, but ship "good enough" not "world-record" |
| Physical (Sep) | Real-drone filter | 30% | Sim-to-real + hardware + crew · where finalists are made |
| Final (Nov) | Prize race | 20% | Adversarial robustness, audience noise, restart procedures |
| Reserve | Unexpected · rule changes · bugs | 15% | Year-long projects always need it |
Most teams will treat VQ1 as a dry run for VQ2 and show up with a half-trained PPO policy that took weeks and doesn't reliably complete anything.
We show up to VQ1 with a deterministic, zero-learning completion stack that clears the course on day one. That locks in a passing submission.
Then we spend the month between VQ1 and VQ2 training perception-aware PPO on actual VQ1-sim frames — not synthetic proxies — with the right observation surface (detector output + telemetry, no privileged data). The team that ships a boring solution for VQ1 and uses the window to train a good one for VQ2 wins.
| System | Achievement | Technique | Our use |
|---|---|---|---|
| MonoRace | A2RL 2025 champion, beat 3 human world champs | U-Net GateSeg + PnP + EKF + PPO G&CNet 500Hz | Detector chain + PnP + observation layout |
| Swift | Nature 2023, first AI to beat humans at FPV | Perception-aware PPO reward | Phase 3 reward + sim-to-real residual |
| RF-DETR | First real-time model >60 mAP (ICLR 2026) | DINOv2 backbone + deformable decoder | P2 upgrade from YOLO11n |
| SkyDreamer | ICLR 2025, end-to-end fastest inference (4.3ms) | DreamerV3 pixels→motors, VQ-VAE world model | Reference only (modular beats e2e) |