VQ1 completion stack (deterministic, ships first) + VQ2 APEX PPO with a detector + telemetry observation. Rebuilt after the 2026-04-19 AIGP spec confirmed: no GPS, no absolute positioning, no depth. Inputs are FPV + telemetry; outputs are Throttle / Roll / Pitch / Yaw.
References: MonoRace (A2RL 2025 champion, TU Delft MAVLab) · Swift (Nature 2023, UZH) · SkyDreamer (ICLR 2025).
Deterministic detector + PnP + PID. No learning on critical path. VQ1 scores on completion, not time — a reliable zero-learning stack is the safest submission.
```python
class VQ1CompletionPilot:
    def step(self, frame_bgr, telemetry):
        # 1) detect gates
        detections = self.det.infer(frame_bgr)
        if not detections:
            return self._search_fallback(telemetry)
        # 2) pick target: closest in-front gate with highest confidence
        target = self._select_target(detections, telemetry)
        # 3) keypoints + PnP
        corners = self.kp.infer(frame_bgr, target.bbox)
        rvec, tvec = solve_pnp_square(GATE_3D_1p5m, corners, self.K)
        # 4) body-frame gate pose
        gate_body = rotate_by_quat(tvec, telemetry.attitude_q)
        # 5) PID controller (per-axis controllers live in self.pid_*)
        heading_err = atan2(gate_body.y, gate_body.x)
        dist = norm(gate_body)
        yaw = self.pid_yaw(heading_err)
        roll = self.pid_roll(gate_body.y)
        pitch = self.pid_pitch(dist)
        throttle = self._thrust_curve(dist, gate_body.z)
        return throttle, roll, pitch, yaw
```
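The step above elides the actual controllers. One way the `self.pid_*` callables and the error terms could look — a minimal sketch with illustrative, untuned gains, assuming a fixed-rate control loop and a body frame with x forward, y right, z down (all names here are hypothetical):

```python
import math

class PID:
    """Minimal clamped PID, called once per control tick (gains illustrative)."""
    def __init__(self, kp, ki, kd, dt=0.02, limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt          # assumes a fixed-rate loop (here 50 Hz)
        self.limit = limit
        self.integral = 0.0
        self.prev_err = None

    def __call__(self, err):
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        out = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(-self.limit, min(self.limit, out))  # clamp to actuator range

def gate_errors(gate_body):
    """Body-frame gate position (x fwd, y right, z down) -> per-axis errors."""
    x, y, z = gate_body
    heading_err = math.atan2(y, x)  # yaw toward the gate
    lateral_err = y                 # roll to center laterally
    vertical_err = -z               # climb if the gate is above us
    return heading_err, lateral_err, vertical_err
```

The clamp matters: a large heading error on a distant gate should saturate at full yaw command rather than command an impossible rate.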
Current Phase-3 PPO reads gate positions from course_map_test.json. That's not available in the real AIGP sim. Rebuild around signals the sim actually exposes:
```python
# OLD (privileged — doesn't transfer):
obs = np.concat([
    drone_pos_ned,               # 3D · NOT AVAILABLE
    drone_vel_ned,               # 3D · NOT AVAILABLE
    gate_positions_ned[:2],      # 6D · NOT AVAILABLE
    ...
])

# NEW (detector + telemetry):
obs = np.concat([
    # Vision side (per frame)
    detected_gate_bbox_xywh,     # 4D · normalized [0,1]
    detected_gate_keypoints_xy,  # 8D · 4 corners
    detection_confidence,        # 1D
    frames_since_detection,      # 1D · decays
    # Telemetry side (from sim)
    attitude_quaternion,         # 4D
    body_rates,                  # 3D · roll/pitch/yaw rates
    body_accel,                  # 3D
    # History
    last_action,                 # 4D · T/R/P/Y
])                               # Total: 28D
```
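A runnable sketch of how the 28-D vector could be assembled, with zero-filled vision fields on missed frames and `frames_since_detection` normalized so it saturates at 1.0. The dict field names and `max_stale` horizon are assumptions, not the sim's actual payload schema:

```python
import numpy as np

def build_obs(detection, telemetry, last_action, frames_since_det, max_stale=30):
    """Assemble the 28-D detector+telemetry observation (field names hypothetical)."""
    if detection is not None:
        bbox = np.asarray(detection["bbox_xywh"], np.float32)       # 4D, [0,1]
        kps = np.asarray(detection["keypoints_xy"], np.float32)     # 8D, 4 corners
        conf = np.float32(detection["confidence"])                  # 1D
    else:
        # Zero-fill vision fields so the policy sees "no gate" explicitly.
        bbox = np.zeros(4, np.float32)
        kps = np.zeros(8, np.float32)
        conf = np.float32(0.0)
    # Staleness ramps 0 -> 1, then saturates; pairs with conf going to 0.
    staleness = np.float32(min(frames_since_det, max_stale) / max_stale)
    obs = np.concatenate([
        bbox, kps, [conf], [staleness],
        np.asarray(telemetry["attitude_q"], np.float32),   # 4D
        np.asarray(telemetry["body_rates"], np.float32),   # 3D
        np.asarray(telemetry["body_accel"], np.float32),   # 3D
        np.asarray(last_action, np.float32),               # 4D
    ])
    assert obs.shape == (28,)
    return obs
```

Zero-filling plus the staleness channel lets the policy distinguish "gate just lost" from "gate long gone" without a recurrent state.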
```shell
python train_apex.py policy                                  # default
python train_apex.py policy --observation-mode=privileged    # legacy (dev only)
```
Swift's single reward term teaches seek-attack behavior without a state machine. Adapted for detector-output obs: reward "gate near frame center at high confidence."
```python
def perception_reward(detection):
    if detection is None:
        return -0.2  # penalty for losing sight
    cx, cy = detection.bbox_center_normalized
    centered = 1.0 - 2.0 * np.hypot(cx - 0.5, cy - 0.5)
    return centered * detection.confidence

def step_reward(obs, detection, action, prev_action, crashed, gate_passed):
    r_progress = 10.0 if gate_passed else 0.0
    r_perception = 0.5 * perception_reward(detection)
    r_smooth = -0.1 * np.sum(np.abs(action - prev_action))
    r_crash = -5.0 if crashed else 0.0
    return r_progress + r_perception + r_smooth + r_crash
```
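A quick numeric check of the perception term's gradient, using a mock detection object (and restating `perception_reward` so the snippet runs standalone):

```python
import numpy as np

class MockDetection:
    """Stand-in for a detector output, just for this check."""
    def __init__(self, cx, cy, confidence):
        self.bbox_center_normalized = (cx, cy)
        self.confidence = confidence

def perception_reward(detection):  # restated from above
    if detection is None:
        return -0.2
    cx, cy = detection.bbox_center_normalized
    centered = 1.0 - 2.0 * np.hypot(cx - 0.5, cy - 0.5)
    return centered * detection.confidence

# Gate dead-center at conf 0.9: full credit (0.9).
assert abs(perception_reward(MockDetection(0.5, 0.5, 0.9)) - 0.9) < 1e-9
# Gate near the frame edge: centered term drops to ~0.1, credit collapses (~0.09).
assert abs(perception_reward(MockDetection(0.95, 0.5, 0.9)) - 0.09) < 1e-6
# Lost sight entirely: flat -0.2.
assert perception_reward(None) == -0.2
```

Note the linear falloff crosses zero at 0.5 frame-radii from center and goes negative toward the corners, so the policy is penalized for letting the gate drift before detection is actually lost.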
Current: 2,759 training images (dataset_gates_mega). Target: 10,000+ with domain randomization + hard negatives, because VQ2 adds lighting changes, 3D objects, and non-gated obstacles that can trigger false positives.
| Source | Count | Method | Status |
|---|---|---|---|
| Existing real | 2,759 | Auto-labeled FPV frames | DONE |
| Hard negatives | ~1,000 | generate_training_sets.py | DONE |
| Domain randomized | ~1,000 | generate_training_sets.py | DONE |
| Sim-day capture | ~6,000 | Slow laps on actual VQ1 sim | GATED ON SIM |
| Synthetic renders | ~2,000 | Blender / Three.js with VQ2 distractors | TODO |
Albumentations pipeline stays unchanged (brightness, motion blur, occlusion, shadow).
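For reference, a dependency-free numpy illustration of two of those transforms (brightness jitter and horizontal motion blur), assuming single-channel float images in [0, 1] — the real pipeline applies the Albumentations equivalents with per-transform probabilities and keeps box/keypoint labels in sync:

```python
import numpy as np

def random_brightness(img, rng, max_delta=0.2):
    """Additive brightness jitter on a float image in [0, 1]."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

def horizontal_motion_blur(img, kernel_size=7):
    """1-D box blur along rows, approximating horizontal camera motion."""
    kernel = np.ones(kernel_size) / kernel_size
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        out[r] = np.convolve(img[r], kernel, mode="same")
    return out
```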
AIGP update explicitly allows multiple sim instances on one box. Wrap the sim client in a SubprocVecEnv and run 8+ envs on the RTX 5080 host. Wall-clock should drop 6–8×.
```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(instance_id):
    def _init():
        return AIGPSimEnv(instance_id=instance_id, port=6000 + instance_id)
    return _init

env = SubprocVecEnv([make_env(i) for i in range(8)])
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=512, ...)
model.learn(total_timesteps=10_000_000)
```
Caveat: active internet required per instance (anti-cheat). Budget bandwidth + make sure the sim's handshake supports parallel clients (confirmed in update, details TBD).
Built on the assumption that the sim exposes ODOMETRY in NED. VADR-TS-002 confirmed it doesn't — ODOMETRY was removed from the supported MAVLink message set in the 2026-05-08 spec revision. As of 2026-05-12 the ODOMETRY subscription is deleted from mavsdk_bridge.py; an InertialIntegrator dead-reckons position from HIGHRES_IMU + ATTITUDE with gate-PnP corrections. See winning-strategy.html retired-components section.
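The core of the `InertialIntegrator` idea can be sketched as a double integrator with a complementary correction toward gate-PnP fixes. This is a simplification: it assumes the acceleration is already rotated into the world frame and gravity-compensated upstream (the bridge derives that from HIGHRES_IMU + ATTITUDE), and the class name and blend gain here are illustrative, not the actual `mavsdk_bridge.py` code:

```python
import numpy as np

class InertialIntegratorSketch:
    """Dead-reckon position from accel; pull toward gate-PnP fixes when available."""

    def __init__(self, blend=0.2):
        self.pos = np.zeros(3)
        self.vel = np.zeros(3)
        self.blend = blend  # fraction of the PnP residual applied per fix (untuned)

    def step(self, accel_world, dt, pnp_pos=None):
        # Double integration drifts quadratically with time — hence the
        # gate-PnP corrections whenever the detector gives us a fix.
        self.vel += accel_world * dt
        self.pos += self.vel * dt
        if pnp_pos is not None:
            # Complementary-filter style nudge toward the vision fix.
            self.pos += self.blend * (pnp_pos - self.pos)
        return self.pos.copy()
```

Between gates the estimate is pure dead reckoning, so its usable horizon is short; each gate pass re-anchors it.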
YOLO11n-pose (APEX Phase 2) already produces 4 gate corners at better accuracy and integrates in one pass. A second custom model is redundant.
Feed-forward monocular SLAM does not work at racing velocities — see our 2026-04-17 investigation. Removed from the repo.
| Priority | Task | Why | Time |
|---|---|---|---|
| P0 | Telemetry adapter | Map sim payload → our obs format | ~1 hr |
| P0 | VQ1 completion pilot shakedown | 100% gate-clear at conservative speed | ~2 hr tune |
| P0 | Sim-frame dataset capture | 10 min slow laps for detector fine-tune | ~15 min + retrain |
| P0 | Submit VQ1 entry | Lock in a passing submission | ~30 min |
| P1 | SubprocVecEnv wrapper | 8-instance parallel for PPO training | ~3 hr |
| P1 | Swap PPO observation | Detector + telemetry instead of privileged | ~4 hr |
| P1 | Run Phase 3 PPO on captured sim frames | VQ2 prep | overnight |
| P2 | RF-DETR retrain | Only if YOLO recall limits VQ2 | ~2 hr |
| Metric | VQ1 target | VQ2 target | Current |
|---|---|---|---|
| Gate detection mAP@50 | >95% | >99% | 97.9% (RF-DETR) · YOLO11n TBD |
| Detection latency | <10ms | <5ms | ~8ms (RF-DETR PT) |
| PnP depth error @ 5m | <15% | <5% | ~15% (single-frame) |
| Course completion rate | 100% | >95% | TBD on sim |
| Lap time vs MonoRace | — | within 20% | TBD |
| End-to-end latency | <50ms | <30ms | ~80ms estimated |
| PPO training steps | — | 10M+ (parallel envs) | 2M single-env |
| Component | File | Status |
|---|---|---|
| APEX detector training | train_apex.py detector | COMPLETE |
| APEX keypoint training | train_apex.py keypoints | COMPLETE |
| APEX PPO training | train_apex.py policy | Needs observation swap |
| RL environment | rl_controller.py | Needs observation swap |
| YOLO training | yolo-train.py | COMPLETE |
| RF-DETR training | train_models.py | COMPLETE · 97.9% mAP |
| Gate segmentation | gate_segmentation.py | COMPLETE |
| IGPP EKF | imu_gate_predictor.py | COMPLETE · short-horizon fallback |
| SAMD multi-frame depth | synthetic_aperture_depth.py | COMPLETE |
| 6DOF SimDrone | sim_drone.py | COMPLETE (dev proxy) |
| Model evaluation | evaluate_models.py | COMPLETE |
| VQ1 completion pilot | vq1_completion_pilot.py | Stub · needs sim interface |
| MPC tracker + course mapper | mpc_tracker.py, course_mapper.py | RETIRED · NED-dependent |