train_apex.py — a three-phase pipeline modeled on the autonomous racing winners of 2019–2026: YOLO11n detector (Phase 1) → YOLO11n-pose keypoints (Phase 2) → perception-aware PPO (Phase 3). Phases 1 and 2 also power the VQ1 completion stack without Phase 3.
A train_apex.py --privileged-obs flag keeps the legacy observation path for dev-only runs.
YOLO11n (2.6M params) with recall-maximized augmentation is the primary perception model for VQ1 and VQ2. RF-DETR-Base with a DINOv2 backbone is the P2 upgrade path once YOLO becomes the bottleneck.
```bash
python train_apex.py detector \
    --model yolo11n --epochs 200
python train_apex.py detector --model rfdetr
```
| Model | mAP@50 (%) | mAP@50:95 | Latency (ms) | Params | License |
|---|---|---|---|---|---|
| YOLO11n | 92.1 | 39.5 | 5+ | 2.6M | AGPL-3.0 |
| YOLO26n | 94.3 | 41.5 | 3.5 | 2.8M | AGPL-3.0 |
| RF-DETR-Nano | 96.5 | 48.4 | 2.3 | 3.1M | Apache 2.0 |
| RF-DETR-Base | 97.9 | 53.0 | 5.2 | 29M | Apache 2.0 |
YOLO11n-pose — 4 keypoints (gate corners) for PnP depth estimation. Bounding boxes give you a rectangle; to run PnP and get the gate's 3D pose, you need the actual corner positions. This is the difference between "I see a gate somewhere" and "I know exactly where the gate is in 3D."
| Parameter | Value |
|---|---|
| Model | YOLO11n-pose (keypoint variant) |
| Keypoints | 4 (TL, TR, BR, BL gate corners) |
| Dataset | dataset_gates_mega_pose/ (auto-generated) |
| Epochs | 150 (patience 30) |
| Augmentation | Geometric only (no color jitter) |
```bash
python train_apex.py keypoints --epochs 150
```
```python
import cv2
import numpy as np

# 3D gate corners in the gate frame: a 1.5 m square, origin at center.
GATE_CORNERS_3D = np.array([
    [-0.75, -0.75, 0],  # TL
    [ 0.75, -0.75, 0],  # TR
    [ 0.75,  0.75, 0],  # BR
    [-0.75,  0.75, 0],  # BL
], dtype=np.float32)

# keypoints: (4, 2) pixel coords from the Phase 2 pose model.
# K, dist_coeffs: camera intrinsics and distortion from calibration.
corners_2d = keypoints[:4].astype(np.float32)
success, rvec, tvec = cv2.solvePnP(
    GATE_CORNERS_3D, corners_2d, K, dist_coeffs,
    flags=cv2.SOLVEPNP_IPPE_SQUARE,
)
# tvec = (x, y, z) gate center in the camera frame, meters
```
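For logging or a fallback controller, the translation vector reduces to a range and bearing. A minimal sketch, assuming OpenCV's camera convention (x right, y down, z forward); the `gate_range_bearing` helper is hypothetical, not part of train_apex.py:

```python
import numpy as np

def gate_range_bearing(tvec):
    """Distance and bearing angles to the gate center from a PnP tvec.

    tvec: (3,) or (3, 1) translation in the camera frame, meters.
    Returns (range_m, azimuth_rad, elevation_rad).
    Hypothetical helper for illustration only.
    """
    x, y, z = np.asarray(tvec, dtype=np.float64).reshape(3)
    range_m = float(np.sqrt(x * x + y * y + z * z))
    azimuth = float(np.arctan2(x, z))     # positive: gate is to the right
    elevation = float(np.arctan2(-y, z))  # positive: gate is above center
    return range_m, azimuth, elevation
```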
PPO with Swift's perception-aware reward — 28D observation (updated for the AIGP sim surface) → 4D action (throttle, roll, pitch, yaw).
The base reward is `-distance_to_gate + gate_passage_bonus`. Add a perception-aware term:

```
reward += 0.3 * cos(angle(camera_boresight, direction_to_gate))
```
This single term teaches the drone to always keep the camera pointed at the next gate — critical for a fixed FPV camera. Without it, PPO takes efficient paths that lose sight of the gate during turns.
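The term can be computed directly from the estimated gate position in the camera frame, where the boresight is the +z axis. A hedged sketch; the helper name and arguments are illustrative, not the train_apex.py API:

```python
import numpy as np

def perception_reward(boresight_cam, gate_pos_cam, weight=0.3):
    """Perception-aware bonus: weighted cosine of the angle between the
    camera boresight and the direction to the next gate, both expressed
    in the camera frame. Illustrative sketch only.
    """
    b = np.asarray(boresight_cam, dtype=np.float64)
    d = np.asarray(gate_pos_cam, dtype=np.float64)
    denom = np.linalg.norm(b) * np.linalg.norm(d) + 1e-9  # guard /0
    cos_angle = float(np.dot(b, d) / denom)
    return weight * cos_angle
```

The bonus is maximal (+0.3) when the gate sits dead center in the view and goes negative once the gate falls behind the camera plane, which is what penalizes sight-losing turns.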
| Component | Dims | Source |
|---|---|---|
| Detected gate bbox (xywh, normalized) | 4 | Phase 1 per frame |
| Detected gate keypoints (4 corners, xy) | 8 | Phase 2 per frame |
| Detection confidence | 1 | Detector output |
| Frames since last detection (decaying) | 1 | Timestamp counter |
| Attitude quaternion | 4 | Telemetry (from sim) |
| Body angular rates | 3 | Telemetry (IMU gyro) |
| Body linear acceleration | 3 | Telemetry (IMU accel) |
| Last action | 4 | Action buffer (T/R/P/Y) |
| Total | 28 | — |
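The table's layout can be sketched as a straight concatenation in row order. The function and argument names below are illustrative, not the actual train_apex.py API:

```python
import numpy as np

def build_observation(bbox_xywh, keypoints_xy, conf, frames_since_det,
                      quat_wxyz, gyro, accel, last_action):
    """Assemble the 28D observation in the order of the table above.
    Hypothetical assembly function; names are illustrative."""
    obs = np.concatenate([
        np.asarray(bbox_xywh, np.float32).reshape(4),     # bbox, normalized
        np.asarray(keypoints_xy, np.float32).reshape(8),  # 4 corners, xy
        np.asarray([conf], np.float32),                   # detector confidence
        np.asarray([frames_since_det], np.float32),       # decaying counter
        np.asarray(quat_wxyz, np.float32).reshape(4),     # attitude
        np.asarray(gyro, np.float32).reshape(3),          # angular rates
        np.asarray(accel, np.float32).reshape(3),         # linear accel
        np.asarray(last_action, np.float32).reshape(4),   # T/R/P/Y
    ])
    assert obs.shape == (28,)
    return obs
```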
All inputs are available from the AIGP sim. No GPS, no NED position, no depth — matches the 2026-04-19 spec exactly. The IGPP EKF + PnP still run as a short-horizon predictor for fallback; their output is not fed into the PPO observation to avoid leaking privileged state.
| Action | Range | Maps to |
|---|---|---|
| Throttle | [0, 1] | Sim Throttle |
| Roll | [-1, 1] | Sim Roll |
| Pitch | [-1, 1] | Sim Pitch |
| Yaw | [-1, 1] | Sim Yaw |
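Since the actor ends in Tanh, all four raw outputs land in [-1, 1]; throttle then needs a rescale to [0, 1]. A sketch of one plausible mapping (assumed, not confirmed by the source):

```python
import numpy as np

def map_actions(tanh_out):
    """Map the 4D tanh policy output onto the sim ranges in the table.
    Throttle is rescaled from [-1, 1] to [0, 1]; roll, pitch, and yaw
    pass through. Hypothetical mapping for illustration."""
    a = np.clip(np.asarray(tanh_out, np.float64), -1.0, 1.0)
    throttle = 0.5 * (a[0] + 1.0)  # [-1, 1] -> [0, 1]
    roll, pitch, yaw = a[1], a[2], a[3]
    return throttle, roll, pitch, yaw
```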
```
Actor:  Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 4) → Tanh
Critic: Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 1)
```

~280K params (actor + critic) · <0.1 ms per forward pass on CPU
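The stated layer sizes can be sanity-checked with a quick parameter count (weights plus biases), in pure Python:

```python
def mlp_params(layer_sizes):
    """Total weights + biases for a fully connected stack."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

actor_params = mlp_params([28, 256, 256, 256, 4])    # 140,036
critic_params = mlp_params([28, 256, 256, 256, 1])   # 139,265
```

The two networks together come to roughly 280K parameters, small enough that a single CPU forward pass stays well under the control-loop budget.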
| Parameter | Value | Note |
|---|---|---|
| Algorithm | PPO (clip 0.2) | Same as Swift + MonoRace |
| Steps | 10M | ~4 hr RTX 5080 |
| Learning rate | 3e-4 | Linear decay to 0 |
| Gamma | 0.99 | Standard discount |
| GAE lambda | 0.95 | GAE |
| n_steps | 2048 | Per update |
| Mini-batches | 32 | Per PPO epoch |
| PPO epochs | 10 | Per rollout |
| Environment | SimDrone → AIGP sim | Dev proxy then fine-tune on VQ1 sim frames via parallel instances |
| Parallel envs | 8+ (SubprocVecEnv) | AIGP update confirms parallel sim instances supported |
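The "linear decay to 0" learning rate is expressed in Stable-Baselines3 as a callable of `progress_remaining`, which runs from 1 down to 0 over training. A minimal sketch assuming that SB3 convention:

```python
def linear_schedule(initial_lr):
    """SB3-style schedule: receives progress_remaining in [1, 0] and
    decays the learning rate linearly to 0. Sketch only."""
    def lr(progress_remaining):
        return initial_lr * progress_remaining
    return lr

# Pass the callable instead of a float, e.g.
# PPO(..., learning_rate=linear_schedule(3e-4))
```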
```bash
python train_apex.py policy --steps 10000000                      # default (privileged obs)
python train_apex.py policy --observation-mode=detector_telemetry # VQ2-ready obs
python train_apex.py policy --observation-mode=privileged         # legacy dev only
```
```bash
python train_apex.py
# Phase 1: APEX Detector  (YOLO11n, 200 ep)       ~2 hr
# Phase 2: APEX Keypoints (YOLO11n-pose, 150 ep)  ~1.5 hr
# Phase 3: APEX Policy    (PPO, 10M steps)        ~4 hr
# ───────────────────────────────────────────────
# Total: ~7.5 hr
# Output:
#   models/apex_detector_best.{pt,onnx}
#   models/apex_keypoints_best.{pt,onnx}
#   output/apex_policy/apex_policy_best.zip
#   output/apex_policy/apex_policy.onnx
```
APEX phases are registered with training_server.py as apex-detector, apex-keypoints, and apex-policy. Monitor progress via the train_dash.html Training Command Center.