train_apex.py — a three-phase pipeline modeled on the autonomous racing winners of 2019–2026: YOLO11n detector (Phase 1) → YOLO11n-pose keypoints (Phase 2) → perception-aware PPO (Phase 3). Phases 1 and 2 also power the VQ1 completion stack without Phase 3.
A train_apex.py --privileged-obs flag keeps the legacy observation path for dev-only runs.
YOLO11n (2.6M params) with recall-maximized augmentation is the primary perception model for VQ1 and VQ2. RF-DETR-Base with a DINOv2 backbone is the P2 upgrade path once YOLO becomes the bottleneck.
```bash
python train_apex.py detector \
    --model yolo11n --epochs 200
python train_apex.py detector --model rfdetr
```
| Model | mAP@50 (%) | mAP@50:95 | Latency (ms) | Params | License |
|---|---|---|---|---|---|
| YOLO11n | 92.1 | 39.5 | 5+ | 2.6M | AGPL-3.0 |
| YOLO26n | 94.3 | 41.5 | 3.5 | 2.8M | AGPL-3.0 |
| RF-DETR-Nano | 96.5 | 48.4 | 2.3 | 3.1M | Apache 2.0 |
| RF-DETR-Base | 97.9 | 53.0 | 5.2 | 29M | Apache 2.0 |
YOLO11n-pose — 4 keypoints (gate corners) for PnP depth estimation. Bounding boxes give you a rectangle; to run PnP and get the gate's 3D pose, you need the actual corner positions. This is the difference between "I see a gate somewhere" and "I know exactly where the gate is in 3D."
| Parameter | Value |
|---|---|
| Model | YOLO11n-pose (keypoint variant) |
| Keypoints | 4 (TL, TR, BR, BL gate corners) |
| Dataset | dataset_gates_mega_pose/ (auto-generated) |
| Epochs | 150 (patience 30) |
| Augmentation | Geometric only (no color jitter) |
```bash
python train_apex.py keypoints --epochs 150
```
```python
import cv2
import numpy as np

# 3D gate corners in the gate frame: a 1.5 m square, origin at center.
GATE_CORNERS_3D = np.array([
    [-0.75, -0.75, 0],  # TL
    [ 0.75, -0.75, 0],  # TR
    [ 0.75,  0.75, 0],  # BR
    [-0.75,  0.75, 0],  # BL
], dtype=np.float32)

# keypoints: (4, 2) pixel coords from the Phase 2 pose model.
# K, dist_coeffs: camera intrinsics and distortion from calibration.
corners_2d = keypoints[:4].astype(np.float32)
success, rvec, tvec = cv2.solvePnP(
    GATE_CORNERS_3D, corners_2d, K, dist_coeffs,
    flags=cv2.SOLVEPNP_IPPE_SQUARE,
)
# tvec = (x, y, z) gate center in the camera frame, meters
```
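For logging or a fallback controller, the translation vector reduces to a range and bearing. A minimal sketch, assuming OpenCV's camera convention (x right, y down, z forward); the `gate_range_bearing` helper is hypothetical, not part of train_apex.py:

```python
import numpy as np

def gate_range_bearing(tvec):
    """Distance and bearing angles to the gate center from a PnP tvec.

    tvec: (3,) or (3, 1) translation in the camera frame, meters.
    Returns (range_m, azimuth_rad, elevation_rad).
    Hypothetical helper for illustration only.
    """
    x, y, z = np.asarray(tvec, dtype=np.float64).reshape(3)
    range_m = float(np.sqrt(x * x + y * y + z * z))
    azimuth = float(np.arctan2(x, z))     # positive: gate is to the right
    elevation = float(np.arctan2(-y, z))  # positive: gate is above center
    return range_m, azimuth, elevation
```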
PPO with Swift's perception-aware reward — 28D observation (updated for the AIGP sim surface) → 4D action (throttle, roll, pitch, yaw).
The base reward is `-distance_to_gate + gate_passage_bonus`. Add a perception-aware term:

```
reward += 0.3 * cos(angle(camera_boresight, direction_to_gate))
```
This single term teaches the drone to always keep the camera pointed at the next gate — critical for a fixed FPV camera. Without it, PPO takes efficient paths that lose sight of the gate during turns.
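The term can be computed directly from the estimated gate position in the camera frame, where the boresight is the +z axis. A hedged sketch; the helper name and arguments are illustrative, not the train_apex.py API:

```python
import numpy as np

def perception_reward(boresight_cam, gate_pos_cam, weight=0.3):
    """Perception-aware bonus: weighted cosine of the angle between the
    camera boresight and the direction to the next gate, both expressed
    in the camera frame. Illustrative sketch only.
    """
    b = np.asarray(boresight_cam, dtype=np.float64)
    d = np.asarray(gate_pos_cam, dtype=np.float64)
    denom = np.linalg.norm(b) * np.linalg.norm(d) + 1e-9  # guard /0
    cos_angle = float(np.dot(b, d) / denom)
    return weight * cos_angle
```

The bonus is maximal (+0.3) when the gate sits dead center in the view and goes negative once the gate falls behind the camera plane, which is what penalizes sight-losing turns.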
| Component | Dims | Source |
|---|---|---|
| Detected gate bbox (xywh, normalized) | 4 | Phase 1 per frame |
| Detected gate keypoints (4 corners, xy) | 8 | Phase 2 per frame |
| Detection confidence | 1 | Detector output |
| Frames since last detection (decaying) | 1 | Timestamp counter |
| Attitude quaternion | 4 | Telemetry (from sim) |
| Body angular rates | 3 | Telemetry (IMU gyro) |
| Body linear acceleration | 3 | Telemetry (IMU accel) |
| Last action | 4 | Action buffer (T/R/P/Y) |
| Total | 28 | — |
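The table's layout can be sketched as a straight concatenation in row order. The function and argument names below are illustrative, not the actual train_apex.py API:

```python
import numpy as np

def build_observation(bbox_xywh, keypoints_xy, conf, frames_since_det,
                      quat_wxyz, gyro, accel, last_action):
    """Assemble the 28D observation in the order of the table above.
    Hypothetical assembly function; names are illustrative."""
    obs = np.concatenate([
        np.asarray(bbox_xywh, np.float32).reshape(4),     # bbox, normalized
        np.asarray(keypoints_xy, np.float32).reshape(8),  # 4 corners, xy
        np.asarray([conf], np.float32),                   # detector confidence
        np.asarray([frames_since_det], np.float32),       # decaying counter
        np.asarray(quat_wxyz, np.float32).reshape(4),     # attitude
        np.asarray(gyro, np.float32).reshape(3),          # angular rates
        np.asarray(accel, np.float32).reshape(3),         # linear accel
        np.asarray(last_action, np.float32).reshape(4),   # T/R/P/Y
    ])
    assert obs.shape == (28,)
    return obs
```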
All inputs are available from the AIGP sim. No GPS, no NED position, no depth — matches the 2026-04-19 spec exactly. The IGPP EKF + PnP still run as a short-horizon predictor for fallback; their output is not fed into the PPO observation to avoid leaking privileged state.
| Action | Range | Maps to |
|---|---|---|
| Throttle | [0, 1] | Sim Throttle |
| Roll | [-1, 1] | Sim Roll |
| Pitch | [-1, 1] | Sim Pitch |
| Yaw | [-1, 1] | Sim Yaw |
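Since the actor ends in Tanh, all four raw outputs land in [-1, 1]; throttle then needs a rescale to [0, 1]. A sketch of one plausible mapping (assumed, not confirmed by the source):

```python
import numpy as np

def map_actions(tanh_out):
    """Map the 4D tanh policy output onto the sim ranges in the table.
    Throttle is rescaled from [-1, 1] to [0, 1]; roll, pitch, and yaw
    pass through. Hypothetical mapping for illustration."""
    a = np.clip(np.asarray(tanh_out, np.float64), -1.0, 1.0)
    throttle = 0.5 * (a[0] + 1.0)  # [-1, 1] -> [0, 1]
    roll, pitch, yaw = a[1], a[2], a[3]
    return throttle, roll, pitch, yaw
```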
```
Actor:  Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 4) → Tanh
Critic: Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 1)
```

~280K params (actor + critic) · <0.1 ms per forward pass on CPU
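The stated layer sizes can be sanity-checked with a quick parameter count (weights plus biases), in pure Python:

```python
def mlp_params(layer_sizes):
    """Total weights + biases for a fully connected stack."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

actor_params = mlp_params([28, 256, 256, 256, 4])    # 140,036
critic_params = mlp_params([28, 256, 256, 256, 1])   # 139,265
```

The two networks together come to roughly 280K parameters, small enough that a single CPU forward pass stays well under the control-loop budget.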
| Parameter | Value | Note |
|---|---|---|
| Algorithm | PPO (clip 0.2) | Same as Swift + MonoRace |
| Steps | 10M | ~4 hr RTX 5080 |
| Learning rate | 3e-4 | Linear decay to 0 |
| Gamma | 0.99 | Standard discount |
| GAE lambda | 0.95 | GAE |
| n_steps | 2048 | Per update |
| Mini-batches | 32 | Per PPO epoch |
| PPO epochs | 10 | Per rollout |
| Environment | SimDrone → AIGP sim | Dev proxy then fine-tune on VQ1 sim frames via parallel instances |
| Parallel envs | 8+ (SubprocVecEnv) | AIGP update confirms parallel sim instances supported |
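The "linear decay to 0" learning rate is expressed in Stable-Baselines3 as a callable of `progress_remaining`, which runs from 1 down to 0 over training. A minimal sketch assuming that SB3 convention:

```python
def linear_schedule(initial_lr):
    """SB3-style schedule: receives progress_remaining in [1, 0] and
    decays the learning rate linearly to 0. Sketch only."""
    def lr(progress_remaining):
        return initial_lr * progress_remaining
    return lr

# Pass the callable instead of a float, e.g.
# PPO(..., learning_rate=linear_schedule(3e-4))
```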
```bash
python train_apex.py policy --steps 10000000                      # default (privileged obs)
python train_apex.py policy --observation-mode=detector_telemetry # VQ2-ready obs
python train_apex.py policy --observation-mode=privileged         # legacy dev only
```
```bash
python train_apex.py
# Phase 1: APEX Detector  (YOLO11n, 200 ep)       ~2 hr
# Phase 2: APEX Keypoints (YOLO11n-pose, 150 ep)  ~1.5 hr
# Phase 3: APEX Policy    (PPO, 10M steps)        ~4 hr
# ───────────────────────────────────────────────
# Total: ~7.5 hr
# Output:
#   models/apex_detector_best.{pt,onnx}
#   models/apex_keypoints_best.{pt,onnx}
#   output/apex_policy/apex_policy_best.zip
#   output/apex_policy/apex_policy.onnx
```
APEX phases are registered with training_server.py as apex-detector, apex-keypoints, and apex-policy. Monitor progress via the train_dash.html Training Command Center.