APEX Reference · Updated 2026-04-19

APEX championship training pipeline.

train_apex.py runs three phases, modeled on the winning autonomous drone-racing systems of 2019–2026: YOLO11n detector (Phase 1) → YOLO11n-pose keypoints (Phase 2) → perception-aware PPO (Phase 3). Phases 1 and 2 also power the VQ1 completion stack without Phase 3.

Script     train_apex.py (one script · three phases)
Target     <19 ms pipeline latency on Jetson Orin NX (VQ2 speed)
Obs modes  privileged · detector_telemetry (flag-gated)
Lineage    MonoRace · Swift · SkyDreamer (hybrid)
Observation-swap landmine. Current Phase-3 PPO observation uses sources (ODOMETRY, course mapper) that assumed NED absolute positioning. The 2026-04-19 AIGP update confirmed no GPS, no absolute positioning, no depth. Before any VQ2 submission, the observation must be swapped to detector output (bbox + 4 corners + confidence) + telemetry (attitude, body rates, accel) + last action. See the winning strategy for the new architecture. --privileged-obs flag keeps the legacy path for dev-only runs.
Phase 1 — Detector · YOLO11n
Phase 2 — Keypoints · 4 corners → PnP gate pose
Phase 3 — PPO policy

§ 01 Phase 1 — APEX Detector

YOLO11n (2.6M params) with recall-maximized augmentation. Primary perception for VQ1 and VQ2. RF-DETR-Base with a DINOv2 backbone is the P2 upgrade path once YOLO is the bottleneck.

YOLO11n configuration

SHIP
  • Input 640×640
  • 200 epochs, early-stop patience 50
  • Batch auto (fills VRAM)
  • Recall-maximized aug (mosaic, mixup, hsv_h=0.03)
  • AdamW, lr 1e-3, weight_decay 5e-4
  • Classes: 1 (gate)
python train_apex.py detector \
  --model yolo11n --epochs 200

RF-DETR alternative

P2
  • DINOv2 backbone — superior transfer
  • COCO-format data (dataset_gates_mega_coco/)
  • Only 2 epochs for convergence
  • Apache 2.0 (vs YOLO's AGPL-3.0)
  • 2.3ms latency on T4 TensorRT
python train_apex.py detector --model rfdetr

Model comparison (our dataset)

Model         mAP@50  mAP@50:95  Latency  Params  License
YOLO11n       92.1%   39.5       5ms+     2.6M    AGPL-3.0
YOLO26n       94.3%   41.5       3.5ms    2.8M    AGPL-3.0
RF-DETR-Nano  96.5%   48.4       2.3ms    3.1M    Apache 2.0
RF-DETR-Base  97.9%   53.0       5.2ms    29M     Apache 2.0

§ 02 Phase 2 — APEX Keypoints

YOLO11n-pose — 4 keypoints (gate corners) for PnP depth estimation. Bounding boxes give you a rectangle; to run PnP and get the gate's 3D pose, you need the actual corner positions. This is the difference between "I see a gate somewhere" and "I know exactly where the gate is in 3D."

Parameter     Value
Model         YOLO11n-pose (keypoint variant)
Keypoints     4 (TL, TR, BR, BL gate corners)
Dataset       dataset_gates_mega_pose/ (auto-generated)
Epochs        150 (patience 30)
Augmentation  Geometric only (no color jitter)
python train_apex.py keypoints --epochs 150

PnP integration

import cv2
import numpy as np

# 3D gate-corner coordinates in the gate frame (1.5 m square gate), meters.
GATE_CORNERS_3D = np.array([
    [-0.75, -0.75, 0],  # TL
    [ 0.75, -0.75, 0],  # TR
    [ 0.75,  0.75, 0],  # BR
    [-0.75,  0.75, 0],  # BL
], dtype=np.float32)

# K / dist_coeffs: camera intrinsics + distortion from the sim calibration.
# keypoints: pixel coords from the Phase 2 model, in TL/TR/BR/BL order.
corners_2d = keypoints[:4].astype(np.float32)  # (4, 2)
success, rvec, tvec = cv2.solvePnP(
    GATE_CORNERS_3D, corners_2d, K, dist_coeffs,
    flags=cv2.SOLVEPNP_IPPE_SQUARE,  # requires a planar square target
)
# tvec = (x, y, z) of the gate center in the camera frame, meters
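Once solvePnP succeeds, gate range and approach direction fall straight out of tvec. A minimal sketch (function name is ours, not from train_apex.py):

```python
import numpy as np

def gate_range_and_bearing(tvec):
    """Distance to the gate center and the unit approach vector,
    both in the camera frame (tvec as returned by cv2.solvePnP)."""
    tvec = np.asarray(tvec, dtype=np.float64).reshape(3)
    distance = float(np.linalg.norm(tvec))
    direction = tvec / distance  # unit vector, camera -> gate center
    return distance, direction

# Example: gate 5 m straight ahead along the optical axis
d, u = gate_range_and_bearing([0.0, 0.0, 5.0])
```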

§ 03 Phase 3 — APEX Policy (perception-aware PPO)

PPO with Swift's perception-aware reward — 28D observation (updated for the AIGP sim surface) → 4D action (Throttle, Roll, Pitch, Yaw).

Swift's key innovation. Standard RL reward: -distance_to_gate + gate_passage_bonus. Add a perception-aware term:
reward += 0.3 * cos(camera_boresight, direction_to_gate)
This single term teaches the drone to always keep the camera pointed at the next gate — critical for a fixed FPV camera. Without it, PPO takes efficient paths that lose sight of the gate during turns.
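A minimal numpy sketch of that shaping term (vector names are our assumptions; the boresight is the camera's forward axis expressed in the same frame as the gate position):

```python
import numpy as np

PERCEPTION_WEIGHT = 0.3  # Swift-style shaping coefficient

def perception_reward(boresight, gate_pos, drone_pos):
    """Reward bonus for keeping the camera pointed at the next gate.
    All vectors must be in the same (e.g. world) frame."""
    to_gate = np.asarray(gate_pos, float) - np.asarray(drone_pos, float)
    b = np.asarray(boresight, float)
    cos_angle = np.dot(b, to_gate) / (np.linalg.norm(b) * np.linalg.norm(to_gate))
    return PERCEPTION_WEIGHT * cos_angle

# Camera looking straight at the gate -> full +0.3 bonus
bonus = perception_reward([1, 0, 0], gate_pos=[10, 0, 0], drone_pos=[0, 0, 0])
```

A gate 90° off-boresight contributes zero, and a gate behind the camera is penalized, which is exactly the pressure that keeps the fixed FPV camera on target through turns.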

Observation space (detector + telemetry, 28D)

Component                                Dims  Source
Detected gate bbox (xywh, normalized)    4     Phase 1, per frame
Detected gate keypoints (4 corners, xy)  8     Phase 2, per frame
Detection confidence                     1     Detector output
Frames since last detection (decaying)   1     Timestamp counter
Attitude quaternion                      4     Telemetry (from sim)
Body angular rates                       3     Telemetry (IMU gyro)
Body linear acceleration                 3     Telemetry (IMU accel)
Last action                              4     Action buffer (T/R/P/Y)
Total                                    28
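Concatenating the rows above yields the 28D vector. A sketch with assumed field names, including a simple exponential decay for the staleness feature:

```python
import numpy as np

def build_observation(bbox, keypoints, conf, frames_since_det,
                      quat, gyro, accel, last_action, decay=0.9):
    """Assemble the 28D detector+telemetry observation.
    bbox: (4,) normalized xywh · keypoints: (4, 2) corner xy
    conf: scalar · quat: (4,) · gyro/accel: (3,) · last_action: (4,)"""
    staleness = decay ** frames_since_det  # 1.0 on a fresh detection, decays toward 0
    obs = np.concatenate([
        np.asarray(bbox, np.float32).ravel(),       # 4
        np.asarray(keypoints, np.float32).ravel(),  # 8
        [np.float32(conf)],                         # 1
        [np.float32(staleness)],                    # 1
        np.asarray(quat, np.float32),               # 4
        np.asarray(gyro, np.float32),               # 3
        np.asarray(accel, np.float32),              # 3
        np.asarray(last_action, np.float32),        # 4
    ])
    assert obs.shape == (28,)
    return obs
```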

All inputs are available from the AIGP sim. No GPS, no NED position, no depth — matches the 2026-04-19 spec exactly. The IGPP EKF + PnP still run as a short-horizon predictor for fallback; their output is not fed into the PPO observation to avoid leaking privileged state.

Action space (4D)

Action    Range    Maps to
Throttle  [0, 1]   Sim Throttle
Roll      [-1, 1]  Sim Roll
Pitch     [-1, 1]  Sim Pitch
Yaw       [-1, 1]  Sim Yaw
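The actor's tanh head emits [-1, 1] on all four channels, so only throttle needs rescaling. A sketch of the assumed mapping:

```python
import numpy as np

def map_action(raw):
    """Map a tanh-squashed policy output (4,) in [-1, 1] to sim commands."""
    raw = np.clip(np.asarray(raw, np.float32), -1.0, 1.0)
    throttle = (raw[0] + 1.0) / 2.0             # [-1, 1] -> [0, 1]
    roll, pitch, yaw = raw[1], raw[2], raw[3]   # already in [-1, 1]
    return float(throttle), float(roll), float(pitch), float(yaw)
```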

Network + training

Actor:  Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 4) → Tanh
Critic: Linear(28, 256) → ReLU → Linear(256, 256) → ReLU
        → Linear(256, 256) → ReLU → Linear(256, 1)

~200K params · <0.1ms CPU
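The actor is a plain MLP; a numpy forward-pass sketch with random weights (real code would load the trained checkpoint, and layer widths are taken from the architecture above):

```python
import numpy as np

rng = np.random.default_rng(0)
SIZES = [28, 256, 256, 256, 4]  # actor layer widths
weights = [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
           for i, o in zip(SIZES[:-1], SIZES[1:])]

def actor_forward(obs):
    """ReLU hidden layers, tanh head -> 4 actions in [-1, 1]."""
    x = np.asarray(obs, np.float64)
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU
    W, b = weights[-1]
    return np.tanh(x @ W + b)           # bounded action output

action = actor_forward(np.ones(28))
```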
Parameter      Value                Note
Algorithm      PPO (clip 0.2)       Same as Swift + MonoRace
Steps          10M                  ~4 hr RTX 5080
Learning rate  3e-4                 Linear decay to 0
Gamma          0.99                 Standard discount
GAE lambda     0.95                 GAE
n_steps        2048                 Per update
Mini-batches   32                   Per PPO epoch
PPO epochs     10                   Per rollout
Environment    SimDrone → AIGP sim  Dev proxy, then fine-tune on VQ1 sim frames via parallel instances
Parallel envs  8+ (SubprocVecEnv)   AIGP update confirms parallel sim instances supported
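The table maps onto Stable-Baselines3-style PPO constructor arguments roughly as follows. This is a hedged sketch: the exact kwargs inside train_apex.py aren't shown here, and SB3's `batch_size` is the mini-batch size, so 32 mini-batches over a 2048×8 rollout implies 512:

```python
# Assumed mapping of the hyperparameters above onto SB3-style PPO kwargs.
ppo_kwargs = dict(
    n_steps=2048,               # rollout length per env, per update
    batch_size=2048 * 8 // 32,  # 8 envs / 32 mini-batches per epoch -> 512
    n_epochs=10,                # PPO epochs per rollout
    learning_rate=3e-4,         # linearly decayed to 0 in practice
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
)
```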
python train_apex.py policy --steps 10000000                               # default (privileged obs)
python train_apex.py policy --observation-mode=detector_telemetry          # VQ2-ready obs
python train_apex.py policy --observation-mode=privileged                  # legacy dev only

§ 04 Overnight training — one command

python train_apex.py

# Phase 1: APEX Detector (YOLO11n, 200ep)          ~2 hr
# Phase 2: APEX Keypoints (YOLO11n-pose, 150ep)    ~1.5 hr
# Phase 3: APEX Policy (PPO, 10M steps)            ~4 hr
# ───────────────────────────────────────────────
# Total:                                            ~7.5 hr

# Output:
#   models/apex_detector_best.{pt,onnx}
#   models/apex_keypoints_best.{pt,onnx}
#   output/apex_policy/apex_policy_best.zip
#   output/apex_policy/apex_policy.onnx

APEX phases are registered with training_server.py as apex-detector, apex-keypoints, apex-policy. Monitor progress via train_dash.html Training Command Center.

§ 05 Research foundations

MonoRace

A2RL 2025
  • U-Net GateNet + PnP + EKF
  • PPO G&CNet · 24D obs → 4 motor cmds at 500Hz
  • Peak speed 28.23 m/s
  • Beat 3 human world champions

Swift

NATURE 2023
  • PPO with perception-aware reward
  • Asymmetric actor-critic (privileged critic)
  • Sim-to-real from 50s real flight data
  • First AI to beat human champions

SkyDreamer

ICLR 2025
  • DreamerV3 end-to-end pixels→motors
  • VQ-VAE latent world model
  • No explicit perception pipeline
  • Backup approach if modular fails
APEX-PIPELINE · v2.0 · 2026-04-19