System Architecture · Updated 2026-04-19

Vision-first autonomous drone-racing pipeline.

FPV visual stream + telemetry → YOLO11n detector → 4-corner keypoints → PnP → controller (PID for VQ1, perception-aware PPO for VQ2) → Throttle / Roll / Pitch / Yaw. No GPS, no absolute positioning, no depth — per the confirmed 2026-04-19 AIGP spec. Every component is a pure function of its inputs; the only mutable state is the detection history and the PPO policy network.

Input: FPV stream + telemetry (from sim)
Output: Throttle · Roll · Pitch · Yaw (to sim)
Controllers: PID (VQ1) · APEX PPO (VQ2), shared perception frontend
Latency: <50 ms end-to-end target (MonoRace 22 ms · Swift 40 ms)
2026-04-19 AIGP update + 2026-05-08 VADR-TS-002 reshaped this architecture. MAVLink message names and transport are now concrete: HEARTBEAT, ATTITUDE, HIGHRES_IMU, SET_POSITION_TARGET_LOCAL_NED, SET_ATTITUDE_TARGET, TIMESYNC over UDP (MAVSDK-compatible). Vision stream is UDP:5600 JPEG-chunked, 30 Hz, 640×360. ODOMETRY was removed from the supported set — position is dead-reckoned. See VADR-TS-002 deltas for the full message list + protocol details, or winning-strategy.html for the VQ1 vs VQ2 split.
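Since the vision stream arrives as JPEG frames chunked across UDP datagrams, the receiver must reassemble chunks before decoding. A minimal sketch of that reassembly, assuming a simple `(frame_id, chunk_idx, n_chunks)` header per datagram — the actual framing layout is an assumption and must be checked against the VADR-TS-002 protocol details:

```python
import struct
from collections import defaultdict

# Assumed datagram header: frame_id (u32), chunk_idx (u16), n_chunks (u16).
HDR = struct.Struct("!IHH")

class JpegChunkAssembler:
    """Reassembles JPEG frames split across UDP datagrams (e.g. UDP:5600)."""

    def __init__(self):
        self._parts = defaultdict(dict)  # frame_id -> {chunk_idx: payload}

    def feed(self, datagram: bytes):
        """Returns complete JPEG bytes when the last chunk arrives, else None."""
        frame_id, idx, total = HDR.unpack_from(datagram)
        parts = self._parts[frame_id]
        parts[idx] = datagram[HDR.size:]
        if len(parts) == total:
            del self._parts[frame_id]
            return b"".join(parts[i] for i in range(total))
        return None
```

Chunks may arrive out of order; the assembler only emits a frame once every chunk for that `frame_id` is present.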

§ 01 Overview

A single FPV camera + telemetry are the only inputs. No GPS, no LiDAR, no depth. All spatial awareness is derived from vision — gate detection in the camera frame → PnP for gate-relative 3D pose → controller. Telemetry (attitude quaternion, body rates, accel) is used for IMU-primary short-horizon state estimation and as a first-class PPO observation component.

Two controller options share the same perception frontend:

VQ1 pipeline · DETERMINISTIC

YOLO11n → YOLO11n-pose → PnP → target-gate tracker → PID (heading, altitude) + feed-forward throttle.

vq1_completion_pilot.py · no learning

VQ2 pipeline · PERCEPTION-AWARE PPO

Same detection frontend + perception-aware PPO [256,256,256] over (detector output + telemetry).

train_apex.py policy · 28D obs

The pipeline runs as a single async Python process. Every frame triggers the full chain — detect, estimate, decide, command. Target end-to-end latency: under 50 ms per frame (MonoRace achieved 22 ms; Swift 40 ms).
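The single-process async loop described above can be sketched as follows. The stage callables (`get_frame`, `detect`, `estimate`, `decide`, `send_command`) are placeholders standing in for the real pipeline components, not actual function names from the repository:

```python
import asyncio
import time

async def race_loop(get_frame, detect, estimate, decide, send_command,
                    budget_ms: float = 50.0):
    """Every frame triggers the full chain: detect, estimate, decide, command."""
    while True:
        frame, telemetry = await get_frame()
        if frame is None:                      # stream closed / race over
            break
        t0 = time.perf_counter()
        detections = detect(frame)             # YOLO11n + pose keypoints
        gate_pose = estimate(detections)       # PnP gate-relative pose
        trpy = decide(gate_pose, telemetry)    # PID (VQ1) or PPO (VQ2)
        await send_command(trpy)               # throttle/roll/pitch/yaw out
        elapsed_ms = (time.perf_counter() - t0) * 1e3
        if elapsed_ms > budget_ms:             # log frames over the 50 ms target
            print(f"over budget: {elapsed_ms:.1f} ms")
```

Keeping the chain synchronous within each frame (only I/O is awaited) avoids pipeline stages racing each other across frames.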

§ 02 End-to-end pipeline

FPV + telemetry (sim input) → YOLO11n (detector) → keypoints (4 corners) → PnP (gate pose) → target tracker (closest in-front) → PID / PPO (VQ1 · VQ2) → T/R/P/Y (sim output)

Timing budget (per frame)

Stage | Latency (target)
Detector (YOLO11n) | ~5 ms
Keypoints (YOLO11n-pose) | ~3 ms
PnP (SOLVEPNP_IPPE_SQUARE) | ~0.5 ms
Target-gate tracker | ~0.1 ms
Controller (PID or PPO inference) | ~0.2 ms
Transport (sim I/O) | ~1 ms
Total (target) | <10 ms
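A per-stage timer makes it easy to verify this budget in the live loop. A minimal sketch (the class name and API are illustrative, not from the repository):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Accumulates per-stage wall-clock time so each frame's latency
    can be checked against the per-stage budget above."""

    def __init__(self):
        self.totals_ms = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.totals_ms[name] += (time.perf_counter() - t0) * 1e3
```

Usage: wrap each pipeline stage in `with timer.stage("detector"): ...` and dump `timer.totals_ms` into the per-frame JSONL log.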

§ 03 Component file map

File | Role | Description
vq1_completion_pilot.py | CORE | Zero-learning VQ1 pilot: YOLO + 4-corner PnP + PID. Stubbed sim adapter.
race_pipeline.py | CORE | Main orchestrator. Async race loop, target-gate tracker, lap counter.
vision_pipeline.py | VISION | VisionPipeline class. Dispatches detector backend. Runs PnP on detected corners.
gate_segmentation.py | VISION | U-Net + RANSAC alternative detector. Sub-pixel corner extraction.
train_apex.py | TRAINING | 3-phase APEX: detector → keypoints → PPO. --observation-mode flag for VQ2 obs swap.
rl_controller.py | CONTROL | Gym env + PPO NeuralController. VQ2 runtime.
imu_gate_predictor.py | VISION | IGPP EKF: short-horizon gate-pose prediction when detection drops.
synthetic_aperture_depth.py | VISION | SAMD multi-frame depth refinement.
race_config.py | CONFIG | RaceConfig dataclass. Gains, thresholds, timeouts, intrinsics. YAML-serializable.
camera_adapter.py | INPUT | Camera abstraction. Sources: sim pipe, video file, synthetic.
race_logger.py | INFRA | JSONL per-frame telemetry logging.
sim_drone.py | DEV | 6DOF physics proxy for local iteration. Not the real AIGP sim.
dashboard_server.py | INFRA | WebSocket telemetry → browser dashboard.
mpc_tracker.py, course_mapper.py | RETIRED | NED-dependent. Kept for reference.

§ 04 Detector modes

Three interchangeable detector backends plus an HSV fallback, selected in race_config.py. Trade-offs between speed, robustness, and corner accuracy.

Mode | Method | Latency | Trade-off
YOLO11n + pose | YOLO11n detector + YOLO11n-pose 4 corners | ~5 ms GPU | Ships for VQ1 + VQ2. Already trained via APEX.
RF-DETR-Nano | DINOv2 backbone · deformable decoder | ~2.3 ms TRT | VQ2 upgrade if YOLO recall is the bottleneck.
U-Net + RANSAC | Pixel segmentation · RANSAC edge fitting | ~5 ms GPU | Sub-pixel-corner alternative. Best PnP accuracy.
Color (HSV) | Threshold + contour | <1 ms CPU | Fallback for highlighted gates. Brittle under lighting.
Default: YOLO11n + YOLO11n-pose ships for both VQ1 and VQ2. The 4-corner output is what feeds PnP — bounding boxes alone don't give sub-pixel accuracy.
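Interchangeable backends selected from config are naturally expressed as a registry. A minimal sketch of that dispatch pattern — the registry helpers, backend names, and stub return values are illustrative, not the repository's actual API:

```python
from typing import Callable, Dict, List

DetectorFn = Callable[[bytes], List[dict]]
_BACKENDS: Dict[str, DetectorFn] = {}

def register_backend(name: str):
    """Decorator registering a detector backend under a config-visible name."""
    def wrap(fn: DetectorFn):
        _BACKENDS[name] = fn
        return fn
    return wrap

def make_detector(mode: str) -> DetectorFn:
    """Looks up the configured backend; falls back to cheap HSV if unknown."""
    return _BACKENDS.get(mode, _BACKENDS["hsv"])

@register_backend("hsv")
def hsv_detector(frame: bytes) -> List[dict]:
    return []  # stub: threshold + contour fallback would live here

@register_backend("yolo11n_pose")
def yolo_detector(frame: bytes) -> List[dict]:
    # stub: real model inference lives in vision_pipeline.py
    return [{"corners": [(0.0, 0.0)] * 4, "conf": 0.9}]
```

Swapping detectors then needs only a one-line config change, with no edits to the race loop.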

§ 05 Platform configurations

SimDrone (dev proxy) · LOCAL

6DOF Python physics. Local UDP. Host GPU. For APEX development before the real sim lands.

sim_drone.py · any OS

AIGP Sim (official) · COMPETITION

Downloadable package, Windows only. Internet required (anti-cheat). Parallel instances supported. Released shortly before VQ1.

DCL-built sim · May 2026

DIY Practice Drone · HARDWARE

Serial MAVLink. RPi Camera 3 Wide. Jetson Orin Nano 8GB. For real-world testing before the physical qualifier.

Jetson Orin

Neros Archer · COMP HW

Provided at the competition venue; cannot be purchased or modified. Camera + MAVLink endpoints specified before the physical qualifier. Plan for runtime YAML config.

venue-only

§ 06 Communication surface

Telemetry IN · from sim
  • NOT Position NED — no absolute positioning
  • NOT Velocity NED — no absolute positioning
  • Attitude: quaternion or Euler (format TBD)
  • Body angular rates (rad/s, 3-axis)
  • Body linear acceleration (IMU)
  • Plus the FPV visual stream

Commands OUT · to sim
  • Throttle — collective thrust [0,1]
  • Roll — body roll axis [-1,1]
  • Pitch — body pitch axis [-1,1]
  • Yaw — body yaw axis [-1,1]
  • Exact transport (MAVLink msg, rate, units) ships with sim package
Parameter | Value / Status
Protocol | MAVLink v2 over UDP, MAVSDK-compatible (confirmed in VADR-TS-002)
Transport | Local UDP to sim · internet required for anti-cheat handshake
Command rate | TBD (matches sim's input handler; aim for per-frame)
Telemetry rate | TBD (likely streamed continuously alongside frames)
Parallel instances | Supported; wire SubprocVecEnv for PPO training
# Expected control flow (confirmed against sim package)
1. Launch local sim instance (Windows only)
2. Anti-cheat handshake over internet
3. Stream (frame, telemetry) -> our Python AI
4. AI emits (throttle, roll, pitch, yaw) per frame
5. Run until gates cleared (VQ1) or 8-minute timeout

§ 07 Key design decisions

Vision-only depth

No GPS, LiDAR, or depth sensor. All distance estimation comes from PnP with 4 known gate corners in world coordinates. Gate dimensions (1.5 m × 1.5 m) are the only world-scale reference. Corner accuracy is the single most important factor.

YOLO11n-pose for corners

YOLO11n-pose outputs 4 gate corners per detection in one pass. Better than bounding-box-only YOLO because PnP needs actual corners, not axis-aligned rectangles. Avoids a second custom keypoint model.
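Since PnP with a square target needs the four corners in a fixed order, detector output should be canonicalized before solving. A minimal ordering helper under the assumption of image coordinates (y down) and a TL, TR, BR, BL convention that must be kept consistent with the object-point order fed to PnP:

```python
import numpy as np

def order_corners(pts: np.ndarray) -> np.ndarray:
    """Orders 4 detected corners as TL, TR, BR, BL in image coordinates,
    so they always match the object-point order handed to PnP."""
    pts = np.asarray(pts, dtype=np.float32)
    s = pts.sum(axis=1)                  # x + y: min at TL, max at BR
    d = np.diff(pts, axis=1).ravel()     # y - x: min at TR, max at BL
    return np.array([pts[s.argmin()], pts[d.argmin()],
                     pts[s.argmax()], pts[d.argmax()]])
```

This sum/difference heuristic is robust for roughly upright gates; heavily rolled views would need an angle-based sort around the centroid instead.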

No course-map / no recovery-to-remembered-position

No absolute positioning means we can't fly to a remembered gate position after losing sight. Instead, the pilot enters a slow forward + yaw sweep to re-acquire visually. Faster and more reliable than dead-reckoning without GPS.
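The re-acquisition behavior can be sketched as a tiny state function. All numeric values here (loss threshold, creep pitch, sweep rate) are illustrative placeholders, not tuned gains from the repository:

```python
def recovery_command(frames_lost: int, lost_threshold: int = 15,
                     seek_yaw: float = 0.5, creep_pitch: float = -0.1):
    """Slow forward + yaw sweep once the gate has been lost long enough.
    Returns None while short-horizon prediction (IGPP EKF) should still
    carry the pilot, else a normalized (throttle, roll, pitch, yaw)."""
    if frames_lost < lost_threshold:
        return None
    # Creep forward (slight nose-down pitch) while sweeping yaw to re-acquire.
    return (0.4, 0.0, creep_pitch, seek_yaw)
```

The threshold keeps the sweep from triggering on single-frame detection dropouts, which the EKF prediction already covers.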

Config-driven

Every gain, threshold, and timeout lives in race_config.py as a YAML-serializable dataclass. No magic numbers in pipeline code. Config snapshot saved in every race log for reproducibility.
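A minimal sketch of that config dataclass, seeded with the values from the tuning table below; the field names are illustrative and the YAML round-trip (via PyYAML) is shown only in a comment to keep the sketch dependency-free:

```python
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class RaceConfig:
    """Every gain, threshold, and timeout in one serializable place.
    Field names are illustrative; values match the tuning table."""
    kp_yaw: float = 50.0
    cruise_pitch_deg: float = -25.0
    max_tilt_deg: float = 70.0
    seek_yaw_rate_dps: float = 180.0
    transit_dist_m: float = 1.5
    # fx, fy, cx, cy placeholder intrinsics for the 640x360 stream
    camera_K: List[float] = field(default_factory=lambda: [600.0, 600.0, 320.0, 180.0])

# YAML round-trip is then one line each way:
#   yaml.safe_dump(asdict(cfg))   /   RaceConfig(**yaml.safe_load(text))
```

Snapshotting `asdict(cfg)` into each race log makes every run reproducible from its own log file.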

Aggressive but capped tuning

Parameter | Value | Context
kp_yaw | 50 | Yaw proportional gain
cruise_pitch | −25° | Nose-down cruise
max_tilt | 70° | MonoRace uses 65°+
seek_yaw_rate | 180°/s | Full rotation in 2 s
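"Aggressive but capped" means a high proportional gain whose output is hard-clamped into the sim's command range. A one-function sketch of the yaw channel (the mapping of radians of error to a normalized [-1, 1] command is an assumption):

```python
def yaw_command(yaw_error_rad: float, kp_yaw: float = 50.0,
                limit: float = 1.0) -> float:
    """High-gain proportional yaw with a hard output cap: the gain gives
    snap response, the clamp keeps commands inside [-1, 1] for the sim."""
    return max(-limit, min(limit, kp_yaw * yaw_error_rad))
```

With kp_yaw = 50, any error beyond 0.02 rad (~1.1°) already saturates the command, so the gain effectively acts as a bang-bang controller outside a narrow linear band.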

Predictive transit detection

Gate pass-through is not a simple distance threshold, which can false-trigger on approach. Instead, the tracker computes the distance derivative over 3+ consecutive frames and declares TRANSIT only when distance is below threshold, decreasing, and the derivative confirms closing velocity. This eliminates false transitions from PnP noise at close range.
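That logic fits in a few lines. A minimal sketch, with the class name, threshold, and window size as illustrative placeholders:

```python
from collections import deque

class TransitDetector:
    """Declares TRANSIT only when gate distance is below threshold AND has
    been strictly decreasing over the last `window` frames, so a single
    noisy PnP estimate at close range cannot fire a false transition."""

    def __init__(self, threshold_m: float = 1.5, window: int = 3):
        self.threshold = threshold_m
        self.hist = deque(maxlen=window)

    def update(self, dist_m: float) -> bool:
        self.hist.append(dist_m)
        if len(self.hist) < self.hist.maxlen:
            return False                       # not enough frames yet
        pairs = zip(self.hist, list(self.hist)[1:])
        closing = all(b < a for a, b in pairs)  # strictly decreasing
        return closing and self.hist[-1] < self.threshold
```

A close approach that stalls (distance stops decreasing) correctly fails the closing check even when inside the threshold.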

Architecture summary. FPV frame + telemetry → YOLO11n detector → YOLO11n-pose (4 corners) → PnP for gate-relative pose → controller (PID for VQ1, perception-aware PPO for VQ2) → Throttle/Roll/Pitch/Yaw. No absolute positioning anywhere. Every component is a pure function of its inputs. The only mutable state is the detection history and (for VQ2) the PPO policy network.
ARCHITECTURE · v2.0 · 2026-04-19