AI Grand Prix — System Architecture

End-to-end autonomous drone racing pipeline • Vision-only navigation

1. Overview

Core constraint: A single FPV camera is the only sensor. No GPS. No depth sensor. No LiDAR. All spatial awareness is derived from vision — gate detection, distance estimation, and navigation are computed purely from pixel data at 120 Hz.

This system pilots a racing drone autonomously through a sequence of gates at high speed. The architecture is based on MonoRace (2025 A2RL Autonomous Racing League champion): a U-Net segmentation network identifies gate pixels, RANSAC fits edge lines to extract sub-pixel corners, Perspective-n-Point (PnP) solves for 3D gate pose relative to the drone, and a state machine sequences approach, transit, and re-acquisition phases. A controller converts desired trajectories into attitude commands sent over MAVLink to the PX4 autopilot.

The pipeline runs as a single async Python process. Every frame triggers the full chain — detect, estimate, decide, command — with no queuing or frame drops. Latency from camera capture to motor command is under 8 ms on target hardware.

2. End-to-End Pipeline

Camera Frame: 640x480 BGR @ 120 Hz
→ Vision Pipeline: U-Net / YOLO / color segmentation + detection (vision_pipeline.py)
→ Gate Detection: corners_2d (4 pts) + confidence score (gate_segmentation.py)
→ PnP Estimator: distance, position, rotation (gate pose) via cv2.solvePnP
→ Gate Tracker: EMA smoothing + derivative
→ State Machine: SEEK / APPROACH / TRANSIT
→ Controller: attitude / velocity commands (trajectory_optimizer.py, drone_mpc_foundation.py)
→ MAVSDK Bridge: MAVLink v2 over UDP
→ PX4 Autopilot: flight controller firmware
→ Motors: 4x ESC + BLDC

Telemetry feedback (120 Hz) flows from PX4 back into the pipeline; the web dashboard (3D + FPV + telemetry) receives state over WebSocket.

Timing budget (per frame): Vision + PnP ~5 ms · State + Controller ~0.5 ms · MAVLink TX ~0.2 ms

The pipeline is fully synchronous within each frame. No thread pools, no message queues. The async event loop in race_pipeline.py awaits each stage sequentially, ensuring deterministic ordering. The only concurrent path is the telemetry listener, which runs in a background coroutine and updates shared state atomically.
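The per-frame sequencing above can be sketched in miniature. This is a simplified model, not the real race_pipeline.py: the stage functions and SharedState fields are hypothetical stand-ins, but the shape (sequential awaits per frame, one background telemetry coroutine writing a shared snapshot) matches the design described here.

```python
import asyncio

class SharedState:
    """Telemetry snapshot shared between coroutines."""
    def __init__(self):
        self.attitude = (0.0, 0.0, 0.0)
        self.frames_processed = 0

# Stub stages standing in for the real vision/controller code.
async def detect(frame):            return {"corners": frame}
async def estimate(detection):      return {"distance": 5.0}
async def decide(pose, attitude):   return {"yaw_rate": 0.0}
async def send(cmd):                pass

async def telemetry_listener(state, queue):
    # The only concurrent path: a single attribute assignment per update,
    # so the race loop always reads a consistent attitude tuple.
    while True:
        state.attitude = await queue.get()

async def race_loop(state, n_frames, queue):
    listener = asyncio.create_task(telemetry_listener(state, queue))
    for i in range(n_frames):
        # Sequential awaits give deterministic stage ordering per frame:
        # detect -> estimate -> decide -> command. No queues, no drops.
        det = await detect(i)
        pose = await estimate(det)
        cmd = await decide(pose, state.attitude)
        await send(cmd)
        state.frames_processed += 1
    listener.cancel()

async def main():
    state = SharedState()
    await race_loop(state, 10, asyncio.Queue())
    return state

state = asyncio.run(main())
```

Because every stage is awaited in order, a slow frame delays the next frame rather than piling up stale work, which is the property that keeps latency bounded.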

3. Component File Map

File | Role | Description
race_pipeline.py | core | Main orchestrator. Async race loop, state machine (SEEK / APPROACH / TRANSIT), gate sequencing, lap counter. Entry point for the race.
vision_pipeline.py | vision | VisionPipeline class. Dispatches to one of 3 detector backends (Color, YOLO, U-Net). Runs PnP estimator on detected corners. Returns gate pose + confidence.
gate_segmentation.py | vision | GateSegNet U-Net model definition. RANSAC edge-line fitting for sub-pixel corner extraction. Training loop with augmentation.
race_config.py | config | RaceConfig dataclass. Every tunable parameter in one place: gains, thresholds, timeouts, camera intrinsics. YAML serialization for reproducibility.
trajectory_optimizer.py | control | Quintic polynomial path planning. Computes racing lines through gates with minimum-snap trajectories. Time-optimal velocity profiling.
drone_mpc_foundation.py | control | DroneParams physical model, AttitudeMPC controller, GatePursuitController for proportional-derivative gate tracking.
mavsdk_bridge.py | comms | SimBridge class. MAVLink communication layer. Offboard control modes (attitude, velocity, position). Heartbeat management.
camera_adapter.py | input | Camera abstraction. Sources: Gazebo pipe (sim), video file (replay), synthetic (testing). Uniform frame interface.
race_logger.py | infra | JSONL race telemetry logging. Per-frame state snapshots for post-race analysis and debugging.
rl_controller.py | control | Gym environment wrapper, PPO training loop, NeuralController inference. Reinforcement learning alternative to PD controller.
dashboard_server.py | infra | Tornado WebSocket server. Streams telemetry to browser dashboard at 30 Hz.
dashboard.html | infra | Web dashboard with Three.js 3D scene, FPV camera feed, real-time telemetry gauges, lap timing.
yolo-train.py | vision | YOLOv8 training pipeline. Dataset management, hyperparameter config, TensorRT FP16 export for edge deployment.
yolo-auto-label.py | vision | Bootstrapping tool. Uses VQ1 color detection on known-color gates to generate YOLO bounding box labels automatically.

4. Three Detection Modes

The vision pipeline supports three interchangeable detector backends, selected via race_config.py. Each trades off between speed, robustness, and accuracy.

Mode | ID | Method | Latency | Trade-off
Color | VQ1 | HSV threshold + contour | ~0.5 ms | Fastest. Requires highlighted/colored gates. Brittle under varying lighting. Good for sim bootstrapping.
YOLO | VQ2 | YOLOv8 neural network | ~12 ms | Handles complex backgrounds and partial occlusion. Needs labeled training data. Bounding box only (no corners).
U-Net | Primary | Pixel segmentation + RANSAC | ~5 ms | Best PnP accuracy. Pixel-level gate mask enables RANSAC edge fitting and line intersection for sub-pixel corners.
Recommended: Use U-Net as the primary detector. It provides the most accurate corner localization, which directly improves PnP depth estimation — the most critical measurement in the entire pipeline.
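The interchangeable-backend design can be sketched as a small strategy dispatch. All names here (Detection, Detector, ColorDetector, make_detector, BACKENDS) are hypothetical illustrations of the pattern, not the actual VisionPipeline API:

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence, Tuple

@dataclass
class Detection:
    corners_2d: Optional[Sequence[Tuple[float, float]]]  # 4 (u, v) points, or None
    confidence: float

class Detector(Protocol):
    def detect(self, frame) -> "Detection": ...

class ColorDetector:
    # VQ1 stand-in: the real backend would run HSV threshold + contour.
    def detect(self, frame) -> Detection:
        return Detection(corners_2d=[(0, 0), (1, 0), (1, 1), (0, 1)],
                         confidence=0.9)

# Mapping from config string to backend class; selection lives in config,
# not in conditional logic inside the pipeline.
BACKENDS = {"color": ColorDetector}

def make_detector(mode: str) -> Detector:
    return BACKENDS[mode]()
```

Keeping the mode string in race_config.py means swapping detectors is a config change, not a code change, which is what makes the three backends interchangeable.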

5. Platform Configurations

The same codebase runs on three distinct platforms. All platform-specific differences are isolated in race_config.py — no conditional logic in the pipeline code.

Simulator (PX4 SITL + Gazebo)

Connection: udpin://0.0.0.0:14540
Camera: Synthetic / Gazebo pipe
Compute: Host machine (any GPU)
Use case: Development + VQ testing

DIY Practice Drone

Connection: Serial MAVLink (UART)
Camera: RPi Camera 3 Wide
Compute: Jetson Orin Nano 8GB
Use case: Real-world testing

Neros Archer (Competition Hardware)

Connection: TBD at event
Camera: Provided by Neros
Compute: Onboard (Neros spec)
Use case: Official competition
Note: The Neros Archer hardware is provided at the competition venue. You cannot purchase or modify it. Camera intrinsics and MAVLink endpoints will be provided before the physical qualifier. Plan for runtime configuration via YAML.

6. Communication Architecture

Telemetry IN (120 Hz) · Commands OUT (50-120 Hz)

Protocol: MAVLink v2
Transport: UDP (sim) / Serial UART (hardware)
Heartbeat: 2 Hz minimum for offboard mode (we send 4 Hz)
Command rate: 50-120 Hz (matches vision frame rate)
Telemetry rate: 120 Hz (position, velocity, attitude, IMU)
# Offboard control flow
1. Connect to PX4 via UDP/serial
2. Start heartbeat at 4 Hz
3. Stream SET_ATTITUDE_TARGET at frame rate
4. PX4 enters offboard mode after ~0.5s of valid commands
5. Arm → takeoff → race loop → land
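The concurrency shape of steps 2-3 (a steady heartbeat alongside a continuous setpoint stream) can be sketched with stubs. The intervals below are scaled down so the sketch finishes quickly; the send functions are hypothetical counters, not MAVSDK calls:

```python
import asyncio

sent = {"heartbeats": 0, "setpoints": 0}

async def heartbeat(interval: float, stop: asyncio.Event):
    # MAVLink requires a steady heartbeat; the pipeline sends 4 Hz for
    # margin over the 2 Hz minimum. This stub just counts sends.
    while not stop.is_set():
        sent["heartbeats"] += 1
        await asyncio.sleep(interval)

async def stream_setpoints(n: int, interval: float):
    # SET_ATTITUDE_TARGET must stream continuously: PX4 exits offboard
    # mode if the stream stalls. This stub just counts sends.
    for _ in range(n):
        sent["setpoints"] += 1
        await asyncio.sleep(interval)

async def offboard_session():
    stop = asyncio.Event()
    # Heartbeat runs as a background task while setpoints stream in the
    # foreground, mirroring the real bridge's structure.
    hb = asyncio.create_task(heartbeat(0.05, stop))   # scaled-down 4 Hz
    await stream_setpoints(20, 0.01)                  # scaled-down frame rate
    stop.set()
    await hb

asyncio.run(offboard_session())
```

The key design point is that the heartbeat never waits on the vision pipeline: even if a frame runs long, the link stays alive.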

7. Key Design Decisions

Vision-only depth

No GPS, LiDAR, or depth sensor. All distance estimation comes from PnP with 4 known gate corner positions in world coordinates. Gate dimensions (1.5 m x 1.5 m) are the only world-scale reference. This makes corner accuracy the single most important factor in the system.
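The intuition for why the 1.5 m gate is the world-scale anchor can be shown with the pinhole model (the full pipeline uses cv2.solvePnP on all four corners, which also recovers rotation; this one-liner is only the depth intuition, and the focal length value is a hypothetical example):

```python
def gate_depth(pixel_width: float, gate_width_m: float = 1.5,
               fx: float = 454.0) -> float:
    """Pinhole depth estimate: Z = fx * W / w_pixels.

    fx is the camera's horizontal focal length in pixels (example value,
    not a real calibration); gate_width_m is the known 1.5 m reference.
    Smaller apparent width means the gate is farther away.
    """
    return fx * gate_width_m / pixel_width
```

Note the sensitivity: at long range the gate spans few pixels, so a one-pixel corner error shifts the depth estimate by a large fraction, which is exactly why sub-pixel corner accuracy dominates system performance.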

U-Net as primary detector

Pixel-level segmentation mask enables RANSAC edge fitting along each gate side, then line-line intersection for corner extraction. This yields sub-pixel corner accuracy, which directly improves PnP reprojection error and therefore depth estimation. YOLO's bounding box corners are not precise enough for reliable PnP.
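A minimal sketch of the edge-fit-then-intersect idea, assuming mask pixels have already been grouped per gate side (the real gate_segmentation.py operates on the U-Net mask; the two-point RANSAC and the synthetic edge data here are illustrative):

```python
import random

def fit_line(p, q):
    # Line through two points as (a, b, c) with a*x + b*y = c, unit normal.
    a, b = q[1] - p[1], p[0] - q[0]
    n = (a * a + b * b) ** 0.5
    return a / n, b / n, (a * p[0] + b * p[1]) / n

def ransac_line(points, iters=200, tol=1.0, rng=random.Random(0)):
    # Minimal RANSAC: sample 2 points, keep the line with most inliers.
    best, best_inliers = None, -1
    for _ in range(iters):
        p, q = rng.sample(points, 2)
        if p == q:
            continue
        a, b, c = fit_line(p, q)
        inliers = sum(1 for (x, y) in points if abs(a * x + b * y - c) < tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b, c), inliers
    return best

def intersect(l1, l2):
    # Solve a1*x + b1*y = c1, a2*x + b2*y = c2 by Cramer's rule. The
    # intersection is real-valued, so the corner lands at sub-pixel precision.
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Synthetic noisy mask pixels along the top edge (y ~ 10) and left
# edge (x ~ 20) of a gate; true corner is (20, 10).
rng = random.Random(1)
top = [(x, 10 + rng.uniform(-0.4, 0.4)) for x in range(20, 120)]
left = [(20 + rng.uniform(-0.4, 0.4), y) for y in range(10, 110)]
corner = intersect(ransac_line(top), ransac_line(left))
```

Fitting a line to a hundred mask pixels averages out per-pixel noise, so the intersection is more accurate than any single detected edge pixel.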

No RECOVERY phase

If the gate is lost (no detection for N frames), the state machine does not attempt to fly to a remembered position. Instead, it immediately enters SEEK and spins at 180 deg/s to re-acquire the gate visually. This is faster and more reliable than dead-reckoning to an estimated position without GPS.
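The loss-handling logic can be sketched as follows. The loss-frame count and the Phase/StateMachine names are hypothetical; the seek rate matches the 180 deg/s figure above:

```python
from enum import Enum, auto

class Phase(Enum):
    SEEK = auto()
    APPROACH = auto()
    TRANSIT = auto()

class StateMachine:
    LOSS_FRAMES = 10        # N consecutive misses before re-acquisition (assumed)
    SEEK_YAW_RATE = 180.0   # deg/s spin while searching

    def __init__(self):
        self.phase = Phase.SEEK
        self.missed = 0

    def step(self, gate_visible: bool) -> float:
        """Returns the commanded yaw rate in deg/s for this frame."""
        if gate_visible:
            self.missed = 0
            if self.phase is Phase.SEEK:
                self.phase = Phase.APPROACH
            return 0.0
        self.missed += 1
        if self.missed >= self.LOSS_FRAMES:
            # Gate lost: spin in place to re-acquire visually rather than
            # dead-reckon to a remembered position without GPS.
            self.phase = Phase.SEEK
            return self.SEEK_YAW_RATE
        return 0.0
```

There is deliberately no RECOVERY branch: every loss path collapses back to SEEK, which keeps the state space small and the failure mode predictable.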

Config-driven architecture

Every gain, threshold, and timeout lives in race_config.py as a YAML-serializable dataclass. No magic numbers in pipeline code. Config can be swapped at runtime for different platforms, gate layouts, or tuning experiments. Every race log includes the full config snapshot for reproducibility.
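A minimal sketch of the pattern, with a hypothetical subset of fields (the real RaceConfig in race_config.py has many more). A dataclass round-trips cleanly through a plain dict, which is what makes per-log snapshots and YAML files straightforward:

```python
from dataclasses import dataclass, asdict

@dataclass
class RaceConfig:
    # Illustrative fields only; values echo numbers quoted elsewhere here.
    detector: str = "unet"
    kp_yaw: float = 50.0
    seek_yaw_rate: float = 180.0
    gate_size_m: float = 1.5

cfg = RaceConfig(detector="color")
snapshot = asdict(cfg)            # plain dict, logged with every race
# yaml.safe_dump(snapshot) would write the per-platform YAML file
restored = RaceConfig(**snapshot)  # exact round-trip for reproducibility
```

Because the snapshot is embedded in every race log, any lap can be replayed or compared against the exact parameters that produced it.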

Aggressive tuning parameters

Competition-grade parameters — these exceed conservative defaults significantly.
Parameter | Value | Context
kp_yaw | 50 | Yaw proportional gain
cruise_pitch | -25 deg | Nose-down cruise angle
max_tilt | 70 deg | MonoRace uses 65+
seek_yaw_rate | 180 deg/s | Full rotation in 2 s
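How an aggressive proportional gain interacts with the rate limit can be shown in one function. This is a sketch, not the real controller: the choice of degrees for the error input is an assumption for illustration.

```python
def yaw_rate_cmd(yaw_error_deg: float, kp_yaw: float = 50.0,
                 limit_deg_s: float = 180.0) -> float:
    """Proportional yaw command, clamped to the seek-rate envelope.

    kp_yaw=50 is the aggressive competition gain; with it, even a few
    degrees of heading error saturates the command at 180 deg/s, which
    is the intended snappy behavior at race speed.
    """
    return max(-limit_deg_s, min(limit_deg_s, kp_yaw * yaw_error_deg))
```

The clamp is what makes the high gain safe: the controller is effectively bang-bang for large errors and proportional only inside a narrow band near zero.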

Predictive transit detection

Gate pass-through is not determined by a simple distance threshold (which can false-trigger on approach). Instead, the system computes distance derivative over 3+ consecutive frames. Only when the distance is decreasing, below threshold, and the derivative confirms closing velocity does it trigger TRANSIT. This eliminates false transitions from PnP noise at close range.
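The closing-velocity check can be sketched as a small detector over a sliding window. The 1.0 m threshold here is a hypothetical value; the 3-frame window matches the text:

```python
from collections import deque

class TransitDetector:
    """Trigger TRANSIT only on confirmed closing motion, not raw distance.

    Requires the distance to be strictly decreasing across the whole
    window AND below the threshold, so a single noisy PnP reading at
    close range cannot fire the transition.
    """
    def __init__(self, threshold_m: float = 1.0, window: int = 3):
        self.threshold = threshold_m
        self.history = deque(maxlen=window)

    def update(self, distance_m: float) -> bool:
        self.history.append(distance_m)
        if len(self.history) < self.history.maxlen:
            return False  # not enough frames to confirm a trend yet
        h = list(self.history)
        closing = all(b < a for a, b in zip(h, h[1:]))
        return closing and distance_m < self.threshold

td = TransitDetector()
readings = [2.0, 1.6, 1.2, 0.9]           # steady approach
fired = [td.update(d) for d in readings]  # fires only once close AND closing
```

A noisy bounce (distance briefly increasing) breaks the monotonic run and resets the confirmation, which is exactly the false-trigger case this guards against.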

Architecture summary: Camera frame → U-Net mask → RANSAC corners → PnP depth → state machine → attitude command → MAVLink → PX4 → motors. Every component is a pure function of its inputs. The only mutable state is the gate tracker EMA and the state machine phase.