AI Grand Prix — System Architecture

End-to-end autonomous drone racing pipeline • Vision-only navigation

1. Overview

Core constraint: A single FPV camera is the only sensor. No GPS. No depth sensor. No LiDAR. All spatial awareness is derived from vision — gate detection, distance estimation, and navigation are computed purely from pixel data at 120 Hz.

This system pilots a racing drone autonomously through a sequence of gates at high speed. The architecture is based on MonoRace (2025 A2RL Autonomous Racing League champion): a U-Net segmentation network identifies gate pixels, RANSAC fits edge lines to extract sub-pixel corners, Perspective-n-Point (PnP) solves for 3D gate pose relative to the drone, and a state machine sequences approach, transit, and re-acquisition phases. A controller converts desired trajectories into attitude commands sent over MAVLink to the PX4 autopilot.

The pipeline runs as a single async Python process. Every frame triggers the full chain — detect, estimate, decide, command — with no queuing or frame drops. Latency from camera capture to motor command is under 8 ms on target hardware.

2. End-to-End Pipeline

Camera Frame: 640x480 BGR @ 120 Hz
→ Vision Pipeline: U-Net / YOLO / color segmentation + detection (vision_pipeline.py)
→ Gate Detection: corners_2d (4 pts) + confidence score (gate_segmentation.py)
→ PnP Estimator: distance, position, rotation (gate pose) via cv2.solvePnP
→ Gate Tracker: EMA smoothing + derivative
→ State Machine: SEEK / APPROACH / TRANSIT
→ Controller: attitude / velocity commands (trajectory_optimizer.py, drone_mpc_foundation.py)
→ MAVSDK Bridge: MAVLink v2 over UDP
→ PX4 Autopilot: flight controller firmware
→ Motors: 4x ESC + BLDC

Telemetry feedback (120 Hz) flows from PX4 back into the pipeline; the web dashboard (3D + FPV + telemetry) receives state over WebSocket.

Timing budget (per frame): Vision + PnP ~5 ms · State + Controller ~0.5 ms · MAVLink TX ~0.2 ms

The pipeline is fully synchronous within each frame. No thread pools, no message queues. The async event loop in race_pipeline.py awaits each stage sequentially, ensuring deterministic ordering. The only concurrent path is the telemetry listener, which runs in a background coroutine and updates shared state atomically.
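The per-frame sequencing above can be sketched in miniature. This is a simplified model, not the real race_pipeline.py: the stage functions and SharedState fields are hypothetical stand-ins, but the shape (sequential awaits per frame, one background telemetry coroutine writing a shared snapshot) matches the design described here.

```python
import asyncio

class SharedState:
    """Telemetry snapshot shared between coroutines."""
    def __init__(self):
        self.attitude = (0.0, 0.0, 0.0)
        self.frames_processed = 0

# Stub stages standing in for the real vision/controller code.
async def detect(frame):            return {"corners": frame}
async def estimate(detection):      return {"distance": 5.0}
async def decide(pose, attitude):   return {"yaw_rate": 0.0}
async def send(cmd):                pass

async def telemetry_listener(state, queue):
    # The only concurrent path: a single attribute assignment per update,
    # so the race loop always reads a consistent attitude tuple.
    while True:
        state.attitude = await queue.get()

async def race_loop(state, n_frames, queue):
    listener = asyncio.create_task(telemetry_listener(state, queue))
    for i in range(n_frames):
        # Sequential awaits give deterministic stage ordering per frame:
        # detect -> estimate -> decide -> command. No queues, no drops.
        det = await detect(i)
        pose = await estimate(det)
        cmd = await decide(pose, state.attitude)
        await send(cmd)
        state.frames_processed += 1
    listener.cancel()

async def main():
    state = SharedState()
    await race_loop(state, 10, asyncio.Queue())
    return state

state = asyncio.run(main())
```

Because every stage is awaited in order, a slow frame delays the next frame rather than piling up stale work, which is the property that keeps latency bounded.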

3. Component File Map

File | Role | Description
race_pipeline.py | core | Main orchestrator. Async race loop, state machine (SEEK / APPROACH / TRANSIT), gate sequencing, lap counter. Entry point for the race.
vision_pipeline.py | vision | VisionPipeline class. Dispatches to one of 3 detector backends (Color, YOLO, U-Net). Runs PnP estimator on detected corners. Returns gate pose + confidence.
gate_segmentation.py | vision | GateSegNet U-Net model definition. RANSAC edge-line fitting for sub-pixel corner extraction. Training loop with augmentation.
race_config.py | config | RaceConfig dataclass. Every tunable parameter in one place: gains, thresholds, timeouts, camera intrinsics. YAML serialization for reproducibility.
trajectory_optimizer.py | control | Quintic polynomial path planning. Computes racing lines through gates with minimum-snap trajectories. Time-optimal velocity profiling.
drone_mpc_foundation.py | control | DroneParams physical model, AttitudeMPC controller, GatePursuitController for proportional-derivative gate tracking.
mavsdk_bridge.py | comms | SimBridge class. MAVLink communication layer. Offboard control modes (attitude, velocity, position). Heartbeat management.
camera_adapter.py | input | Camera abstraction. Sources: Gazebo pipe (sim), video file (replay), synthetic (testing). Uniform frame interface.
race_logger.py | infra | JSONL race telemetry logging. Per-frame state snapshots for post-race analysis and debugging.
rl_controller.py | control | Gym environment wrapper, PPO training loop, NeuralController inference. Reinforcement learning alternative to PD controller.
dashboard_server.py | infra | Tornado WebSocket server. Streams telemetry to browser dashboard at 30 Hz.
dashboard.html | infra | Web dashboard with Three.js 3D scene, FPV camera feed, real-time telemetry gauges, lap timing.
yolo-train.py | vision | YOLOv8 training pipeline. Dataset management, hyperparameter config, TensorRT FP16 export for edge deployment.
yolo-auto-label.py | vision | Bootstrapping tool. Uses VQ1 color detection on known-color gates to generate YOLO bounding box labels automatically.

4. Three Detection Modes

The vision pipeline supports three interchangeable detector backends, selected via race_config.py. Each trades off between speed, robustness, and accuracy.

Mode | ID | Method | Latency | Trade-off
Color | VQ1 | HSV threshold + contour | ~0.5 ms | Fastest. Requires highlighted/colored gates. Brittle under varying lighting. Good for sim bootstrapping.
YOLO | VQ2 | YOLOv8 neural network | ~12 ms | Handles complex backgrounds and partial occlusion. Needs labeled training data. Bounding box only (no corners).
U-Net | Primary | Pixel segmentation + RANSAC | ~5 ms | Best PnP accuracy. Pixel-level gate mask enables RANSAC edge fitting and line intersection for sub-pixel corners.
Recommended: Use U-Net as the primary detector. It provides the most accurate corner localization, which directly improves PnP depth estimation — the most critical measurement in the entire pipeline.
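The interchangeable-backend design can be sketched as a small strategy dispatch. All names here (Detection, Detector, ColorDetector, make_detector, BACKENDS) are hypothetical illustrations of the pattern, not the actual VisionPipeline API:

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence, Tuple

@dataclass
class Detection:
    corners_2d: Optional[Sequence[Tuple[float, float]]]  # 4 (u, v) points, or None
    confidence: float

class Detector(Protocol):
    def detect(self, frame) -> "Detection": ...

class ColorDetector:
    # VQ1 stand-in: the real backend would run HSV threshold + contour.
    def detect(self, frame) -> Detection:
        return Detection(corners_2d=[(0, 0), (1, 0), (1, 1), (0, 1)],
                         confidence=0.9)

# Mapping from config string to backend class; selection lives in config,
# not in conditional logic inside the pipeline.
BACKENDS = {"color": ColorDetector}

def make_detector(mode: str) -> Detector:
    return BACKENDS[mode]()
```

Keeping the mode string in race_config.py means swapping detectors is a config change, not a code change, which is what makes the three backends interchangeable.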

5. Platform Configurations

The same codebase runs on three distinct platforms. All platform-specific differences are isolated in race_config.py — no conditional logic in the pipeline code.

Simulator (PX4 SITL + Gazebo)

Connection: udpin://0.0.0.0:14540
Camera: Synthetic / Gazebo pipe
Compute: Host machine (any GPU)
Use case: Development + VQ testing

DIY Practice Drone

Connection: Serial MAVLink (UART)
Camera: RPi Camera 3 Wide
Compute: Jetson Orin Nano 8GB
Use case: Real-world testing

Neros Archer (Competition Hardware)

Connection: TBD at event
Camera: Provided by Neros
Compute: Onboard (Neros spec)
Use case: Official competition
Note: The Neros Archer hardware is provided at the competition venue. You cannot purchase or modify it. Camera intrinsics and MAVLink endpoints will be provided before the physical qualifier. Plan for runtime configuration via YAML.

6. Communication Architecture

Telemetry IN (120 Hz) · Commands OUT (50-120 Hz)

Protocol: MAVLink v2
Transport: UDP (sim) / Serial UART (hardware)
Heartbeat: 2 Hz minimum for offboard mode (we send 4 Hz)
Command rate: 50-120 Hz (matches vision frame rate)
Telemetry rate: 120 Hz (position, velocity, attitude, IMU)
# Offboard control flow
1. Connect to PX4 via UDP/serial
2. Start heartbeat at 4 Hz
3. Stream SET_ATTITUDE_TARGET at frame rate
4. PX4 enters offboard mode after ~0.5s of valid commands
5. Arm → takeoff → race loop → land
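The concurrency shape of steps 2-3 (a steady heartbeat alongside a continuous setpoint stream) can be sketched with stubs. The intervals below are scaled down so the sketch finishes quickly; the send functions are hypothetical counters, not MAVSDK calls:

```python
import asyncio

sent = {"heartbeats": 0, "setpoints": 0}

async def heartbeat(interval: float, stop: asyncio.Event):
    # MAVLink requires a steady heartbeat; the pipeline sends 4 Hz for
    # margin over the 2 Hz minimum. This stub just counts sends.
    while not stop.is_set():
        sent["heartbeats"] += 1
        await asyncio.sleep(interval)

async def stream_setpoints(n: int, interval: float):
    # SET_ATTITUDE_TARGET must stream continuously: PX4 exits offboard
    # mode if the stream stalls. This stub just counts sends.
    for _ in range(n):
        sent["setpoints"] += 1
        await asyncio.sleep(interval)

async def offboard_session():
    stop = asyncio.Event()
    # Heartbeat runs as a background task while setpoints stream in the
    # foreground, mirroring the real bridge's structure.
    hb = asyncio.create_task(heartbeat(0.05, stop))   # scaled-down 4 Hz
    await stream_setpoints(20, 0.01)                  # scaled-down frame rate
    stop.set()
    await hb

asyncio.run(offboard_session())
```

The key design point is that the heartbeat never waits on the vision pipeline: even if a frame runs long, the link stays alive.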

7. Key Design Decisions

Vision-only depth

No GPS, LiDAR, or depth sensor. All distance estimation comes from PnP with 4 known gate corner positions in world coordinates. Gate dimensions (1.5 m x 1.5 m) are the only world-scale reference. This makes corner accuracy the single most important factor in the system.
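The intuition for why the 1.5 m gate is the world-scale anchor can be shown with the pinhole model (the full pipeline uses cv2.solvePnP on all four corners, which also recovers rotation; this one-liner is only the depth intuition, and the focal length value is a hypothetical example):

```python
def gate_depth(pixel_width: float, gate_width_m: float = 1.5,
               fx: float = 454.0) -> float:
    """Pinhole depth estimate: Z = fx * W / w_pixels.

    fx is the camera's horizontal focal length in pixels (example value,
    not a real calibration); gate_width_m is the known 1.5 m reference.
    Smaller apparent width means the gate is farther away.
    """
    return fx * gate_width_m / pixel_width
```

Note the sensitivity: at long range the gate spans few pixels, so a one-pixel corner error shifts the depth estimate by a large fraction, which is exactly why sub-pixel corner accuracy dominates system performance.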

U-Net as primary detector

Pixel-level segmentation mask enables RANSAC edge fitting along each gate side, then line-line intersection for corner extraction. This yields sub-pixel corner accuracy, which directly improves PnP reprojection error and therefore depth estimation. YOLO's bounding box corners are not precise enough for reliable PnP.
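A minimal sketch of the edge-fit-then-intersect idea, assuming mask pixels have already been grouped per gate side (the real gate_segmentation.py operates on the U-Net mask; the two-point RANSAC and the synthetic edge data here are illustrative):

```python
import random

def fit_line(p, q):
    # Line through two points as (a, b, c) with a*x + b*y = c, unit normal.
    a, b = q[1] - p[1], p[0] - q[0]
    n = (a * a + b * b) ** 0.5
    return a / n, b / n, (a * p[0] + b * p[1]) / n

def ransac_line(points, iters=200, tol=1.0, rng=random.Random(0)):
    # Minimal RANSAC: sample 2 points, keep the line with most inliers.
    best, best_inliers = None, -1
    for _ in range(iters):
        p, q = rng.sample(points, 2)
        if p == q:
            continue
        a, b, c = fit_line(p, q)
        inliers = sum(1 for (x, y) in points if abs(a * x + b * y - c) < tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b, c), inliers
    return best

def intersect(l1, l2):
    # Solve a1*x + b1*y = c1, a2*x + b2*y = c2 by Cramer's rule. The
    # intersection is real-valued, so the corner lands at sub-pixel precision.
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Synthetic noisy mask pixels along the top edge (y ~ 10) and left
# edge (x ~ 20) of a gate; true corner is (20, 10).
rng = random.Random(1)
top = [(x, 10 + rng.uniform(-0.4, 0.4)) for x in range(20, 120)]
left = [(20 + rng.uniform(-0.4, 0.4), y) for y in range(10, 110)]
corner = intersect(ransac_line(top), ransac_line(left))
```

Fitting a line to a hundred mask pixels averages out per-pixel noise, so the intersection is more accurate than any single detected edge pixel.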

No RECOVERY phase

If the gate is lost (no detection for N frames), the state machine does not attempt to fly to a remembered position. Instead, it immediately enters SEEK and spins at 180 deg/s to re-acquire the gate visually. This is faster and more reliable than dead-reckoning to an estimated position without GPS.
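The loss-handling logic can be sketched as follows. The loss-frame count and the Phase/StateMachine names are hypothetical; the seek rate matches the 180 deg/s figure above:

```python
from enum import Enum, auto

class Phase(Enum):
    SEEK = auto()
    APPROACH = auto()
    TRANSIT = auto()

class StateMachine:
    LOSS_FRAMES = 10        # N consecutive misses before re-acquisition (assumed)
    SEEK_YAW_RATE = 180.0   # deg/s spin while searching

    def __init__(self):
        self.phase = Phase.SEEK
        self.missed = 0

    def step(self, gate_visible: bool) -> float:
        """Returns the commanded yaw rate in deg/s for this frame."""
        if gate_visible:
            self.missed = 0
            if self.phase is Phase.SEEK:
                self.phase = Phase.APPROACH
            return 0.0
        self.missed += 1
        if self.missed >= self.LOSS_FRAMES:
            # Gate lost: spin in place to re-acquire visually rather than
            # dead-reckon to a remembered position without GPS.
            self.phase = Phase.SEEK
            return self.SEEK_YAW_RATE
        return 0.0
```

There is deliberately no RECOVERY branch: every loss path collapses back to SEEK, which keeps the state space small and the failure mode predictable.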

Config-driven architecture

Every gain, threshold, and timeout lives in race_config.py as a YAML-serializable dataclass. No magic numbers in pipeline code. Config can be swapped at runtime for different platforms, gate layouts, or tuning experiments. Every race log includes the full config snapshot for reproducibility.
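A minimal sketch of the pattern, with a hypothetical subset of fields (the real RaceConfig in race_config.py has many more). A dataclass round-trips cleanly through a plain dict, which is what makes per-log snapshots and YAML files straightforward:

```python
from dataclasses import dataclass, asdict

@dataclass
class RaceConfig:
    # Illustrative fields only; values echo numbers quoted elsewhere here.
    detector: str = "unet"
    kp_yaw: float = 50.0
    seek_yaw_rate: float = 180.0
    gate_size_m: float = 1.5

cfg = RaceConfig(detector="color")
snapshot = asdict(cfg)            # plain dict, logged with every race
# yaml.safe_dump(snapshot) would write the per-platform YAML file
restored = RaceConfig(**snapshot)  # exact round-trip for reproducibility
```

Because the snapshot is embedded in every race log, any lap can be replayed or compared against the exact parameters that produced it.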

Aggressive tuning parameters

Competition-grade parameters — these exceed conservative defaults significantly.
Parameter | Value | Context
kp_yaw | 50 | Yaw proportional gain
cruise_pitch | -25 deg | Nose-down cruise angle
max_tilt | 70 deg | MonoRace uses 65+
seek_yaw_rate | 180 deg/s | Full rotation in 2 s
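How an aggressive proportional gain interacts with the rate limit can be shown in one function. This is a sketch, not the real controller: the choice of degrees for the error input is an assumption for illustration.

```python
def yaw_rate_cmd(yaw_error_deg: float, kp_yaw: float = 50.0,
                 limit_deg_s: float = 180.0) -> float:
    """Proportional yaw command, clamped to the seek-rate envelope.

    kp_yaw=50 is the aggressive competition gain; with it, even a few
    degrees of heading error saturates the command at 180 deg/s, which
    is the intended snappy behavior at race speed.
    """
    return max(-limit_deg_s, min(limit_deg_s, kp_yaw * yaw_error_deg))
```

The clamp is what makes the high gain safe: the controller is effectively bang-bang for large errors and proportional only inside a narrow band near zero.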

Predictive transit detection

Gate pass-through is not determined by a simple distance threshold (which can false-trigger on approach). Instead, the system computes distance derivative over 3+ consecutive frames. Only when the distance is decreasing, below threshold, and the derivative confirms closing velocity does it trigger TRANSIT. This eliminates false transitions from PnP noise at close range.
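The closing-velocity check can be sketched as a small detector over a sliding window. The 1.0 m threshold here is a hypothetical value; the 3-frame window matches the text:

```python
from collections import deque

class TransitDetector:
    """Trigger TRANSIT only on confirmed closing motion, not raw distance.

    Requires the distance to be strictly decreasing across the whole
    window AND below the threshold, so a single noisy PnP reading at
    close range cannot fire the transition.
    """
    def __init__(self, threshold_m: float = 1.0, window: int = 3):
        self.threshold = threshold_m
        self.history = deque(maxlen=window)

    def update(self, distance_m: float) -> bool:
        self.history.append(distance_m)
        if len(self.history) < self.history.maxlen:
            return False  # not enough frames to confirm a trend yet
        h = list(self.history)
        closing = all(b < a for a, b in zip(h, h[1:]))
        return closing and distance_m < self.threshold

td = TransitDetector()
readings = [2.0, 1.6, 1.2, 0.9]           # steady approach
fired = [td.update(d) for d in readings]  # fires only once close AND closing
```

A noisy bounce (distance briefly increasing) breaks the monotonic run and resets the confirmation, which is exactly the false-trigger case this guards against.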

Architecture summary: Camera frame → U-Net mask → RANSAC corners → PnP depth → state machine → attitude command → MAVLink → PX4 → motors. Every component is a pure function of its inputs. The only mutable state is the gate tracker EMA and the state machine phase.