FPV visual stream + telemetry → YOLO11n detector → 4-corner keypoints → PnP → controller (PID for VQ1, perception-aware PPO for VQ2) → Throttle / Roll / Pitch / Yaw. No GPS, no absolute positioning, no depth — per the confirmed 2026-04-19 AIGP spec. Every component is a pure function of its inputs; the only mutable state is the detection history and the PPO policy network.
A single FPV camera + telemetry are the only inputs. No GPS, no LiDAR, no depth. All spatial awareness is derived from vision — gate detection in the camera frame → PnP for gate-relative 3D pose → controller. Telemetry (attitude quaternion, body rates, accel) is used for IMU-primary short-horizon state estimation and as a first-class PPO observation component.
Two controller options share the same perception frontend:
- **VQ1 (PID):** YOLO11n → YOLO11n-pose → PnP → target-gate tracker → PID (heading, altitude) + feed-forward throttle.
- **VQ2 (PPO):** Same detection frontend + perception-aware PPO policy (hidden layers [256, 256, 256]) over detector output + telemetry.
The pipeline runs as a single async Python process. Every frame triggers the full chain — detect, estimate, decide, command. Hard end-to-end budget: under 50 ms per frame (MonoRace achieved 22 ms; Swift 40 ms); the internal stage targets below sum to under 10 ms.
| Stage | Latency (target) |
|---|---|
| Detector (YOLO11n) | ~5 ms |
| Keypoints (YOLO11n-pose) | ~3 ms |
| PnP (SOLVEPNP_IPPE_SQUARE) | ~0.5 ms |
| Target-gate tracker | ~0.1 ms |
| Controller (PID or PPO inference) | ~0.2 ms |
| Transport (sim I/O) | ~1 ms |
| Total (target) | <10 ms |
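The per-frame chain above can be sketched as a minimal asyncio loop. `camera`, `detect`, `estimate`, `decide`, and `send` are hypothetical stand-ins for the real pipeline stages, not the actual `race_pipeline.py` API:

```python
import asyncio
import time

async def race_loop(camera, detect, estimate, decide, send, budget_ms=10.0):
    """One full detect -> estimate -> decide -> command chain per frame.
    All stage callables are placeholders for the real pipeline stages."""
    async for frame, telemetry in camera:
        t0 = time.perf_counter()
        detections = detect(frame)                   # YOLO11n + pose corners
        gate_pose = estimate(detections, telemetry)  # PnP gate-relative pose
        command = decide(gate_pose, telemetry)       # PID / PPO -> (T, R, P, Y)
        await send(command)
        elapsed_ms = (time.perf_counter() - t0) * 1e3
        if elapsed_ms > budget_ms:                   # flag frames over budget
            print(f"frame over budget: {elapsed_ms:.1f} ms")
```

Because each stage is a pure function of its inputs, the loop body stays stateless; any per-frame latency overrun is logged rather than queued, so a slow frame never delays the next one.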
| File | Role | Description |
|---|---|---|
| vq1_completion_pilot.py | CORE | Zero-learning VQ1 pilot: YOLO + 4-corner PnP + PID. Stubbed sim adapter. |
| race_pipeline.py | CORE | Main orchestrator. Async race loop, target-gate tracker, lap counter. |
| vision_pipeline.py | VISION | VisionPipeline class. Dispatches detector backend. Runs PnP on detected corners. |
| gate_segmentation.py | VISION | U-Net + RANSAC alternative detector. Sub-pixel corner extraction. |
| train_apex.py | TRAINING | 3-phase APEX: detector → keypoints → PPO. --observation-mode flag for VQ2 obs swap. |
| rl_controller.py | CONTROL | Gym env + PPO NeuralController. VQ2 runtime. |
| imu_gate_predictor.py | VISION | IGPP EKF: short-horizon gate-pose prediction when detection drops. |
| synthetic_aperture_depth.py | VISION | SAMD multi-frame depth refinement. |
| race_config.py | CONFIG | RaceConfig dataclass. Gains, thresholds, timeouts, intrinsics. YAML-serializable. |
| camera_adapter.py | INPUT | Camera abstraction. Sources: sim pipe, video file, synthetic. |
| race_logger.py | INFRA | JSONL per-frame telemetry logging. |
| sim_drone.py | DEV | 6DOF physics proxy for local iteration. Not the real AIGP sim. |
| dashboard_server.py | INFRA | WebSocket telemetry → browser dashboard. |
| mpc_tracker.py, course_mapper.py | RETIRED | NED-dependent. Kept for reference. |
Three interchangeable detector backends (plus an HSV fallback), selected in race_config.py. Trade-offs between speed, robustness, and corner accuracy.
| Mode | Method | Latency | Trade-off |
|---|---|---|---|
| YOLO11n + pose | YOLO11n detector + YOLO11n-pose 4 corners | ~5 ms GPU | Ships for VQ1 + VQ2. Already trained via APEX. |
| RF-DETR-Nano | DINOv2 backbone · deformable decoder | ~2.3 ms TRT | VQ2 upgrade if YOLO recall is bottleneck. |
| U-Net + RANSAC | Pixel segmentation · RANSAC edge fitting | ~5 ms GPU | Sub-pixel corners alternative. Best PnP accuracy. |
| Color (HSV) | Threshold + contour | <1 ms CPU | Fallback for highlighted gates. Brittle under lighting. |
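One way the backend swap can work is a small registry keyed by the config string; the names and decorator below are illustrative, not the actual race_config.py or vision_pipeline.py API:

```python
DETECTOR_BACKENDS = {}

def register_backend(name):
    """Class decorator: map a config string to a detector class."""
    def deco(cls):
        DETECTOR_BACKENDS[name] = cls
        return cls
    return deco

@register_backend("yolo_pose")
class YoloPoseDetector:
    def detect(self, frame):
        raise NotImplementedError  # YOLO11n boxes + YOLO11n-pose corners

@register_backend("hsv")
class HsvDetector:
    def detect(self, frame):
        raise NotImplementedError  # threshold + contour fallback

def make_detector(backend_name):
    """Called once at startup with the backend string from the race config."""
    return DETECTOR_BACKENDS[backend_name]()
```

A registry keeps the dispatch in one place: adding RF-DETR-Nano or U-Net + RANSAC is one decorated class, with no changes to the pipeline code that consumes `detect()`.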
6DOF Python physics. Local UDP. Host GPU. For APEX development before the real sim lands.
Downloadable package, Windows only. Internet required (anti-cheat). Parallel instances supported. Released shortly before VQ1.
Serial MAVLink. RPi Camera 3 Wide. Jetson Orin Nano 8GB. For real-world testing before physical qualifier.
Provided at competition venue. Can't purchase or modify. Camera + MAVLink endpoints specified before physical qualifier. Plan for runtime YAML config.
| Parameter | Value / Status |
|---|---|
| Protocol | Likely MAVLink v2 over UDP (TBD confirmation at sim release) |
| Transport | Local UDP to sim · internet required for anti-cheat handshake |
| Command rate | TBD (matches sim's input handler; aim for per-frame) |
| Telemetry rate | TBD (likely streamed continuously alongside frames) |
| Parallel instances | Supported — wire SubprocVecEnv for PPO training |
# Expected control flow (to be confirmed against sim package)
1. Launch local sim instance (Windows only)
2. Anti-cheat handshake over internet
3. Stream (frame, telemetry) -> our Python AI
4. AI emits (throttle, roll, pitch, yaw) per frame
5. Run until gates cleared (VQ1) or 8-minute timeout
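Steps 3–5 reduce to a receive/decide/send loop. The wire format below (raw little-endian floats over UDP) is a placeholder until the sim package pins the real protocol:

```python
import socket
import struct
import time

def run_race(sock, sim_addr, decide, timeout_s=480.0, max_frames=None):
    """Blocking loop: receive a (frame, telemetry) blob, emit one TRPY command
    per frame, stop at the 8-minute timeout. Packet layout is a placeholder."""
    start = time.monotonic()
    frames = 0
    while time.monotonic() - start < timeout_s:
        payload, _ = sock.recvfrom(65535)               # step 3: sim -> AI
        throttle, roll, pitch, yaw = decide(payload)    # our pilot
        sock.sendto(struct.pack("<4f", throttle, roll, pitch, yaw),
                    sim_addr)                           # step 4: AI -> sim
        frames += 1
        if max_frames is not None and frames >= max_frames:
            break                                       # dev/test escape hatch
    return frames
```

The 480 s default encodes the 8-minute timeout from step 5; VQ1 additionally terminates early once all gates are cleared, which the real orchestrator tracks via the lap counter.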
No GPS, LiDAR, or depth sensor. All distance estimation comes from PnP over the 4 gate corners, whose geometry is known in the gate's local frame. The gate dimensions (1.5 m × 1.5 m) are the only world-scale reference, so corner accuracy is the single most important factor.
YOLO11n-pose outputs 4 gate corners per detection in one pass. Better than bounding-box-only YOLO because PnP needs actual corners, not axis-aligned rectangles. Avoids a second custom keypoint model.
No absolute positioning means we can't fly to a remembered gate position after losing sight. Instead, the pilot enters a slow forward + yaw sweep to re-acquire visually. Faster and more reliable than dead-reckoning without GPS.
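The sweep can be sketched as a pure function of the last detection; sweeping toward the side where the gate last left the frame is an assumption here, not spec, and the command convention is illustrative:

```python
def seek_command(last_seen_x, throttle=0.15, pitch_deg=-5.0, yaw_rate_dps=180.0):
    """Lost-gate recovery: slow forward creep plus a constant-rate yaw sweep.
    ASSUMPTION: sweep direction follows the side where the gate last exited
    the frame (last_seen_x in [-1, 1], negative = left edge).
    Returns (throttle, roll, pitch_deg, yaw_rate_dps)."""
    direction = -1.0 if last_seen_x < 0.0 else 1.0
    return (throttle, 0.0, pitch_deg, direction * yaw_rate_dps)
```

At 180°/s the sweep covers a full rotation in 2 s, so the gate is guaranteed back in view within one sweep as long as the forward creep hasn't carried the drone past it.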
Every gain, threshold, and timeout lives in race_config.py as a YAML-serializable dataclass. No magic numbers in pipeline code. Config snapshot saved in every race log for reproducibility.
| Parameter | Value | Context |
|---|---|---|
| kp_yaw | 50 | Yaw proportional gain |
| cruise_pitch | −25° | Nose-down cruise |
| max_tilt | 70° | MonoRace uses 65°+ |
| seek_yaw_rate | 180°/s | Full rotation in 2 s |
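A minimal sketch of the pattern, with an illustrative subset of fields; since the race logs are JSONL, the snapshot can ride along as one record per run:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class RaceConfig:
    """Illustrative subset of the tuned parameters — not the full dataclass."""
    kp_yaw: float = 50.0
    cruise_pitch_deg: float = -25.0
    max_tilt_deg: float = 70.0
    seek_yaw_rate_dps: float = 180.0

def snapshot_config(cfg: RaceConfig, log_path: str) -> None:
    """Append the config as one JSONL record so every race log is reproducible."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"type": "config", **asdict(cfg)}) + "\n")
```

The same `asdict` output feeds YAML serialization for the on-disk config; keeping both paths on one dataclass is what guarantees no magic numbers leak into pipeline code.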
Gate pass-through is not a simple distance threshold, which can false-trigger on approach. Instead, the tracker computes the distance derivative over 3+ consecutive frames and fires TRANSIT only when the distance is below threshold, decreasing, and the derivative confirms closing velocity. This eliminates false transitions from PnP noise at close range.
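The logic above can be sketched as a small stateful detector over a rolling distance window; the class name and threshold value are illustrative:

```python
from collections import deque

class TransitDetector:
    """TRANSIT fires only when gate distance is below threshold AND strictly
    decreasing across the whole window (closing-velocity confirmation)."""

    def __init__(self, threshold_m=1.0, window=3):
        self.threshold = threshold_m
        self.hist = deque(maxlen=window)  # rolling PnP distance estimates

    def update(self, dist_m):
        """Feed one per-frame distance; returns True on a confirmed transit."""
        self.hist.append(dist_m)
        if len(self.hist) < self.hist.maxlen:
            return False  # not enough frames to estimate the derivative
        closing = all(b < a for a, b in zip(self.hist, list(self.hist)[1:]))
        return closing and self.hist[-1] < self.threshold
```

A single noisy PnP reading near the gate breaks the strict monotonic-decrease check, so jitter at close range cannot fake a pass-through; only a genuine closing trajectory trips the state machine into TRANSIT.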