Pre-Simulator Training · Updated 2026-04-19

Two-track training plan for VQ1 + VQ2.

VQ1 completion stack (deterministic, ships first) + VQ2 APEX PPO with a detector + telemetry observation. Rebuilt after the 2026-04-19 AIGP spec confirmed: no GPS, no absolute positioning, no depth. Inputs are FPV + telemetry; outputs are Throttle / Roll / Pitch / Yaw.

Track A | VQ1 completion — no learning | ships before May
Track B | VQ2 fastest time — APEX PPO | trains through June
Host | RTX 5080 · 16 GB · Windows | training + sim testing
Method | MonoRace + Swift hybrid | modular beats e2e
The 2026-04-19 update reshaped this plan: the previous "map + precompute NED trajectory + MPC track" approach is retired, since the spec confirms no GPS, no absolute positioning, and no depth. The sim is Windows-only, and multiple parallel sim instances are supported.

§ 01 · Target architecture

FPV + Telemetry (sim input) → YOLO11n (detector) → Keypoints (4 corners) → PnP (gate pose) → PID / PPO (VQ1 · VQ2) → T / R / P / Y (sim output)

References: MonoRace (A2RL 2025 champion, TU Delft MAVLab) · Swift (Nature 2023, UZH) · SkyDreamer (ICLR 2025).

§ 02 · Step 0 — Register + monitor

Register at dcl-project.com. Sim credentials and the download link release "shortly before VQ1 launch" per the 2026-04-19 update; expected May. Sign up for the official notification channels and review the Official Rules.

§ 03 · Step 1 — VQ1 completion stack (ships first)

Deterministic detector + PnP + PID. No learning on the critical path. VQ1 scores on completion, not time — a reliable zero-learning stack is the safest submission.

import numpy as np

class VQ1CompletionPilot:
    # self.det / self.kp (detector + keypoint models), self.K (camera
    # intrinsics), and the pid_* controllers are set up in __init__ (omitted).
    def step(self, frame_bgr, telemetry):
        # 1) detect gates
        detections = self.det.infer(frame_bgr)
        if not detections:
            return self._search_fallback(telemetry)

        # 2) pick target: closest in-front gate with highest confidence
        target = self._select_target(detections, telemetry)

        # 3) keypoints + PnP against the 1.5 m square gate model
        corners = self.kp.infer(frame_bgr, target.bbox)
        rvec, tvec = solve_pnp_square(GATE_3D_1p5m, corners, self.K)

        # 4) body-frame gate pose
        gate_body = rotate_by_quat(tvec, telemetry.attitude_q)

        # 5) PID: yaw onto the gate bearing, pitch toward it, roll level;
        #    throttle from gate distance + height error
        heading_err = np.arctan2(gate_body[1], gate_body[0])
        yaw = self.pid_yaw.update(heading_err)
        pitch = self.pid_pitch.update(np.linalg.norm(gate_body[:2]))
        roll = self.pid_roll.update(0.0)
        throttle = self._thrust_curve(np.linalg.norm(gate_body), gate_body[2])
        return throttle, roll, pitch, yaw
Target: 100% gate-clear on the VQ1 course at conservative speed. Speed improvements come later via VQ2 PPO.
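
The solve_pnp_square helper above maps directly onto OpenCV's planar-square solver. A minimal sketch, assuming a 1.5 m square gate and that the keypoint head emits corners in (TL, TR, BR, BL) order; the constants here are illustrative:

import cv2
import numpy as np

HALF = 1.5 / 2.0  # 1.5 m square gate, corners in the gate plane (z = 0)
GATE_3D_1p5m = np.array([
    [-HALF,  HALF, 0.0],   # top-left
    [ HALF,  HALF, 0.0],   # top-right
    [ HALF, -HALF, 0.0],   # bottom-right
    [-HALF, -HALF, 0.0],   # bottom-left
], dtype=np.float32)

def solve_pnp_square(object_pts, corners_px, K, dist_coeffs=None):
    # SOLVEPNP_IPPE_SQUARE is OpenCV's fast planar-square solver; it
    # requires exactly 4 coplanar points in the (TL, TR, BR, BL) order
    # defined above.
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5, dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(
        object_pts, np.asarray(corners_px, dtype=np.float32),
        K, dist_coeffs, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if not ok:
        raise RuntimeError("PnP failed on gate corners")
    return rvec, tvec.reshape(3)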

§ 04 · Step 2 — Swap PPO observation (required for VQ2)

Current Phase-3 PPO reads gate positions from course_map_test.json. That's not available in the real AIGP sim. Rebuild around signals the sim actually exposes:

# OLD (privileged — doesn't transfer):
obs = np.concatenate([
    drone_pos_ned,            # 3D · NOT AVAILABLE
    drone_vel_ned,            # 3D · NOT AVAILABLE
    gate_positions_ned[:2],   # 6D · next two gates · NOT AVAILABLE
    ...
])

# NEW (detector + telemetry):
obs = np.concatenate([
    # Vision side (per frame)
    detected_gate_bbox_xywh,        # 4D · normalized [0,1]
    detected_gate_keypoints_xy,     # 8D · 4 corners
    detection_confidence,           # 1D
    frames_since_detection,         # 1D · decays
    # Telemetry side (from sim)
    attitude_quaternion,            # 4D
    body_rates,                     # 3D · roll/pitch/yaw rates
    body_accel,                     # 3D
    # History
    last_action,                    # 4D · T/R/P/Y
])  # Total: 28D
python train_apex.py policy                                    # default
python train_apex.py policy --observation-mode=privileged      # legacy (dev only)
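
For reference, a minimal sketch of assembling the 28-D observation per frame, with the frames_since_detection staleness handling made explicit. The Detection/Telemetry field names here are assumptions, not the sim's API:

import numpy as np

def build_observation(detection, telemetry, last_action, frames_since_det):
    if detection is not None:
        bbox = np.asarray(detection.bbox_xywh)            # 4D, normalized [0,1]
        kps = np.asarray(detection.keypoints_xy).ravel()  # 4 corners -> 8D
        conf = np.array([detection.confidence])
        frames_since_det = 0
    else:
        # On detection loss, zero the vision block; the staleness counter
        # tells the policy how old the last fix is.
        bbox, kps, conf = np.zeros(4), np.zeros(8), np.zeros(1)
        frames_since_det += 1
    obs = np.concatenate([
        bbox, kps, conf,
        [min(frames_since_det / 30.0, 1.0)],  # staleness, saturates at 1
        telemetry.attitude_q,                 # 4D quaternion
        telemetry.body_rates,                 # 3D
        telemetry.body_accel,                 # 3D
        last_action,                          # 4D T/R/P/Y
    ]).astype(np.float32)                     # total: 28D
    return obs, frames_since_det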

§ 05 · Step 3 — Perception-aware reward (Swift's key insight)

Swift's single perception reward term teaches seek-then-attack behavior without a state machine. Adapted to detector-output observations: reward keeping the gate near the frame center at high confidence.

import numpy as np

def perception_reward(detection):
    if detection is None:
        return -0.2                    # penalty for losing sight
    cx, cy = detection.bbox_center_normalized
    centered = 1.0 - 2.0 * np.hypot(cx - 0.5, cy - 0.5)
    return centered * detection.confidence

def step_reward(obs, detection, action, prev_action, crashed, gate_passed):
    r_progress   = 10.0 if gate_passed else 0.0
    r_perception = 0.5 * perception_reward(detection)
    r_smooth     = -0.1 * np.sum(np.abs(action - prev_action))
    r_crash      = -5.0 if crashed else 0.0
    return r_progress + r_perception + r_smooth + r_crash
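
For context, a sketch of where step_reward sits in a Gymnasium-style env wrapper; AIGPSimEnv's client interface is a placeholder until the sim ships:

def step(self, action):
    # hypothetical sim client: one action out, one frame + telemetry back
    frame_bgr, telemetry, crashed, gate_passed = self.client.send_action(action)
    detection = self.detector.infer(frame_bgr)
    obs, self.frames_since_det = build_observation(
        detection, telemetry, action, self.frames_since_det)
    reward = step_reward(obs, detection, action, self.prev_action,
                         crashed, gate_passed)
    self.prev_action = action
    return obs, reward, crashed, False, {}  # terminated=crashed, truncated=False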

§ 06 · Step 4 — Dataset expansion for VQ2

Current: 2,759 training images (dataset_gates_mega). Target: 10,000+ with domain randomization + hard negatives, because VQ2 adds lighting changes, 3D objects, and non-gate obstacles that can trigger false positives.

Source | Count | Method | Status
Existing real | 2,759 | Auto-labeled FPV frames | DONE
Hard negatives | ~1,000 | generate_training_sets.py | DONE
Domain randomized | ~1,000 | generate_training_sets.py | DONE
Sim-day capture | ~6,000 | Slow laps on actual VQ1 sim | GATED ON SIM
Synthetic renders | ~2,000 | Blender / Three.js with VQ2 distractors | TODO

Albumentations pipeline stays unchanged (brightness, motion blur, occlusion, shadow).
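
A minimal sketch of that pipeline, assuming the Albumentations API; the exact transforms and parameters in the shipped config may differ:

import albumentations as A

# Illustrative augmentation pipeline covering the listed effects.
transform = A.Compose(
    [
        A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
        A.MotionBlur(blur_limit=9, p=0.3),   # fast-FPV motion blur
        A.CoarseDropout(p=0.3),              # occlusion patches
        A.RandomShadow(p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

# usage: transform(image=frame_bgr, bboxes=..., class_labels=..., keypoints=...)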

§ 07 · Step 5 — Parallel sim instances for RL fan-out

The AIGP update explicitly allows multiple sim instances on one box. Wrap the sim client in a SubprocVecEnv and run 8+ envs on the RTX 5080 host; wall-clock training time should drop 6–8×.

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(instance_id):
    def _init():
        return AIGPSimEnv(instance_id=instance_id, port=6000 + instance_id)
    return _init

if __name__ == "__main__":  # required on Windows: subprocesses spawn, not fork
    env = SubprocVecEnv([make_env(i) for i in range(8)])
    model = PPO("MlpPolicy", env, n_steps=2048, batch_size=512)  # remaining hyperparameters elided
    model.learn(total_timesteps=10_000_000)

Caveat: an active internet connection is required per instance (anti-cheat). Budget bandwidth and confirm the sim's handshake tolerates parallel clients (allowed per the update; details TBD).

§ 08 · Retired (pre-2026-04-19)

MPC tracker + course mapper · RETIRED

Built on the assumption that the sim exposes ODOMETRY in NED. VADR-TS-002 confirmed it doesn't — ODOMETRY was removed from the supported MAVLink message set in the 2026-05-08 spec revision. As of 2026-05-12 the ODOMETRY subscription is deleted from mavsdk_bridge.py; an InertialIntegrator dead-reckons position from HIGHRES_IMU + ATTITUDE with gate-PnP corrections. See the retired-components section of winning-strategy.html.
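
A minimal sketch of that dead-reckoning idea, reusing rotate_by_quat from the pilot code; gains, frame conventions, and the correction blend are illustrative, not the shipped implementation:

import numpy as np

class InertialIntegrator:
    # Drifts quickly, which is why gate-PnP fixes must reset it. Sign and
    # frame conventions depend on the sim's IMU definition.
    GRAVITY = np.array([0.0, 0.0, 9.81])

    def __init__(self):
        self.pos = np.zeros(3)   # position in a local level frame
        self.vel = np.zeros(3)

    def predict(self, accel_body, attitude_q, dt):
        # rotate body-frame specific force to the level frame,
        # remove gravity, integrate twice
        accel_level = rotate_by_quat(accel_body, attitude_q) - self.GRAVITY
        self.vel += accel_level * dt
        self.pos += self.vel * dt

    def correct(self, pos_fix, alpha=0.7):
        # complementary blend toward a gate-PnP position fix
        self.pos = alpha * pos_fix + (1.0 - alpha) * self.pos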

MobileNetV3-Small custom keypoint model · RETIRED

YOLO11n-pose (APEX Phase 2) already produces 4 gate corners at better accuracy and integrates in one pass. A second custom model is redundant.

Vision-SLAM branch (LingBot-Map et al.) · RETIRED

Feed-forward monocular SLAM does not work at racing velocities — see our 2026-04-17 investigation. Removed from the repo.

§ 09 · VQ1 sim day-one checklist

Priority | Task | Why | Time
P0 | Telemetry adapter | Map sim payload → our obs format | ~1 hr
P0 | VQ1 completion pilot shakedown | 100% gate-clear at conservative speed | ~2 hr tune
P0 | Sim-frame dataset capture | 10 min slow laps for detector fine-tune | ~15 min + retrain
P0 | Submit VQ1 entry | Lock in a passing submission | ~30 min
P1 | SubprocVecEnv wrapper | 8-instance parallel for PPO training | ~3 hr
P1 | Swap PPO observation | Detector + telemetry instead of privileged | ~4 hr
P1 | Run Phase 3 PPO on captured sim frames | VQ2 prep | overnight
P2 | RF-DETR retrain | Only if YOLO recall limits VQ2 | ~2 hr
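
For the P0 telemetry-adapter row, a hypothetical sketch of the payload mapping; the sim-side field names are guesses until the sim ships:

from dataclasses import dataclass
import numpy as np

@dataclass
class Telemetry:
    attitude_q: np.ndarray   # 4D quaternion
    body_rates: np.ndarray   # 3D roll/pitch/yaw rates, rad/s
    body_accel: np.ndarray   # 3D body-frame accel, m/s^2

def adapt(sim_payload: dict) -> Telemetry:
    # keys below are placeholders for whatever the sim actually sends
    return Telemetry(
        attitude_q=np.asarray(sim_payload["attitude_quaternion"], dtype=np.float32),
        body_rates=np.asarray(sim_payload["gyro"], dtype=np.float32),
        body_accel=np.asarray(sim_payload["accel"], dtype=np.float32),
    )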

§ 10 · What separates winners from losers

Winners · SHIP
  • Ship a boring deterministic stack for VQ1 on day one. Lock the submission.
  • Build PPO observation around signals the sim actually exposes.
  • Use parallel sim instances for RL fan-out — competition explicitly allows it.
  • Perception-aware reward: keep the gate in the frame.
  • Robust to detection loss: short-horizon telemetry dead-reckoning fallback.

Losers · AVOID
  • Rely on GPS / NED / absolute positioning (confirmed not available).
  • Train PPO against privileged gate positions and only notice at submission time.
  • Spend the VQ1 window on a half-trained end-to-end policy that can't complete.
  • Forget that the sim is Windows-only.
  • Skip parallel instances and train PPO on a single env.

§ 11 · Key performance targets

Metric | VQ1 target | VQ2 target | Current
Gate detection mAP@50 | >95% | >99% | 97.9% (RF-DETR) · YOLO11n TBD
Detection latency | <10 ms | <5 ms | ~8 ms (RF-DETR PT)
PnP depth error @ 5 m | <15% | <5% | ~15% (single-frame)
Course completion rate | 100% | >95% | TBD on sim
Lap time vs MonoRace | n/a | within 20% | TBD
End-to-end latency | <50 ms | <30 ms | ~80 ms estimated
PPO training steps | n/a | 10M+ (parallel envs) | 2M single-env

§ 12 · Current codebase state

Component | File | Status
APEX detector training | train_apex.py detector | COMPLETE
APEX keypoint training | train_apex.py keypoints | COMPLETE
APEX PPO training | train_apex.py policy | Needs observation swap
RL environment | rl_controller.py | Needs observation swap
YOLO training | yolo-train.py | COMPLETE
RF-DETR training | train_models.py | COMPLETE · 97.9% mAP
Gate segmentation | gate_segmentation.py | COMPLETE
IGPP EKF | imu_gate_predictor.py | COMPLETE · short-horizon fallback
SAMD multi-frame depth | synthetic_aperture_depth.py | COMPLETE
6DOF SimDrone | sim_drone.py | COMPLETE (dev proxy)
Model evaluation | evaluate_models.py | COMPLETE
VQ1 completion pilot | vq1_completion_pilot.py | Stub · needs sim interface
MPC tracker + course mapper | mpc_tracker.py, course_mapper.py | RETIRED · NED-dependent
TRAINING-PLAN · v2.0 · 2026-04-19