Pre-Simulator Training · Updated 2026-04-19

Two-track training plan for VQ1 + VQ2.

VQ1 completion stack (deterministic, ships first) + VQ2 APEX PPO with a detector + telemetry observation. Rebuilt after the 2026-04-19 AIGP spec confirmed: no GPS, no absolute positioning, no depth. Inputs are FPV + telemetry; outputs are Throttle / Roll / Pitch / Yaw.

Track A | VQ1 completion — no learning | ships before May
Track B | VQ2 fastest time — APEX PPO | trains through June
Host | RTX 5080 · 16 GB · Windows | training + sim testing
Method | MonoRace + Swift hybrid | modular beats e2e
The 2026-04-19 update reshaped this plan: the previous "map + precompute NED trajectory + MPC track" approach is retired, since the spec confirms no GPS, no absolute positioning, and no depth. The sim is Windows-only, and multiple parallel sim instances are supported.

§ 01 · Target architecture

FPV + Telemetry (sim input) → YOLO11n (detector) → Keypoints (4 corners) → PnP (gate pose) → PID / PPO (VQ1 · VQ2) → T / R / P / Y (sim output)

References: MonoRace (A2RL 2025 champion, TU Delft MAVLab) · Swift (Nature 2023, UZH) · SkyDreamer (ICLR 2025).

§ 02 · Step 0 — Register + monitor

Register at dcl-project.com. Sim credentials and the download link release "shortly before VQ1 launch" per the 2026-04-19 update; expected May. Sign up for the official notification channels and review the Official Rules.

§ 03 · Step 1 — VQ1 completion stack (ships first)

Deterministic detector + PnP + PID. No learning on the critical path. VQ1 scores on completion, not time — a reliable zero-learning stack is the safest submission.

import numpy as np

class VQ1CompletionPilot:
    # self.det / self.kp (detector + keypoint models), self.K (camera
    # intrinsics), and the pid_* controllers are set up in __init__ (omitted).
    def step(self, frame_bgr, telemetry):
        # 1) detect gates
        detections = self.det.infer(frame_bgr)
        if not detections:
            return self._search_fallback(telemetry)

        # 2) pick target: closest in-front gate with highest confidence
        target = self._select_target(detections, telemetry)

        # 3) keypoints + PnP against the 1.5 m square gate model
        corners = self.kp.infer(frame_bgr, target.bbox)
        rvec, tvec = solve_pnp_square(GATE_3D_1p5m, corners, self.K)

        # 4) body-frame gate pose
        gate_body = rotate_by_quat(tvec, telemetry.attitude_q)

        # 5) PID: yaw onto the gate bearing, pitch toward it, roll level;
        #    throttle from gate distance + height error
        heading_err = np.arctan2(gate_body[1], gate_body[0])
        yaw = self.pid_yaw.update(heading_err)
        pitch = self.pid_pitch.update(np.linalg.norm(gate_body[:2]))
        roll = self.pid_roll.update(0.0)
        throttle = self._thrust_curve(np.linalg.norm(gate_body), gate_body[2])
        return throttle, roll, pitch, yaw
Target: 100% gate-clear on the VQ1 course at conservative speed. Speed improvements come later via VQ2 PPO.
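
The solve_pnp_square helper above maps directly onto OpenCV's planar-square solver. A minimal sketch, assuming a 1.5 m square gate and that the keypoint head emits corners in (TL, TR, BR, BL) order; the constants here are illustrative:

import cv2
import numpy as np

HALF = 1.5 / 2.0  # 1.5 m square gate, corners in the gate plane (z = 0)
GATE_3D_1p5m = np.array([
    [-HALF,  HALF, 0.0],   # top-left
    [ HALF,  HALF, 0.0],   # top-right
    [ HALF, -HALF, 0.0],   # bottom-right
    [-HALF, -HALF, 0.0],   # bottom-left
], dtype=np.float32)

def solve_pnp_square(object_pts, corners_px, K, dist_coeffs=None):
    # SOLVEPNP_IPPE_SQUARE is OpenCV's fast planar-square solver; it
    # requires exactly 4 coplanar points in the (TL, TR, BR, BL) order
    # defined above.
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5, dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(
        object_pts, np.asarray(corners_px, dtype=np.float32),
        K, dist_coeffs, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if not ok:
        raise RuntimeError("PnP failed on gate corners")
    return rvec, tvec.reshape(3)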

§ 04 · Step 2 — Swap PPO observation (required for VQ2)

Current Phase-3 PPO reads gate positions from course_map_test.json. That's not available in the real AIGP sim. Rebuild around signals the sim actually exposes:

# OLD (privileged — doesn't transfer):
obs = np.concatenate([
    drone_pos_ned,            # 3D · NOT AVAILABLE
    drone_vel_ned,            # 3D · NOT AVAILABLE
    gate_positions_ned[:2],   # 6D · next two gates · NOT AVAILABLE
    ...
])

# NEW (detector + telemetry):
obs = np.concatenate([
    # Vision side (per frame)
    detected_gate_bbox_xywh,        # 4D · normalized [0,1]
    detected_gate_keypoints_xy,     # 8D · 4 corners
    detection_confidence,           # 1D
    frames_since_detection,         # 1D · decays
    # Telemetry side (from sim)
    attitude_quaternion,            # 4D
    body_rates,                     # 3D · roll/pitch/yaw rates
    body_accel,                     # 3D
    # History
    last_action,                    # 4D · T/R/P/Y
])  # Total: 28D
python train_apex.py policy                                    # default
python train_apex.py policy --observation-mode=privileged      # legacy (dev only)
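
For reference, a minimal sketch of assembling the 28-D observation per frame, with the frames_since_detection staleness handling made explicit. The Detection/Telemetry field names here are assumptions, not the sim's API:

import numpy as np

def build_observation(detection, telemetry, last_action, frames_since_det):
    if detection is not None:
        bbox = np.asarray(detection.bbox_xywh)            # 4D, normalized [0,1]
        kps = np.asarray(detection.keypoints_xy).ravel()  # 4 corners -> 8D
        conf = np.array([detection.confidence])
        frames_since_det = 0
    else:
        # On detection loss, zero the vision block; the staleness counter
        # tells the policy how old the last fix is.
        bbox, kps, conf = np.zeros(4), np.zeros(8), np.zeros(1)
        frames_since_det += 1
    obs = np.concatenate([
        bbox, kps, conf,
        [min(frames_since_det / 30.0, 1.0)],  # staleness, saturates at 1
        telemetry.attitude_q,                 # 4D quaternion
        telemetry.body_rates,                 # 3D
        telemetry.body_accel,                 # 3D
        last_action,                          # 4D T/R/P/Y
    ]).astype(np.float32)                     # total: 28D
    return obs, frames_since_det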

§ 05 · Step 3 — Perception-aware reward (Swift's key insight)

Swift's single perception reward term teaches seek-then-attack behavior without a state machine. Adapted to detector-output observations: reward keeping the gate near the frame center at high confidence.

import numpy as np

def perception_reward(detection):
    if detection is None:
        return -0.2                    # penalty for losing sight
    cx, cy = detection.bbox_center_normalized
    centered = 1.0 - 2.0 * np.hypot(cx - 0.5, cy - 0.5)
    return centered * detection.confidence

def step_reward(obs, detection, action, prev_action, crashed, gate_passed):
    r_progress   = 10.0 if gate_passed else 0.0
    r_perception = 0.5 * perception_reward(detection)
    r_smooth     = -0.1 * np.sum(np.abs(action - prev_action))
    r_crash      = -5.0 if crashed else 0.0
    return r_progress + r_perception + r_smooth + r_crash
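
For context, a sketch of where step_reward sits in a Gymnasium-style env wrapper; AIGPSimEnv's client interface is a placeholder until the sim ships:

def step(self, action):
    # hypothetical sim client: one action out, one frame + telemetry back
    frame_bgr, telemetry, crashed, gate_passed = self.client.send_action(action)
    detection = self.detector.infer(frame_bgr)
    obs, self.frames_since_det = build_observation(
        detection, telemetry, action, self.frames_since_det)
    reward = step_reward(obs, detection, action, self.prev_action,
                         crashed, gate_passed)
    self.prev_action = action
    return obs, reward, crashed, False, {}  # terminated=crashed, truncated=False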

§ 06 · Step 4 — Dataset expansion for VQ2

Current: 2,759 training images (dataset_gates_mega). Target: 10,000+ with domain randomization + hard negatives, because VQ2 adds lighting changes, 3D objects, and non-gate obstacles that can trigger false positives.

Source | Count | Method | Status
Existing real | 2,759 | Auto-labeled FPV frames | DONE
Hard negatives | ~1,000 | generate_training_sets.py | DONE
Domain randomized | ~1,000 | generate_training_sets.py | DONE
Sim-day capture | ~6,000 | Slow laps on actual VQ1 sim | GATED ON SIM
Synthetic renders | ~2,000 | Blender / Three.js with VQ2 distractors | TODO

Albumentations pipeline stays unchanged (brightness, motion blur, occlusion, shadow).
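
A minimal sketch of that pipeline, assuming the Albumentations API; the exact transforms and parameters in the shipped config may differ:

import albumentations as A

# Illustrative augmentation pipeline covering the listed effects.
transform = A.Compose(
    [
        A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
        A.MotionBlur(blur_limit=9, p=0.3),   # fast-FPV motion blur
        A.CoarseDropout(p=0.3),              # occlusion patches
        A.RandomShadow(p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

# usage: transform(image=frame_bgr, bboxes=..., class_labels=..., keypoints=...)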

§ 07 · Step 5 — Parallel sim instances for RL fan-out

The AIGP update explicitly allows multiple sim instances on one box. Wrap the sim client in a SubprocVecEnv and run 8+ envs on the RTX 5080 host; wall-clock training time should drop 6–8×.

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(instance_id):
    def _init():
        return AIGPSimEnv(instance_id=instance_id, port=6000 + instance_id)
    return _init

if __name__ == "__main__":  # required on Windows: subprocesses spawn, not fork
    env = SubprocVecEnv([make_env(i) for i in range(8)])
    model = PPO("MlpPolicy", env, n_steps=2048, batch_size=512)  # remaining hyperparameters elided
    model.learn(total_timesteps=10_000_000)

Caveat: an active internet connection is required per instance (anti-cheat). Budget bandwidth and confirm the sim's handshake tolerates parallel clients (allowed per the update; details TBD).

§ 08 · Retired (pre-2026-04-19)

MPC tracker + course mapper · RETIRED

Built on the assumption that the sim exposes ODOMETRY in NED. VADR-TS-002 confirmed it doesn't — ODOMETRY was removed from the supported MAVLink message set in the 2026-05-08 spec revision. As of 2026-05-12 the ODOMETRY subscription is deleted from mavsdk_bridge.py; an InertialIntegrator dead-reckons position from HIGHRES_IMU + ATTITUDE with gate-PnP corrections. See the retired-components section of winning-strategy.html.
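
A minimal sketch of that dead-reckoning idea, reusing rotate_by_quat from the pilot code; gains, frame conventions, and the correction blend are illustrative, not the shipped implementation:

import numpy as np

class InertialIntegrator:
    # Drifts quickly, which is why gate-PnP fixes must reset it. Sign and
    # frame conventions depend on the sim's IMU definition.
    GRAVITY = np.array([0.0, 0.0, 9.81])

    def __init__(self):
        self.pos = np.zeros(3)   # position in a local level frame
        self.vel = np.zeros(3)

    def predict(self, accel_body, attitude_q, dt):
        # rotate body-frame specific force to the level frame,
        # remove gravity, integrate twice
        accel_level = rotate_by_quat(accel_body, attitude_q) - self.GRAVITY
        self.vel += accel_level * dt
        self.pos += self.vel * dt

    def correct(self, pos_fix, alpha=0.7):
        # complementary blend toward a gate-PnP position fix
        self.pos = alpha * pos_fix + (1.0 - alpha) * self.pos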

MobileNetV3-Small custom keypoint model · RETIRED

YOLO11n-pose (APEX Phase 2) already produces 4 gate corners at better accuracy and integrates in one pass. A second custom model is redundant.

Vision-SLAM branch (LingBot-Map et al.) · RETIRED

Feed-forward monocular SLAM does not work at racing velocities — see our 2026-04-17 investigation. Removed from the repo.

§ 09 · VQ1 sim day-one checklist

Priority | Task | Why | Time
P0 | Telemetry adapter | Map sim payload → our obs format | ~1 hr
P0 | VQ1 completion pilot shakedown | 100% gate-clear at conservative speed | ~2 hr tune
P0 | Sim-frame dataset capture | 10 min slow laps for detector fine-tune | ~15 min + retrain
P0 | Submit VQ1 entry | Lock in a passing submission | ~30 min
P1 | SubprocVecEnv wrapper | 8-instance parallel for PPO training | ~3 hr
P1 | Swap PPO observation | Detector + telemetry instead of privileged | ~4 hr
P1 | Run Phase 3 PPO on captured sim frames | VQ2 prep | overnight
P2 | RF-DETR retrain | Only if YOLO recall limits VQ2 | ~2 hr
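
For the P0 telemetry-adapter row, a hypothetical sketch of the payload mapping; the sim-side field names are guesses until the sim ships:

from dataclasses import dataclass
import numpy as np

@dataclass
class Telemetry:
    attitude_q: np.ndarray   # 4D quaternion
    body_rates: np.ndarray   # 3D roll/pitch/yaw rates, rad/s
    body_accel: np.ndarray   # 3D body-frame accel, m/s^2

def adapt(sim_payload: dict) -> Telemetry:
    # keys below are placeholders for whatever the sim actually sends
    return Telemetry(
        attitude_q=np.asarray(sim_payload["attitude_quaternion"], dtype=np.float32),
        body_rates=np.asarray(sim_payload["gyro"], dtype=np.float32),
        body_accel=np.asarray(sim_payload["accel"], dtype=np.float32),
    )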

§ 10 · What separates winners from losers

Winners · SHIP
  • Ship a boring deterministic stack for VQ1 on day one. Lock the submission.
  • Build PPO observation around signals the sim actually exposes.
  • Use parallel sim instances for RL fan-out — competition explicitly allows it.
  • Perception-aware reward: keep the gate in the frame.
  • Robust to detection loss: short-horizon telemetry dead-reckoning fallback.

Losers · AVOID
  • Rely on GPS / NED / absolute positioning (confirmed not available).
  • Train PPO against privileged gate positions and only notice at submission time.
  • Spend the VQ1 window on a half-trained end-to-end policy that can't complete.
  • Forget that the sim is Windows-only.
  • Skip parallel instances and train PPO on a single env.

§ 11 · Key performance targets

Metric | VQ1 target | VQ2 target | Current
Gate detection mAP@50 | >95% | >99% | 97.9% (RF-DETR) · YOLO11n TBD
Detection latency | <10 ms | <5 ms | ~8 ms (RF-DETR PT)
PnP depth error @ 5 m | <15% | <5% | ~15% (single-frame)
Course completion rate | 100% | >95% | TBD on sim
Lap time vs MonoRace | n/a | within 20% | TBD
End-to-end latency | <50 ms | <30 ms | ~80 ms estimated
PPO training steps | n/a | 10M+ (parallel envs) | 2M single-env

§ 12 · Current codebase state

Component | File | Status
APEX detector training | train_apex.py detector | COMPLETE
APEX keypoint training | train_apex.py keypoints | COMPLETE
APEX PPO training | train_apex.py policy | Needs observation swap
RL environment | rl_controller.py | Needs observation swap
YOLO training | yolo-train.py | COMPLETE
RF-DETR training | train_models.py | COMPLETE · 97.9% mAP
Gate segmentation | gate_segmentation.py | COMPLETE
IGPP EKF | imu_gate_predictor.py | COMPLETE · short-horizon fallback
SAMD multi-frame depth | synthetic_aperture_depth.py | COMPLETE
6DOF SimDrone | sim_drone.py | COMPLETE (dev proxy)
Model evaluation | evaluate_models.py | COMPLETE
VQ1 completion pilot | vq1_completion_pilot.py | Stub · needs sim interface
MPC tracker + course mapper | mpc_tracker.py, course_mapper.py | RETIRED · NED-dependent
TRAINING-PLAN · v2.0 · 2026-04-19