Most teams will train on web-scraped racing footage with noisy labels and arrive at the AIGP simulator hoping their detector generalizes. We don't hope. We render the truth, label it perfectly, train against it under physics-correct constraints, and ship a stack that's mathematically tuned to the AIGP spec.
This page is the technical case for why our team finishes first. Each section is one innovation: a graphic, the math behind it, and what it buys us in race time.
"Don't bring photos to a math fight." — what every other team is about to learn.
Every other approach starts with images and tries to label them — annotators draw boxes, mistakes creep in, label noise caps your model's mAP. We invert it. We start from a 3D pose and project. The corners are the label, derived from the pinhole camera matrix. There's nothing to mislabel.
The camera intrinsics fall out of the AIGP spec. Given an image $W$ pixels wide and horizontal FoV $\theta_h$:
$$f_x = \frac{W/2}{\tan(\theta_h / 2)}, \qquad c_x = \frac{W}{2}.$$

Project four 3D corners $\mathbf{P}_i = R \mathbf{X}_i + \mathbf{t}$ through this matrix and you have four exact pixel coordinates. The bbox is their axis-aligned hull. The keypoints are the corners themselves. No annotator. No noise floor.
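A minimal sketch of that projection path in Python — assuming numpy, a 640×360 frame, and an illustrative 1.4 m square gate; the production version lives in synth_aigp_gates.py:

```python
import numpy as np

def intrinsics_from_hfov(width: int, height: int, hfov_deg: float) -> np.ndarray:
    """Pinhole K from image width and horizontal FoV (square pixels assumed)."""
    fx = (width / 2) / np.tan(np.radians(hfov_deg) / 2)
    return np.array([[fx, 0.0, width / 2],
                     [0.0, fx, height / 2],
                     [0.0, 0.0, 1.0]])

def project_gate(corners_obj: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray):
    """Project 3D gate corners (object frame) into pixels; return keypoints and bbox."""
    cam = (R @ corners_obj.T).T + t           # object frame -> camera frame
    px = (K @ cam.T).T
    px = px[:, :2] / px[:, 2:3]               # perspective divide
    x0, y0 = px.min(axis=0)
    x1, y1 = px.max(axis=0)
    return px, (x0, y0, x1, y1)               # exact corner keypoints, axis-aligned hull

# Illustrative: a 1.4 m square gate, fronto-parallel, 8 m ahead of the camera.
K = intrinsics_from_hfov(640, 360, 90.0)
half = 0.7
corners = np.array([[-half, -half, 0.0], [half, -half, 0.0],
                    [half,  half, 0.0], [-half,  half, 0.0]])
keypoints, bbox = project_gate(corners, np.eye(3), np.array([0.0, 0.0, 8.0]), K)
```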
An LED isn't a colored line — it's a saturated emitter wrapped in a halo. The eye reads it that way because the emitter saturates the sensor while the surrounding diffraction tail carries the color. We model this with a five-stage additive composite. The math is just point-spread functions stacked at three radii.
A 2D Gaussian point-spread function:
$$G(r;\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-r^2 / 2\sigma^2}.$$

The full bloom is a weighted sum of three blurred copies of the LED line $L$, plus the matte frame $F$ and the white core $K$:

$$I_\text{out} = F + 0.55\,(L * G_\text{big}) + 0.95\,(L * G_\text{med}) + 1.4\,(L * G_\text{small}) + K$$

The $\sigma$ values scale inversely with gate distance — close gates bloom wider in pixels; far gates tighter. Real diffraction works the same way.
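A sketch of that composite using scipy — the base σ values, the reference distance, and the simple 1/d scaling shown here are illustrative stand-ins; the real radii and scaling live in synth_aigp_gates.py:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def composite_led_bloom(frame, led_line, core, distance_m,
                        base_sigmas=(6.0, 3.0, 1.2), ref_distance=8.0):
    """Five-stage additive bloom: matte frame + three blurred copies of the LED line + white core.

    Inputs are (H, W, 3) arrays. base_sigmas and ref_distance are placeholders;
    sigma scales as 1/distance, so a close gate blooms wider in pixels than a far one.
    """
    scale = ref_distance / max(distance_m, 1e-3)
    s_big, s_med, s_small = (s * scale for s in base_sigmas)
    L = led_line.astype(np.float32)
    out = frame.astype(np.float32)
    out += 0.55 * gaussian_filter(L, sigma=(s_big, s_big, 0))
    out += 0.95 * gaussian_filter(L, sigma=(s_med, s_med, 0))
    out += 1.40 * gaussian_filter(L, sigma=(s_small, s_small, 0))
    out += core.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```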
Uniform distance sampling wastes capacity. A drone at 40 m doesn't care about the gate yet — it cares at 4–15 m, where one bad detection means a crash. We use a $\mathrm{Beta}(2, 4)$ to skew the distance distribution toward the band that matters.
The Beta PDF concentrates probability:
$$f(u;\alpha,\beta)= \frac{u^{\alpha-1}(1-u)^{\beta-1}}{B(\alpha,\beta)}.$$

For $\alpha{=}2,\, \beta{=}4$, the mode is at $u^* = \frac{\alpha-1}{\alpha+\beta-2} = \tfrac{1}{4}$. We map $d = 3 + 22u$, so the mode lands at 8.5 m — right where a 15 m/s racer has only ~570 ms to react.
Most of our 10K-image budget therefore lives at the distance where wrong calls cost laps.
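A sketch of the sampler (numpy; the function name is illustrative, the constants are the ones above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gate_distance(n: int, d_min: float = 3.0, d_span: float = 22.0) -> np.ndarray:
    """u ~ Beta(2, 4), d = d_min + d_span * u.

    The Beta(2, 4) mode is u* = 1/4, so the density peaks at 3 + 22/4 = 8.5 m,
    inside the 4-15 m band where a missed detection is most expensive.
    """
    u = rng.beta(2.0, 4.0, size=n)
    return d_min + d_span * u

distances = sample_gate_distance(10_000)   # most mass between roughly 5 and 13 m
```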
VADR-TS-002 §3.8 specifies a forward-facing first-person camera with exact pinhole intrinsics — no distortion, no FoV inference. We pin those numbers into both the renderer and the policy environment, so there is no train/eval mismatch. Same $(f_x, f_y, c_x, c_y)$ everywhere. The camera is mounted with a 20° upward pitch from the body — a deliberate design choice that biases gate visibility upward in the frame.
The intrinsics matrix from VADR-TS-002 §3.8:
$$K = \begin{pmatrix} 320 & 0 & 320 \\ 0 & 320 & 180 \\ 0 & 0 & 1 \end{pmatrix}.$$

Both synth_aigp_gates.py and train_apex.py pin these exact values — no FoV-to-$f_x$ conversion in the pipeline. The spec also lists "VFoV = 90°" in prose, but the numerics give VFoV ≈ 58.7° (HFoV = 90°). We trust the intrinsics; clarification pending from the organizer.
This is structural sim2real: no domain randomization needed for the camera, because the camera doesn't need bridging.
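A quick consistency check on those constants (the 640×360 frame size is implied by the principal point; everything else follows from the matrix):

```python
import numpy as np

# Pinned intrinsics from VADR-TS-002 §3.8 — the same constants go into
# synth_aigp_gates.py and train_apex.py so renderer and policy agree exactly.
K = np.array([[320.0,   0.0, 320.0],
              [  0.0, 320.0, 180.0],
              [  0.0,   0.0,   1.0]])

W, H = 640, 360                                         # implied by cx = 320, cy = 180
hfov = 2 * np.degrees(np.arctan((W / 2) / K[0, 0]))     # 90.0 deg
vfov = 2 * np.degrees(np.arctan((H / 2) / K[1, 1]))     # ~58.7 deg, not the 90 deg in prose
```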
Swift (Nature 2023) won by adding a single perception term to the reward: $\cos\theta$ where $\theta$ is the angle from camera boresight to the gate. Look toward the gate, get reward; look away, lose reward. The policy learns the seek-attack cycle without anyone designing it. We use the same trick, with our own scaling.
The full reward at each step:
$$r_t = 2\Delta d \;+\; 100\,(1+v/15)\,\mathbb{1}_\text{pass} \;+\; 0.3\cos\theta \;+\; 0.15(v/25)\,\mathbb{1}_\text{vis} \;-\; 0.02\|a_t-a_{t-1}\|_1.$$

Term-by-term: progress, gate pass with speed bonus, perception, speed-when-visible, action smoothness. The $\cos\theta$ term alone induces yaw scanning when no gate is in frame and lock-on attack behavior when one is.
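A sketch of the per-step reward mirroring that formula; the argument names are illustrative, and the authoritative constants are whatever train_apex.py ships with:

```python
import numpy as np

def step_reward(delta_d, passed_gate, speed, cos_theta, gate_visible, action, prev_action):
    """delta_d: progress toward the next gate this step (m); speed in m/s;
    cos_theta: cosine of the boresight-to-gate angle; action arrays are raw commands."""
    r = 2.0 * delta_d                                             # progress
    r += 100.0 * (1.0 + speed / 15.0) * float(passed_gate)        # gate pass + speed bonus
    r += 0.3 * cos_theta                                          # perception: look at the gate
    r += 0.15 * (speed / 25.0) * float(gate_visible)              # speed only pays when a gate is seen
    r -= 0.02 * np.abs(np.asarray(action) - np.asarray(prev_action)).sum()  # smoothness
    return r
```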
The AIGP layout isn't published yet. We don't need it. We train across course shapes that span the racing skill space: chicane (lateral lines), vertical dive (altitude), hairpin (180° turns), sprint-split (high-speed entry), climb-descend (sustained altitude trajectory), 3D figure-8 (cross-overs). One policy, every skill.
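One plausible way to expose those courses to the trainer — the registry below is an illustrative sketch, not the contents of aigp_courses.py:

```python
import random

# Illustrative course registry: name -> skill it stresses (real layouts live in aigp_courses.py).
COURSES = {
    "chicane":        "lateral lines",
    "vertical_dive":  "altitude changes",
    "hairpin":        "180-degree turns",
    "sprint_split":   "high-speed entry",
    "climb_descend":  "sustained altitude trajectory",
    "figure8_3d":     "cross-overs",
}

def sample_course(rng: random.Random) -> str:
    """Pick a course per episode so one policy trains on every skill."""
    return rng.choice(list(COURSES))
```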
Every champion paper agrees on the architecture: detector → keypoints → PnP → state estimator → policy → actuators. We run all six stages in under 20 ms on an RTX 5080, fitting inside the 50 Hz (20 ms) control budget; the detector alone takes under 7 ms.
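A hedged sketch of how that budget could be enforced per stage; the only figures taken from the text are the 20 ms total and the <7 ms detector number, the rest of the split is made up for illustration:

```python
import time
from contextlib import contextmanager

# Illustrative per-stage split (ms). Only the 20 ms total and the <7 ms detector
# figure come from our measurements; the remaining numbers are placeholders.
BUDGET_MS = {"detector": 7.0, "keypoints": 3.0, "pnp": 2.0,
             "state_estimator": 2.0, "policy": 4.0, "actuators": 2.0}

@contextmanager
def stage_timer(name, timings):
    """Record wall-clock milliseconds for one pipeline stage."""
    t0 = time.perf_counter()
    yield
    timings[name] = (time.perf_counter() - t0) * 1e3

# Inside the control loop (stage calls themselves omitted):
# timings = {}
# with stage_timer("detector", timings):
#     detections = detector(frame)
# ...
# assert sum(timings.values()) < 20.0, "missed the 50 Hz budget"
```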
A race is $N$ gates in series. If your per-gate success rate is $p$, your full-race success is $p^N$. At $N=14$ gates, the difference between $p=0.95$ and $p=0.99$ is the difference between finishing 49% of attempts and 87%. We over-engineer the perception stack to push $p$ as close to 1 as physics allows.
Race finish probability:
$$P_\text{finish} = p^N$$

Each percentage point of $p$ matters geometrically. Driving $p$ from 0.95 to 0.99 is a 1.78× finish-rate improvement at $N{=}14$. From 0.99 to 0.999 is another 1.14×.
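The arithmetic behind those ratios, checked in a few lines:

```python
N = 14
for p in (0.95, 0.99, 0.999):
    print(f"p = {p}: P_finish = {p ** N:.3f}")
# p = 0.95 : P_finish = 0.488   -> ~49% of attempts finish
# p = 0.99 : P_finish = 0.869   -> ~87%, a 1.78x improvement
# p = 0.999: P_finish = 0.986   -> another 1.14x on top
```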
This is why we spend the engineering effort where it compounds:
Perfect labels feed a detector with a higher mAP ceiling. Higher mAP feeds a keypoint head with cleaner corners. Cleaner corners feed PnP with lower pose error. Lower pose error feeds the policy with stable observations. Stable observations train a policy with sharper $\cos\theta$ alignment. Sharper alignment means tighter racing lines, fewer crashes, and a finish probability that compounds across all $N$ gates.
Other teams will rely on data luck. We rely on geometry, statistics, and a renderer we wrote ourselves.
Cross-refs: synthetic dataset · APEX pipeline · winning playbook · winning strategy · training runbook.
Code: synth_aigp_gates.py, train_apex.py, aigp_courses.py, apex_progress_ui.py.