The AIGP tech spec explicitly lists "vertical and horizontal obstacles, boundary elements, terrain and environmental structures" as scene elements. Our current stack only sees gates. This doc lays out the three-tier plan to close that gap — detect, avoid, learn — plus the synthetic-data pipeline using race-r3f.html as a free labelled-data generator.
From the official technical spec (260318_Technical_Spec_0001.pdf, § 3.1):
The race takes place within a high-fidelity real-time physics simulator:
• start gate
• sequential race gates
• finish gate
• vertical and horizontal obstacles
• boundary elements
• terrain and environmental structures
And from Round-1 / Round-2 preview notes (your submission guide):
| Layer | Obstacle handling |
|---|---|
| Detector (apex_yolo11n) | nc=1 · sees only gate |
| Keypoint model | Gate corners only |
| Vision pipeline | Binary: "gate" / "not-gate". Not-gate is discarded. |
| Policy (PPO) | Trained on empty oval courses — never saw an obstacle |
| SimDrone physics | No obstacle collision geometry in the training env |
| Hard-neg mining | Suppresses false-positive detections on obstacles — but does nothing to avoid them. |
Tier 1 · detect. Extend the detector to nc=2: gate and obstacle.
• data source: race-r3f.html (see § 04 below)
• training: dataset_gates_obstacles/, 200 epochs
• output: models/apex_yolo11n.pt

Tier 2 · avoid. Add a proximity-guard layer in the control pipeline that runs before the policy:
• obstacle bbox covers > 30% of the frame → emergency pitch-up (a minimal guard sketch follows below)
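To make the guard concrete, here is a minimal sketch. It assumes detections arrive as normalized YOLO-style tuples and that the control pipeline can accept an override action; the action fields and function name are hypothetical placeholders, not existing stack APIs.

```python
# Minimal proximity-guard sketch. Assumptions (not the real stack API):
# detections are (cls, cx, cy, w, h) in normalized YOLO coords, cls 1 = obstacle.

OBSTACLE_CLS = 1
AREA_THRESHOLD = 0.30   # bbox covering >30% of the frame triggers the guard

def proximity_guard(detections, policy_action):
    """Pre-policy check: override the policy when an obstacle fills the frame."""
    for cls, cx, cy, w, h in detections:
        # w * h in normalized coords is the fraction of the frame covered.
        if cls == OBSTACLE_CLS and w * h > AREA_THRESHOLD:
            # Placeholder evasive command: hard pitch-up, cut forward speed.
            return {"pitch": 1.0, "throttle": 0.2, "override": True}
    return policy_action
```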
Tier 3 · learn. Retrain PPO with obstacles in the training env:
• courses: ApexDroneEnv training courses
• collision: SimDrone (AABB check per step)
• reward shaping: −10 × hit_obstacle, +0.1 × clearance_meters

§ 04 · race-r3f.html + Playwright

The win here: we already built race-r3f.html as a Three.js visualisation of VQ1 and VQ2 scenes. It knows every gate's 3D position and every obstacle's geometry. We can drive it headlessly, render random camera poses, and compute perfect YOLO bboxes by projecting the known scene graph into screen space (a label-projection sketch follows the dataset layout below). Zero manual labeling.
┌──────────────────────────────────┐ ┌───────────────────────────┐
│ Playwright (Python) │ │ Chromium (headless) │
│ render_synthetic_dataset.py │─────▶ │ race-r3f.html │
│ • sample random camera pose │ │ ?mode=dataset&preset=… │
│ • call __AIGP_CAPTURE__(pose) │ │ • tag gates/obstacles │
│ • decode base64 JPEG │ ◀─────│ userData.aigpClass │
│ • write image + .txt label │ │ • project 3D→2D bbox │
│ │ │ • toDataURL('jpeg') │
└──────────────────────────────────┘ └───────────────────────────┘
│
▼
dataset_gates_obstacles/
images/{train,val}/*.jpg
labels/{train,val}/*.txt
data.yaml (nc=2)
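The projection that produces these labels is just the camera's view-projection matrix applied to each tagged object's corner points, followed by taking the axis-aligned box in normalized screen coordinates. A rough numpy equivalent of that math, for reference; the page's actual _projectToScreen is Three.js code and the matrix names here are illustrative:

```python
import numpy as np

def project_bbox(corners_world, view_proj, img_w=640, img_h=640):
    """Project (N, 3) world-space corners to a normalized YOLO (cx, cy, w, h) box.

    view_proj: 4x4 combined projection @ view matrix. Returns None when the
    object is behind the camera or the box would be smaller than 4 px.
    (The real pipeline additionally drops boxes that are mostly off-screen.)
    """
    pts = np.hstack([corners_world, np.ones((len(corners_world), 1))])
    clip = pts @ view_proj.T                  # world -> clip space
    if np.any(clip[:, 3] <= 0):               # any corner behind the camera
        return None
    ndc = clip[:, :3] / clip[:, 3:4]           # perspective divide -> NDC
    xs = np.clip((ndc[:, 0] + 1) / 2, 0, 1)    # NDC x in [-1, 1] -> [0, 1]
    ys = np.clip((1 - ndc[:, 1]) / 2, 0, 1)    # flip y: screen y grows downward
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    if w * img_w < 4 or h * img_h < 4:         # mirror the <4 px filter
        return None
    return ((xs.min() + xs.max()) / 2, (ys.min() + ys.max()) / 2, w, h)
```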
# Render 1000 VQ2 frames (industrial complex scene with obstacles)
./aigp/Scripts/python.exe render_synthetic_dataset.py --preset vq2 --num 1000
# VQ1 — more gates, no obstacles (mostly true-negative obstacle training)
./aigp/Scripts/python.exe render_synthetic_dataset.py --preset vq1 --num 500
# Check counts + class balance
./aigp/Scripts/python.exe render_synthetic_dataset.py stats
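What stats reports boils down to counting class ids across the label files. A minimal sketch of that counting, assuming the dataset layout above (the real subcommand may print more):

```python
from collections import Counter
from pathlib import Path

def label_stats(root="dataset_gates_obstacles"):
    """Count YOLO boxes per class across all label files under root."""
    counts = Counter()
    for txt in Path(root, "labels").rglob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1   # first field = class id
    total = sum(counts.values()) or 1
    for cls_id, name in enumerate(["gate", "obstacle"]):
        print(f"{name}: {counts[cls_id]} boxes ({counts[cls_id] / total:.1%})")

label_stats()
```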
race-r3f.html exposes the data

When loaded with ?mode=dataset, the page:
• tags every gate with userData.aigpClass = 'gate' and every obstacle with 'obstacle'
• enables preserveDrawingBuffer on the WebGL context (required for toDataURL)
• exposes window.__AIGP_CAPTURE__(pose), which returns {image: dataURL, labels: [{cls, cx, cy, w, h}], width, height}
• sets window.__AIGP_READY__ = true so Playwright can poll

Interactive mode (no ?mode=dataset) is completely unchanged — dataset hooks are additive and only activate via the query param.
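On the Python side, a single capture against this contract looks roughly like the sketch below, using Playwright's sync API. The pose schema, output paths, and filename are illustrative, not render_synthetic_dataset.py's actual interface.

```python
import base64
from pathlib import Path
from playwright.sync_api import sync_playwright

# Illustrative pose; the real schema is whatever __AIGP_CAPTURE__ expects.
pose = {"x": 3.0, "y": 1.5, "z": -8.0, "yaw": 0.4, "pitch": -0.1}

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 640, "height": 640})
    page.goto("file:///C:/Users/pc/Downloads/grandprix-latest/race-r3f.html"
              "?mode=dataset&preset=vq2")
    page.wait_for_function("window.__AIGP_READY__ === true")

    cap = page.evaluate("pose => window.__AIGP_CAPTURE__(pose)", pose)

    # Decode the base64 JPEG and write the matching YOLO label file.
    out = Path("dataset_gates_obstacles")
    (out / "images/train").mkdir(parents=True, exist_ok=True)
    (out / "labels/train").mkdir(parents=True, exist_ok=True)
    jpeg = base64.b64decode(cap["image"].split(",", 1)[1])
    (out / "images/train/000001.jpg").write_bytes(jpeg)
    lines = [f"{b['cls']} {b['cx']} {b['cy']} {b['w']} {b['h']}"
             for b in cap["labels"]]
    (out / "labels/train/000001.txt").write_text("\n".join(lines))
    browser.close()
```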
render_synthetic_dataset.py samples realistic FPV poses (a sampler sketch follows the commands below):
• R × 0.9, height 0.8-5.5 m (typical race altitudes)
• --seed for reproducible datasets

./aigp/Scripts/python.exe render_synthetic_dataset.py --preset vq2 --num 2000
./aigp/Scripts/python.exe render_synthetic_dataset.py --preset vq1 --num 1000
./aigp/Scripts/python.exe render_synthetic_dataset.py stats
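A minimal sketch of the pose sampler under the constraints above: position within 0.9 × R of the scene centre, height in the 0.8-5.5 m band, and a seeded RNG behind --seed. R, the uniform distributions, and the pose schema are all assumptions, not the script's actual code.

```python
import math
import random

def sample_pose(rng, course_radius):
    """Draw one random FPV camera pose (assumed x/z ground plane, y up)."""
    r = course_radius * 0.9 * math.sqrt(rng.random())  # uniform over disc, capped at 0.9 * R
    theta = rng.uniform(0, 2 * math.pi)
    return {
        "x": r * math.cos(theta),
        "z": r * math.sin(theta),
        "y": rng.uniform(0.8, 5.5),            # typical race altitudes
        "yaw": rng.uniform(-math.pi, math.pi),
        "pitch": rng.uniform(-0.5, 0.2),       # mostly level-to-down, FPV style
    }

rng = random.Random(42)   # --seed 42 reproduces the same dataset every run
poses = [sample_pose(rng, course_radius=25.0) for _ in range(2000)]
```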
Target: 20-40% obstacle-box fraction in stats output. If too low, crank the VQ2 preset up or add more synthetic obstacle configurations to race-r3f.html.
Synthetic-only is risky — your detector could overfit to the Three.js rendering style. Mix with real sources:
# After capturing DCL gameplay (see dcl-capture-guide.html),
# create a combined data.yaml pointing at multiple roots, or use YOLO's
# list-of-dirs support. Example data.yaml:
path: C:/Users/pc/Downloads/grandprix-latest
train:
- dataset_gates_mega/images/train
- dataset_gates_dcl/images/train
- dataset_gates_obstacles/images/train
val:
- dataset_gates_obstacles/images/val
nc: 2
names: [gate, obstacle]
./aigp/Scripts/python.exe train_apex.py detector \
--dataset dataset_gates_obstacles --epochs 200
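For reference, the wrapped training call presumably reduces to Ultralytics' standard API, roughly as below; train_apex.py's real flags and defaults may differ.

```python
from ultralytics import YOLO

# Start from COCO-pretrained weights, per the note below.
model = YOLO("yolo11n.pt")
model.train(data="dataset_gates_obstacles/data.yaml", epochs=200, imgsz=640)
```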
Start from COCO-pretrained weights (the default); don't try to fine-tune the existing nc=1 gate model into nc=2, or it will catastrophically forget the gate class.
Extend benchmark_models.py to report per-class metrics, so pass criteria can be set and checked per class (a reporting sketch below):
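A sketch of per-class reporting via Ultralytics' validation API. Recent Ultralytics releases expose per-class mAP50-95 as metrics.box.maps, but verify the exact field names against the installed version.

```python
from ultralytics import YOLO

model = YOLO("models/apex_yolo11n.pt")
metrics = model.val(data="dataset_gates_obstacles/data.yaml")

# metrics.box.maps: per-class mAP50-95, indexed by class id.
for cls_id, name in model.names.items():
    print(f"{name}: mAP50-95 = {metrics.box.maps[cls_id]:.3f}")
```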
| Pitfall | Symptom | Fix |
|---|---|---|
| Synthetic-only overfit | mAP50 drops 30% on real DCL frames | Mix real DCL data at 40%+ of training set |
| Obstacle bboxes too loose | AABB of rotated pipe covers empty space | Switch to tighter oriented bbox (OBB) or filter by visible-pixel count |
| Behind-camera projection | Labels with impossibly-large boxes | Already filtered in _projectToScreen — rejects corners with v.z > 1 |
| Off-screen false labels | Tiny boxes at edges | Already filtered: boxes < 4px or >80% off-screen are dropped |
| Catastrophic forgetting | After nc=2 training, gate recall tanks | Don't fine-tune nc=1 → nc=2. Start from COCO pretrained or train from scratch. |
| Class imbalance | Many more gate boxes than obstacle | Use YOLO's class_weights or oversample obstacle-heavy frames |
| Playwright slow | 5 FPS capture rate | toDataURL is the bottleneck. 1000 frames = ~3 min. Acceptable. |