AI Grand Prix — Vision & Detection Guide

Gate detection, RANSAC corner extraction, PnP depth estimation, and model training

1. Why Vision Matters

The single 12MP FPV camera is the ONLY sensor. There is no GPS, no LiDAR, no depth camera. Every piece of spatial awareness the drone has comes from vision.

The entire perception-to-control loop is a three-step chain:

  1. Detect gate in the camera frame (2D pixel coordinates)
  2. Extract 4 corners at sub-pixel accuracy
  3. PnP solve to recover distance and 6DoF pose relative to the gate

Corner accuracy directly determines depth accuracy. A 2px error at 10m range produces roughly 0.5m of depth error — enough to cause a collision or a missed gate. This is why we use U-Net segmentation with RANSAC corner extraction instead of simple bounding boxes.

Key insight: Bounding box detectors (YOLO) return axis-aligned rectangles, not the true gate edge positions. When those bbox corners are fed to PnP, the resulting depth estimate is systematically biased. U-Net + RANSAC recovers the actual gate edges.

2. Detection Mode Comparison

Three detection backends are available. Selection is controlled by VisionSettings.mode in race_config.py:

| Feature | Color (VQ1) | YOLO (VQ2) | U-Net (Primary) |
|---|---|---|---|
| Method | HSV threshold + contours | Neural network bbox | Pixel segmentation + RANSAC |
| Speed | ~0.5ms | ~12ms | ~5ms (GPU) |
| Corner source | approxPolyDP or bbox | bbox corners (inaccurate) | RANSAC line fit + intersection |
| PnP accuracy | Good (if quad fit succeeds) | Poor (bbox != gate edges) | Best (sub-pixel corners) |
| Partial gates | No | Partial | Yes |
| Training needed | No (just HSV range) | Yes (labeled dataset) | Yes (segmentation masks) |
| Best for | Highlighted gates (VQ1) | Complex backgrounds | Racing (accuracy + speed) |
# race_config.py — VisionSettings
mode: str = "unet"    # "color" (VQ1), "yolo" (VQ2), or "unet" (primary)

Default recommendation: Use "unet" for racing. Fall back to "color" for VQ1 qualifiers where gates are highlighted with a known color. Use "yolo" only when U-Net is unavailable and gates are not highlighted.

3. U-Net Architecture (GateSegNet)

Overview

GateSegNet is a lightweight 4-level encoder-decoder with skip connections. It predicts a binary gate mask at full input resolution.

Input / Output

Input: 640x480 BGR frame. Output: 640x480 single-channel sigmoid mask (per-pixel gate probability).

Inference Latency

Roughly 5ms per frame on GPU (see the mode comparison in Section 2).

Architecture Diagram

[Diagram: 640x480 BGR input → encoder (ConvBlock + Pool per level: 3→32, 32→64, 64→128, 128→256) → bottleneck ConvBlock (256→256) → decoder (Up + ConvBlock: 256→128, 128→64, 64→32) → Conv1x1 + Sigmoid (32→1) → 640x480 mask. Skip connections concatenate encoder feature maps into the matching decoder level.]

ConvBlock Detail

Each ConvBlock consists of two sequential Conv3x3 + BatchNorm + ReLU layers:

import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

Encoder Path

| Level | Channels | Resolution | Operation |
|---|---|---|---|
| 1 | 3 → 32 | 640x480 | ConvBlock + MaxPool2d(2) |
| 2 | 32 → 64 | 320x240 | ConvBlock + MaxPool2d(2) |
| 3 | 64 → 128 | 160x120 | ConvBlock + MaxPool2d(2) |
| 4 | 128 → 256 | 80x60 | ConvBlock + MaxPool2d(2) |
| Bottleneck | 256 → 256 | 40x30 | ConvBlock (no pool) |

Decoder Path

| Level | Channels | Resolution | Operation |
|---|---|---|---|
| 4 | 256+256 → 128 | 80x60 | ConvTranspose2d(2) + skip concat + ConvBlock |
| 3 | 128+128 → 64 | 160x120 | ConvTranspose2d(2) + skip concat + ConvBlock |
| 2 | 64+64 → 32 | 320x240 | ConvTranspose2d(2) + skip concat + ConvBlock |
| 1 | 32+32 → 32 | 640x480 | ConvTranspose2d(2) + skip concat + ConvBlock |
| Output | 32 → 1 | 640x480 | Conv2d(1x1) + Sigmoid |
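
The per-level resolutions in these tables can be sanity-checked with a few lines of arithmetic (each MaxPool2d(2) halves both dimensions; the decoder mirrors the sequence in reverse — helper name is illustrative):

```python
def encoder_resolutions(w=640, h=480, levels=4):
    """Resolution entering each encoder level, plus the bottleneck."""
    res = [(w, h)]
    for _ in range(levels):
        w, h = w // 2, h // 2      # MaxPool2d(2) halves each dimension
        res.append((w, h))
    return res

print(encoder_resolutions())
# [(640, 480), (320, 240), (160, 120), (80, 60), (40, 30)]
```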

4. RANSAC Corner Extraction Algorithm

This is the critical innovation. Standard contour simplification (approxPolyDP) finds polygon vertices on the contour boundary. If the mask is noisy or the contour has bumps, the vertices jump around unpredictably. RANSAC fits lines to clusters of edge points, and intersects adjacent lines for mathematically exact corner positions — robust to outliers.

Step-by-Step Pipeline

Step 1 — Contour extraction. Threshold the U-Net sigmoid mask (scaled to 0-255) at 127 to produce a binary image, then run cv2.findContours(binary, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE). Filter by minimum area (200px) and aspect ratio (0.2 to 5.0).

Step 2 — Edge template. For each valid contour, compute the minimum-area rotated rectangle via cv2.minAreaRect. Extract 4 box corners with cv2.boxPoints, then order them as TL, TR, BR, BL. These define 4 template edge segments: TL→TR, TR→BR, BR→BL, BL→TL.

Step 3 — Point assignment. Each contour point is assigned to its nearest edge using point-to-line-segment distance. This partitions the contour into 4 clusters, one per gate edge.

for p in contour_points:
    best_edge = argmin([point_line_dist(p, edge_a, edge_b)
                        for edge_a, edge_b in edges])
    buckets[best_edge].append(p)
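
A minimal version of the point-to-segment distance the loop relies on (function name here is illustrative, not necessarily the implementation's):

```python
import numpy as np

def point_seg_dist(p, a, b):
    """Distance from 2D point p to the segment from a to b."""
    p, a, b = map(np.asarray, (p, a, b))
    ab = b - a
    # Project p onto the segment, clamping to the endpoints
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

print(point_seg_dist([0.5, 1.0], [0.0, 0.0], [1.0, 0.0]))  # 1.0
```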

Step 4 — RANSAC line fitting (per edge cluster):

  1. Sample 2 random points from the cluster → define a candidate line
  2. Compute perpendicular distance from all cluster points to the candidate
  3. Count inliers within threshold (3px)
  4. Repeat for 50 iterations, keep the line with the most inliers
  5. Refit on inliers using SVD: compute mean of inlier points, then principal direction via np.linalg.svd(inliers - mean)

# RANSAC core loop (simplified)
best_count, best_line = 0, None
for _ in range(50):
    p1, p2 = random_sample(edge_pts, 2)
    direction = normalize(p2 - p1)
    normal = np.array([-direction[1], direction[0]])
    dists = np.abs((pts - p1) @ normal)
    inliers = pts[dists < 3.0]
    if len(inliers) > best_count:
        best_count = len(inliers)
        mean = inliers.mean(axis=0)
        _, _, vt = np.linalg.svd(inliers - mean)
        best_line = (mean, vt[0])  # point + direction

Step 5 — Line intersection. Adjacent RANSAC-fitted lines are intersected analytically to produce 4 sub-pixel corner points. The intersection of lines L_i and L_(i+1) gives corner C_((i+1) mod 4).

# Line intersection: p1 + t*d1 = p2 + s*d2
det = d1[0]*(-d2[1]) - d1[1]*(-d2[0])
if abs(det) < 1e-8: return None  # parallel
dp = p2 - p1
t = (-d2[1]*dp[0] + d2[0]*dp[1]) / det
corner = p1 + t * d1
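
Wrapped as a self-contained function (name illustrative), the same math is easy to unit-test:

```python
import numpy as np

def intersect_lines(p1, d1, p2, d2, eps=1e-8):
    """Intersect lines p1 + t*d1 and p2 + s*d2; None if near-parallel."""
    p1, d1, p2, d2 = map(np.asarray, (p1, d1, p2, d2))
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < eps:
        return None  # parallel lines never intersect
    dp = p2 - p1
    t = (-d2[1] * dp[0] + d2[0] * dp[1]) / det
    return p1 + t * d1

print(intersect_lines([0, 0], [1, 0], [5, 3], [0, 1]))  # [5. 0.]
```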

Step 6 — cornerSubPix polish. OpenCV sub-pixel refinement on the mask image with a 5x5 search window, 30 iterations, epsilon 0.01:

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
cv2.cornerSubPix(gray_mask, corners, (5, 5), (-1, -1), criteria)

Step 7 — Corner ordering. Final corners are sorted into TL, TR, BR, BL order using coordinate sum and difference.
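
One common sum/difference scheme, sketched here under the assumption that +y points down in image coordinates (not necessarily the exact implementation):

```python
import numpy as np

def order_corners(pts):
    """Order 4 corner points as TL, TR, BR, BL (image coords, +y down)."""
    pts = np.asarray(pts, dtype=np.float64)
    s = pts.sum(axis=1)          # x + y: smallest at TL, largest at BR
    d = pts[:, 1] - pts[:, 0]    # y - x: smallest at TR, largest at BL
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]])

shuffled = [[10, 0], [0, 10], [0, 0], [10, 10]]
print(order_corners(shuffled))  # rows: TL, TR, BR, BL
```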

RANSAC Corner Extraction Diagram

[Diagram: 1. Contour points → 2. Edge clusters → 3. RANSAC lines → 4. Intersection corners (TL, TR, BR, BL)]

Why RANSAC over approxPolyDP? approxPolyDP simplifies the contour polygon — it works when the shape is clean but fails on noisy masks. RANSAC fits lines to edge clusters and is inherently robust to outliers from mask noise or partial occlusion. Line intersection then gives a mathematically exact corner position, not a polygon vertex chosen from noisy contour points. The result is sub-pixel corner accuracy even with imperfect segmentation masks, and significantly better PnP depth estimates.

5. PnP Depth Estimation

The Problem

Given 4 known 3D points (gate corners) and their 2D pixel positions (from RANSAC), recover the gate's 6DoF pose relative to the camera — specifically, the translation vector whose magnitude gives the distance.

Knowns

3D Gate Corners (World Frame)

A 2.0m x 2.0m square, centered at origin, lying in the XY plane:

GATE_CORNERS_3D = [
  [-1.0,  1.0,  0],  # TL
  [ 1.0,  1.0,  0],  # TR
  [ 1.0, -1.0,  0],  # BR
  [-1.0, -1.0,  0],  # BL
]

Camera Intrinsics

Resolution: 640 x 480
FOV: 90deg H, 60deg V

fx = 640 / (2 * tan(45deg))
   = 320 / 1.0
   = 320.0

fy = 480 / (2 * tan(30deg))
   = 240 / 0.5774
   = 415.7

K = [[320.0,   0, 320],
     [  0, 415.7, 240],
     [  0,     0,   1]]

Solver

success, rvec, tvec = cv2.solvePnP(
    GATE_CORNERS_3D,     # (4,3) float64 — known 3D positions
    corners_2d,          # (4,2) float64 — detected pixel corners
    K,                   # 3x3 camera matrix
    dist_coeffs,         # distortion (zeros for now)
    flags=cv2.SOLVEPNP_IPPE_SQUARE  # optimized for coplanar squares
)

The SOLVEPNP_IPPE_SQUARE flag uses the Infinitesimal Plane-based Pose Estimation method, specifically optimized for coplanar square markers. It is both faster and more accurate than the general SOLVEPNP_ITERATIVE for this geometry.

Output

| Variable | Shape | Meaning |
|---|---|---|
| rvec | (3,1) | Rodrigues rotation vector (gate orientation relative to camera) |
| tvec | (3,1) | Translation vector [x, y, z] — gate center in camera frame (meters) |
| distance | scalar | np.linalg.norm(tvec) — Euclidean distance to gate center |

Fallback Distance Estimation

When PnP fails (degenerate corners, colinear points), a simple pinhole-model estimate is used:

distance = (GATE_WIDTH * fx) / pixel_width
         = (2.0 * 320.0) / bbox_w

Fallback is coarse: It assumes the gate is fronto-parallel (facing the camera head-on). For oblique views, this over-estimates the distance. Always prefer PnP when corners are available.
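
A quick numeric check of that bias, using the pinhole numbers from above (the cos θ width foreshortening is an approximation for a gate yawed by θ):

```python
import math

GATE_W, FX, TRUE_Z = 2.0, 320.0, 10.0

def fallback_distance(pixel_width):
    """Pinhole fallback: assumes the gate is fronto-parallel."""
    return GATE_W * FX / pixel_width

frontal_px = GATE_W * FX / TRUE_Z                     # 64 px at 10 m, head-on
oblique_px = frontal_px * math.cos(math.radians(30))  # yawed 30 deg: ~55.4 px

print(fallback_distance(frontal_px))  # 10.0 — exact when fronto-parallel
print(fallback_distance(oblique_px))  # ~11.55 — over-estimates the true 10 m
```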

6. Color Detection (VQ1 Mode)

In VQ1, gates are visually highlighted with a saturated color against a desaturated environment. Color detection is trivially fast (~0.5ms) and needs no training.

Pipeline

  1. Convert BGR → HSV: cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
  2. Threshold: cv2.inRange(hsv, hsv_lower, hsv_upper)
  3. Morphological cleanup: close(5x5) then open(5x5) — fills small gaps, removes speckle
  4. Contour extraction: cv2.findContours(mask, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE)
  5. Filter: area > 500px, aspect ratio 0.3 to 3.0
  6. Corner fitting: approxPolyDP at 4% arc length epsilon; falls back to bounding rect if not 4 vertices

Color Presets

| Preset | HSV Lower | HSV Upper | Notes |
|---|---|---|---|
| green | (40, 100, 100) | (90, 255, 255) | Default — most common gate highlight |
| cyan | (80, 100, 100) | (100, 255, 255) | Blue-green tinted gates |
| magenta | (140, 100, 100) | (170, 255, 255) | Pink/purple highlights |
| red | (0, 120, 100) | (10, 255, 255) | Red highlight (wraps at 180) |
| yellow | (20, 100, 100) | (40, 255, 255) | Yellow/gold highlights |
| white | (0, 0, 200) | (180, 30, 255) | Bright desaturated (white glow) |

# Switch preset at runtime
pipe.detector.set_color_preset("cyan")

Tip: Run the simulator, screenshot a gate, and check HSV values in an image editor to select the correct preset. The default "green" range covers most VQ1 gate highlight colors.
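
For experimenting outside the pipeline, cv2.inRange can be mimicked in plain numpy (preset values copied from the table above; helper name illustrative):

```python
import numpy as np

PRESETS = {
    "green": ((40, 100, 100), (90, 255, 255)),
    "cyan":  ((80, 100, 100), (100, 255, 255)),
}

def in_range(hsv, lower, upper):
    """numpy equivalent of cv2.inRange on an HxWx3 HSV array."""
    lo, hi = np.asarray(lower), np.asarray(upper)
    hit = np.all(hsv >= lo, axis=-1) & np.all(hsv <= hi, axis=-1)
    return hit.astype(np.uint8) * 255

hsv = np.array([[[60, 200, 200], [10, 50, 50]]])  # one green pixel, one dull one
print(in_range(hsv, *PRESETS["green"]))  # [[255   0]]
```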

7. YOLO Detection (VQ2 Mode)

For VQ2 environments where gates are not highlighted, a trained neural network detector is required. We use YOLOv8n or YOLOv11n for single-class gate detection.

Model Format Priority

| Format | Extension | Speed | Notes |
|---|---|---|---|
| TensorRT | .engine | Fastest | GPU-optimized, FP16, hardware-specific |
| ONNX | .onnx | Fast | Cross-platform, good baseline |
| PyTorch | .pt | Moderate | For training/debugging only |

The detector automatically resolves the best available format at load time (engine > onnx > pt).

Configuration

# race_config.py
yolo_model_path: str = "gate_detector.pt"
yolo_conf_threshold: float = 0.5

Corner limitation: YOLO returns axis-aligned bounding boxes. The 4 bbox corners (x1,y1), (x2,y1), (x2,y2), (x1,y2) are NOT the actual gate edge positions. When a gate is rotated or viewed at an angle, the bbox inflates beyond the gate edges, and PnP receives systematically incorrect corner inputs. This is why YOLO mode has worse depth accuracy than U-Net + RANSAC.
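
A quick geometric illustration of that inflation for a gate rolled by θ in the image plane (pure 2D geometry: the axis-aligned box of a square of side s has side s·(cos θ + sin θ)):

```python
import math

def aabb_inflation(theta_deg):
    """Side of the axis-aligned bbox of a unit square rolled by theta,
    relative to the square's true side length."""
    t = math.radians(theta_deg)
    return math.cos(t) + math.sin(t)

print(round(aabb_inflation(0), 3))   # 1.0   — bbox matches the gate exactly
print(round(aabb_inflation(30), 3))  # 1.366 — bbox corners ~37% outside the edges
```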

When to Use YOLO

Use "yolo" only when U-Net weights are unavailable and gates are not color-highlighted (typical VQ2 conditions with complex backgrounds). Expect reduced PnP depth accuracy relative to U-Net + RANSAC because of the bbox corner limitation described above.

8. Training Workflows

U-Net Training

python gate_segmentation.py train --data dataset_gates_seg

Dataset structure:

dataset_gates_seg/
  train/
    images/    # BGR frames (PNG/JPG)
    masks/     # Binary masks (white = gate, black = background)
  val/
    images/
    masks/

| Parameter | Value | Notes |
|---|---|---|
| Loss function | Dice + BCE combined | Dice handles class imbalance; BCE provides gradient everywhere |
| Optimizer | Adam, lr=1e-3 | Default PyTorch Adam |
| Epochs | 100 | Checkpoints every 25 epochs |
| Batch size | 8 | Increase if VRAM allows |
| Input size | 640x480 | Matches camera resolution |
| Best model | Saved by validation loss | dataset_gates_seg/weights/best.pt |
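
The Dice + BCE combination can be sketched in plain numpy (the training code itself uses the torch-based DiceBCELoss in gate_segmentation.py; this is only to show why the two terms complement each other):

```python
import numpy as np

def dice_bce_loss(pred, target, eps=1e-6):
    """Dice + BCE on sigmoid probabilities (numpy sketch of the idea)."""
    pred, target = pred.ravel(), target.ravel()
    bce = -np.mean(target * np.log(pred + eps)
                   + (1 - target) * np.log(1 - pred + eps))
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return bce + dice

perfect = np.array([0.0, 1.0, 1.0, 0.0])
print(dice_bce_loss(perfect, perfect) < 0.01)     # True — near-zero loss
print(dice_bce_loss(1 - perfect, perfect) > 1.0)  # True — large loss
```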

Export to ONNX:

python gate_segmentation.py export --weights best.pt
# Produces: gate_seg.onnx (opset 17, dynamic batch axis)

YOLO Training

# Step 1: Auto-label using VQ1 color detection
python yolo-auto-label.py

# Step 2: Train
python yolo-train.py train

# Step 3: Export for deployment
python yolo-train.py export
# Produces: gate_detector.engine (TensorRT FP16) or gate_detector.onnx

| Parameter | Value |
|---|---|
| Base model | yolo11n.pt (nano — speed-optimized) |
| Epochs | 150 |
| Batch size | 16 |
| Image size | 640 |
| Patience | 30 (early stopping) |
| Augmentation | HSV jitter, rotation (10deg), translate, scale, mosaic, mixup, random erasing |
| No vertical flip | Gates have orientation — vertical flip creates invalid training examples |

Auto-labeling workflow: The yolo-auto-label.py script runs the VQ1 color detector on simulator frames and automatically generates YOLO-format bounding box labels. This bootstraps training data without manual annotation — you fly through VQ1, record frames, and the color detector produces labels for free.

9. Camera Calibration

Current Approach: Computed from FOV

Camera intrinsics are derived from the declared field-of-view angles:

fx = width  / (2 * tan(fov_h / 2))  # 640 / (2 * tan(45deg)) = 320.0
fy = height / (2 * tan(fov_v / 2))  # 480 / (2 * tan(30deg)) = 415.7
cx = width  / 2                       # 320.0
cy = height / 2                       # 240.0

# Distortion coefficients assumed zero
dist_coeffs = [0, 0, 0, 0, 0]
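
Plugging in the numbers confirms the values used throughout this guide:

```python
import math

fx = 640 / (2 * math.tan(math.radians(90 / 2)))  # horizontal FOV = 90 deg
fy = 480 / (2 * math.tan(math.radians(60 / 2)))  # vertical FOV = 60 deg
print(round(fx, 1), round(fy, 1))  # 320.0 415.7
```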

This is implemented in vision_pipeline.py as the CameraConfig class:

@property
def fx(self) -> float:
    return self.width / (2.0 * np.tan(np.radians(self.fov_h / 2)))

@property
def fy(self) -> float:
    return self.height / (2.0 * np.tan(np.radians(self.fov_v / 2)))

Better Approach: Checkerboard Calibration

For real hardware or high-fidelity simulation, a proper calibration produces a more accurate intrinsic matrix K and distortion coefficients:

  1. Print a checkerboard pattern (e.g. 9x6 inner corners)
  2. Capture 15-20 images from varied angles and distances
  3. Run cv2.calibrateCamera() to solve for K and distortion
  4. Update CameraConfig with the calibrated values

# OpenCV calibration (sketch)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*9, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for img in calibration_images:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if ret:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        obj_points.append(objp)
        img_points.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None
)
print(f"Reprojection error: {ret:.4f} px")  # should be < 0.5

Why this matters: Lens distortion shifts pixel positions, especially near image edges. If the drone detects a gate in the periphery, uncorrected distortion will shift the corner positions and bias the PnP depth estimate. Calibrated distortion coefficients allow cv2.undistortPoints() to correct this before PnP solving.
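
To see the scale of the effect, the radial part of the Brown-Conrady model can be applied to a normalized image point (the k1 coefficient here is illustrative, not a calibrated value):

```python
def distort_radial(x, y, k1, k2=0.0):
    """Apply radial distortion to normalized image coords (Brown model)."""
    r2 = x * x + y * y
    f = 1 + k1 * r2 + k2 * r2 * r2
    return x * f, y * f

# Near the image center the shift is negligible...
print(distort_radial(0.1, 0.0, k1=-0.1))  # ≈ (0.0999, 0.0)
# ...but at the periphery (normalized radius ~1) the same k1 moves the point 10%
print(distort_radial(1.0, 0.0, k1=-0.1))  # (0.9, 0.0)
```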

10. Pipeline Integration Summary

The complete vision pipeline is orchestrated by VisionPipeline in vision_pipeline.py:

from vision_pipeline import VisionPipeline, CameraConfig

cam = CameraConfig(width=640, height=480, fov_h=90.0, fov_v=60.0)
pipe = VisionPipeline(cam, mode="unet")  # or "color", "yolo"
pipe.setup()  # loads model weights

# Per frame:
detections = pipe.process(frame)       # detect + PnP in one call
nearest = pipe.get_nearest_gate()      # closest valid gate
latency = pipe.inference_latency_ms    # total ms for detect + PnP

Data Flow

| Stage | Input | Output | Time |
|---|---|---|---|
| 1. Detection | 640x480 BGR frame | List of GateDetection (bbox, confidence, corners_2d) | 0.5-12ms |
| 2. PnP Estimation | 4 corner pixel positions | rvec, tvec, distance (meters) | <0.1ms |
| 3. Gate Selection | All detections | Nearest valid gate (confidence > 0.3, distance < 100m) | negligible |

Key Source Files

| File | Purpose |
|---|---|
| vision_pipeline.py | Core pipeline: ColorGateDetector, YOLOGateDetector, PnPEstimator, VisionPipeline |
| gate_segmentation.py | GateSegNet (U-Net), GateSegDetector (RANSAC corners), DiceBCELoss, GateSegTrainer |
| race_config.py | CameraSettings, VisionSettings, GateSettings — all tunable parameters |
| yolo-train.py | YOLO training configuration and runner |
| yolo-auto-label.py | Auto-labeling: color detection generates YOLO training labels |