Detection guide · APEX Phase 1 + Phase 2

Vision pipeline reference.

Everything about the detector chain: model choices, dataset layout, evaluation thresholds, hard-negative mining, and the four-corner keypoint head that feeds PnP. This is the frontend for both VQ1 (PID) and VQ2 (PPO).

Primary: YOLO11n + YOLO11n-pose (APEX) · ships VQ1 + VQ2
Alternative: RF-DETR-Nano (DINOv2 backbone) · P2 upgrade
Input: 640×360 BGR · pinhole fx=fy=320 · VADR-TS-002 §3.8
Dataset: dataset_gates_mega (2,759 → target 200K) · grows via capture pipeline
Target: mAP@50 > 99% · <5 ms GPU · VQ2-ready

See also: VADR-TS-002 deltas for the full camera spec, UDP vision-stream protocol, and intrinsics matrix used by PnP.

§ 01 · Detector comparison

| Model | mAP@50 | mAP@50:95 | Latency (T4 TRT) | Params | License |
|---|---|---|---|---|---|
| YOLO11n + pose | 92.1% | 39.5 | ~5 ms | 2.6M | AGPL-3.0 |
| YOLO26n | 94.3% | 41.5 | 3.5 ms | 2.8M | AGPL-3.0 |
| RF-DETR-Nano | 96.5% | 48.4 | 2.3 ms | 3.1M | Apache 2.0 |
| RF-DETR-Base | 97.9% | 53.0 | 5.2 ms | 29M | Apache 2.0 |
| U-Net + RANSAC | 91.8% | 38.2 | ~5 ms | 1.1M | MIT |
| Color (HSV) | ~88% (highlighted gates only) | — | <1 ms (CPU) | — | — |

§ 02 · Why keypoints matter

Bounding boxes give you a rectangle. PnP needs the actual gate corners to solve for 3D pose. That is the difference between "I see a gate somewhere" and "I know exactly where the gate is in 3D."

| Method | Depth accuracy | Angle tolerance | Latency |
|---|---|---|---|
| Bounding-box center (bad) | ±30% | Frontal only | <1 ms |
| YOLO11n-pose 4 corners + PnP | ±5% @ 5 m | Up to 60° | ~3 ms |
| YOLO11n-pose + SAMD refinement | ±2% @ 5 m | Any angle | +0.6 ms |
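To make the depth claim concrete, here is a minimal sketch of why corners beat a bare box: with the pinhole intrinsics from the input spec (fx = fy = 320), the apparent corner-to-corner width pins down depth by similar triangles. The principal point (cx, cy) at the image center and the 1.5 m gate width are assumptions for illustration, not spec values; full 6-DoF pose would go through something like `cv2.solvePnP` with all four corners.

```python
import numpy as np

# Pinhole intrinsics from the input spec: 640x360, fx = fy = 320.
# cx, cy assumed at the image center (check VADR-TS-002 for the real values).
FX, FY, CX, CY = 320.0, 320.0, 320.0, 180.0
GATE_WIDTH_M = 1.5  # hypothetical physical gate width; substitute the real spec value

def depth_from_corners(corners_px: np.ndarray) -> float:
    """Estimate gate depth from 4 detected corners (TL, TR, BR, BL) in pixels.

    Similar-triangles sketch: Z ~ fx * real_width / apparent_width.
    Averages the top and bottom edges to reduce single-edge pixel noise.
    """
    tl, tr, br, bl = corners_px
    top_w = np.linalg.norm(tr - tl)
    bot_w = np.linalg.norm(br - bl)
    apparent_w = (top_w + bot_w) / 2.0
    return FX * GATE_WIDTH_M / apparent_w

# A 1.5 m gate seen head-on at 5 m spans 320 * 1.5 / 5 = 96 px.
corners = np.array([[272.0, 132.0], [368.0, 132.0],
                    [368.0, 228.0], [272.0, 228.0]])
print(depth_from_corners(corners))  # -> 5.0
```

A bounding-box center gives bearing but no reliable depth under perspective skew, which is where the ±30% row in the table comes from.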

§ 03 · Datasets

| Dataset | Size | Source | Use |
|---|---|---|---|
| dataset_gates_mega | 2,759 | Auto-labeled FPV frames (HSV → YOLO format) | Phase 1 + Phase 2 training |
| dataset_gates_mega_pose | 2,759 | Auto-generated 4-corner keypoints from bboxes + gate geometry | Phase 2 keypoint training |
| dataset_gates_mega_coco | 2,759 | COCO-format conversion | RF-DETR alternative path |
| dataset_gates_hardneg | ~1,000 | generate_training_sets.py | False-positive suppression (VQ2) |
| sim-day capture (future) | ~6,000 | Real VQ1-sim frames, slow laps | Fine-tune once sim drops |
| attempt capture (future) | ~200K+ by VQ2 | Every VQ1/VQ2 attempt logs frames | Continuous detector improvement · see playbook §03 |
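The "HSV → YOLO format" step in the auto-label pipeline boils down to converting a pixel-space box (e.g. the bounding rectangle of an HSV color mask) into a normalized YOLO label line. A minimal sketch, with the function name hypothetical and image dimensions defaulting to the 640×360 input spec:

```python
def bbox_to_yolo_line(x_min, y_min, x_max, y_max, img_w=640, img_h=360, cls=0):
    """Convert a pixel-space bbox (e.g. from an HSV color mask) to a YOLO label line.

    YOLO format is "<class> <x_center> <y_center> <width> <height>", with all
    four geometry fields normalized to [0, 1] by the image size.
    """
    xc = (x_min + x_max) / 2.0 / img_w
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 96x96 px gate centered in the 640x360 frame:
print(bbox_to_yolo_line(272, 132, 368, 228))
```

One line per object, one `.txt` per frame, which is what dataset_gates_mega stores alongside the images.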

§ 04 · Training commands

# Phase 1 — YOLO11n detector
python train_apex.py detector --dataset dataset_gates_mega --epochs 200

# Phase 2 — YOLO11n-pose 4 corners
python train_apex.py keypoints --dataset dataset_gates_mega --epochs 150

# Quick smoke
python train_apex.py detector --epochs 5
python train_apex.py keypoints --epochs 5

# Evaluate (mAP across confidence thresholds)
python train_apex.py eval
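The eval step sweeps confidence thresholds; the core of any such sweep is IoU-based greedy matching of detections to ground truth at each cut. A self-contained sketch of that matching (not the actual `train_apex.py eval` internals, which are not shown here):

```python
def iou(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) pixel coords."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall_at(dets, gts, conf_thresh, iou_thresh=0.5):
    """Precision/recall at one confidence cut, mAP@50-style matching.

    dets: list of (confidence, box); gts: list of ground-truth boxes.
    Detections are taken highest-confidence first; each GT matches at most once.
    """
    kept = sorted((d for d in dets if d[0] >= conf_thresh), key=lambda d: -d[0])
    matched, tp = set(), 0
    for _conf, box in kept:
        best_j, best_iou = -1, iou_thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    p = tp / len(kept) if kept else 1.0
    r = tp / len(gts) if gts else 1.0
    return p, r
```

Averaging precision over the recall curve (and over frames) yields the mAP@50 numbers in the comparison table.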

§ 05 · Confidence thresholds

| Context | Threshold | Reasoning |
|---|---|---|
| VQ1 pilot gate-pick | 0.25 | Conservative: drop unsure detections and fall back to search |
| VQ2 PPO observation | 0.15 | Include low-confidence detections to feed the confidence signal in the obs vector |
| Data capture auto-label | 0.70 | Strong labels for dataset growth; anything lower goes to human review |
| Hard-negative mining | 0.05–0.25 on non-gate frames | False positives the trained detector flags on non-gate frames are the training target |
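The hard-negative mining row translates into a simple filter: run the trained detector over footage known to contain no gates, and harvest any detection in the 0.05–0.25 band as a false positive for dataset_gates_hardneg. A sketch with a hypothetical function name and input shape:

```python
def mine_hard_negatives(frame_detections, lo=0.05, hi=0.25):
    """Harvest hard negatives from frames known to contain no gates.

    frame_detections: {frame_id: [(confidence, box), ...]} from a detector
    pass over non-gate footage. Any detection with confidence in [lo, hi]
    is a false positive worth adding (with an empty label file) to the
    hard-negative set for false-positive suppression.
    """
    hard = {}
    for frame_id, dets in frame_detections.items():
        fps = [d for d in dets if lo <= d[0] <= hi]
        if fps:
            hard[frame_id] = fps
    return hard
```

In practice `generate_training_sets.py` owns this step; the sketch only shows the confidence-band logic the table prescribes.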

§ 06 · Augmentation pipeline

import albumentations as A

# Photometric + occlusion augmentations; bbox_params keeps YOLO labels in sync.
augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3),   # exposure swings
    A.GaussNoise(var_limit=(10, 80)),                                       # sensor noise
    A.MotionBlur(blur_limit=15, p=0.3),                                     # fast-flight blur
    A.RandomFog(fog_coef_lower=0.1, fog_coef_upper=0.4, p=0.2),             # haze
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.3, hue=0.1),   # color drift
    A.RandomShadow(p=0.3),                                                  # cast shadows
    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.2),       # partial occlusion
], bbox_params=A.BboxParams(format='yolo'))
For VQ2 specifically, add synthetic lighting changes (the spec calls them out explicitly) and gate-like false positives (similar-colored arches, frames, posters). dataset_gates_hardneg is the starter set; keep growing it.
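For the synthetic lighting changes, a library-free sketch of the idea: a random gamma shift plus a global gain, applied before (or instead of) the albumentations pass. The function name and parameter ranges are illustrative starting points, not spec values:

```python
import numpy as np

def random_lighting(img, rng=None):
    """Synthetic lighting change for VQ2 hardening.

    img: HxWx3 uint8 BGR frame. Applies a random gamma shift (gamma < 1
    brightens, > 1 darkens) and a random global gain, then clips back
    to valid pixel range. Ranges here are hypothetical starting points.
    """
    if rng is None:
        rng = np.random.default_rng()
    gamma = rng.uniform(0.6, 1.6)
    gain = rng.uniform(0.7, 1.3)
    x = img.astype(np.float32) / 255.0
    x = np.clip(gain * np.power(x, gamma), 0.0, 1.0)
    return (x * 255.0).astype(np.uint8)
```

Because the transform is purely photometric, bounding boxes and keypoints pass through unchanged, so it composes cleanly with the bbox-aware pipeline above.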
VISION-DETECTION · v2.0 · 2026-04-21