Detection guide · APEX Phase 1 + Phase 2
Vision pipeline reference.
Everything about the detector chain: model choices, dataset layout, evaluation thresholds, hard-negative mining, and the four-corner keypoint head that feeds PnP. This is the frontend for both VQ1 (PID) and VQ2 (PPO).
- **Primary:** YOLO11n + YOLO11n-pose (APEX) · ships VQ1 + VQ2
- **Alternative:** RF-DETR-Nano · DINOv2 backbone · P2 upgrade
- **Input:** 640×360 BGR · pinhole fx=fy=320 · VADR-TS-002 §3.8
- **Dataset:** dataset_gates_mega (2,759 → target 200K) · grows via capture pipeline
- **Target:** mAP@50 > 99% · <5ms GPU · VQ2-ready
See also: VADR-TS-002 deltas for the full camera spec, UDP vision-stream protocol, and intrinsics matrix used by PnP.
§ 01 Detector comparison

| Model | mAP@50 | mAP@50:95 | Latency (T4 TRT) | Params | License |
|---|---|---|---|---|---|
| YOLO11n + pose | 92.1% | 39.5 | ~5ms | 2.6M | AGPL-3.0 |
| YOLO26n | 94.3% | 41.5 | 3.5ms | 2.8M | AGPL-3.0 |
| RF-DETR-Nano | 96.5% | 48.4 | 2.3ms | 3.1M | Apache 2.0 |
| RF-DETR-Base | 97.9% | 53.0 | 5.2ms | 29M | Apache 2.0 |
| U-Net + RANSAC | 91.8% | 38.2 | ~5ms | 1.1M | MIT |
| Color (HSV) | ~88% (highlighted only) | — | <1ms CPU | — | — |
§ 02 Why keypoints matter

Bounding boxes give you a rectangle; PnP needs the actual gate corners to solve for 3D pose. That is the gap between "I see a gate somewhere" and "I know exactly where the gate is in 3D."

| Method | Depth accuracy | Angle tolerance | Latency |
|---|---|---|---|
| Bounding-box center (bad) | ±30% | Frontal only | <1ms |
| YOLO11n-pose 4 corners + PnP | ±5% @ 5m | Up to 60° | ~3ms |
| YOLO11n-pose + SAMD refinement | ±2% @ 5m | Any angle | +0.6ms |
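A minimal sketch of the corner-to-pose step. Assumptions not taken from this doc: a square gate with hypothetical half-width 0.75m (substitute the real gate geometry), principal point at the image center of the 640×360 frame (confirm against the VADR-TS-002 intrinsics matrix), and keypoints emitted in TL/TR/BR/BL order.

```python
import cv2
import numpy as np

# Pinhole intrinsics: fx = fy = 320 per the spec; cx, cy ASSUMED at image
# center of the 640x360 frame -- check VADR-TS-002 for the real matrix.
K = np.array([[320.0,   0.0, 320.0],
              [  0.0, 320.0, 180.0],
              [  0.0,   0.0,   1.0]])

GATE_HALF = 0.75  # HYPOTHETICAL half-width (m); substitute real gate geometry
OBJECT_PTS = np.array([
    [-GATE_HALF,  GATE_HALF, 0.0],   # top-left
    [ GATE_HALF,  GATE_HALF, 0.0],   # top-right
    [ GATE_HALF, -GATE_HALF, 0.0],   # bottom-right
    [-GATE_HALF, -GATE_HALF, 0.0],   # bottom-left
])

def gate_pose(corners_px: np.ndarray):
    """corners_px: (4, 2) pixel keypoints from the pose head, TL/TR/BR/BL.
    Returns (rvec, tvec); tvec is the gate center in the camera frame (m)."""
    ok, rvec, tvec = cv2.solvePnP(
        OBJECT_PTS, corners_px.astype(np.float64), K, None,
        flags=cv2.SOLVEPNP_IPPE_SQUARE,  # dedicated solver for 4 coplanar square points
    )
    return (rvec, tvec) if ok else (None, None)
```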
§ 03 Datasets

| Dataset | Size | Source | Use |
|---|---|---|---|
| dataset_gates_mega | 2,759 | Auto-labeled FPV frames (HSV → YOLO format) | Phase 1 + Phase 2 training |
| dataset_gates_mega_pose | 2,759 | Auto-generated 4-corner keypoints from bboxes + gate geometry (see the sketch after this table) | Phase 2 keypoint training |
| dataset_gates_mega_coco | 2,759 | COCO-format conversion | RF-DETR alternative path |
| dataset_gates_hardneg | ~1,000 | generate_training_sets.py | False-positive suppression (VQ2) |
| sim-day capture (future) | ~6,000 | Real VQ1-sim frames, slow laps | Fine-tune once sim drops |
| attempt capture (future) | ~200K+ by VQ2 | Every VQ1/VQ2 attempt logs frames | Continuous detector improvement · see playbook §03 |
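How the bbox-to-keypoint derivation could look: a hedged sketch under the simplest assumption (corners sit exactly on the bbox corners; the real generator may inset them using the gate's frame geometry). `bbox_to_corner_kpts` and `pose_label_line` are hypothetical names, not the repo's.

```python
def bbox_to_corner_kpts(cx, cy, w, h):
    """4 corner keypoints (normalized, TL/TR/BR/BL) from a YOLO bbox.
    ASSUMPTION: gate corners coincide with bbox corners; the real
    generator may inset them using gate frame geometry."""
    x0, y0 = cx - w / 2, cy - h / 2
    x1, y1 = cx + w / 2, cy + h / 2
    # Each keypoint is (x, y, v); v=2 means "labeled and visible" in YOLO-pose.
    return [(x0, y0, 2), (x1, y0, 2), (x1, y1, 2), (x0, y1, 2)]

def pose_label_line(cls_id, cx, cy, w, h):
    """One YOLO-pose label line: 'cls cx cy w h x1 y1 v1 ... x4 y4 v4'."""
    flat = [v for kp in bbox_to_corner_kpts(cx, cy, w, h) for v in kp]
    return " ".join(str(v) for v in [cls_id, cx, cy, w, h] + flat)
```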
§ 04 Training commands

```bash
# Phase 1 — YOLO11n detector
python train_apex.py detector --dataset dataset_gates_mega --epochs 200

# Phase 2 — YOLO11n-pose 4 corners
python train_apex.py keypoints --dataset dataset_gates_mega --epochs 150

# Quick smoke
python train_apex.py detector --epochs 5
python train_apex.py keypoints --epochs 5

# Evaluate (mAP across confidence thresholds)
python train_apex.py eval
```
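The eval step sweeps mAP across confidence thresholds. A rough sketch of what such a sweep looks like with the Ultralytics API (the checkpoint path and data YAML here are assumptions, not the repo's actual paths):

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # ASSUMED checkpoint path
for conf in (0.05, 0.15, 0.25, 0.50, 0.70):
    m = model.val(data="dataset_gates_mega/data.yaml", conf=conf, verbose=False)
    print(f"conf={conf:.2f}  mAP@50={m.box.map50:.3f}  mAP@50:95={m.box.map:.3f}")
```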
§ 05 Confidence thresholds

| Context | Threshold | Reasoning |
|---|---|---|
| VQ1 pilot gate-pick | 0.25 | Conservative — drop unsure detections, fall back to search |
| VQ2 PPO observation | 0.15 | Include low-conf detections so the obs vector carries a confidence signal |
| Data capture auto-label | 0.70 | Strong labels for dataset growth; anything lower goes to human review |
| Hard-negative mining | 0.05–0.25 on non-gate frames | False positives the trained detector flags confidently are the training targets |
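These thresholds are worth centralizing so code stays in sync with the table above; a trivial sketch (constant and function names are hypothetical):

```python
# Hypothetical constants mirroring the thresholds table.
VQ1_GATE_PICK_CONF = 0.25          # unsure detections dropped; pilot falls back to search
VQ2_PPO_OBS_CONF   = 0.15          # keep low-conf so the obs vector carries a confidence signal
AUTO_LABEL_CONF    = 0.70          # >= 0.70 auto-accepted as a label
HARDNEG_BAND       = (0.05, 0.25)  # mining band on non-gate frames

def route_autolabel(conf: float) -> str:
    """Data-capture routing (sketch): strong detections become labels,
    the rest go to human review."""
    return "accept" if conf >= AUTO_LABEL_CONF else "human_review"
```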
§ 06 Augmentation pipeline

```python
import albumentations as A

# Photometric + occlusion augmentations; bbox-aware so YOLO boxes stay valid.
augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3),
    A.GaussNoise(var_limit=(10, 80)),                  # sensor noise
    A.MotionBlur(blur_limit=15, p=0.3),                # fast-flight blur
    A.RandomFog(fog_coef_lower=0.1, fog_coef_upper=0.4, p=0.2),
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.3, hue=0.1),
    A.RandomShadow(p=0.3),                             # lighting variation
    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.2),  # partial occlusion
], bbox_params=A.BboxParams(format='yolo'))  # class id rides as the 5th bbox element
```
For VQ2 specifically, add synthetic lighting changes (the spec calls them out explicitly) and gate-like false positives (similar-colored arches, frames, posters). dataset_gates_hardneg is the starter set; keep growing it, for example with a mining loop like the sketch below.
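A minimal mining sketch under the §05 band (0.05–0.25 on non-gate frames), assuming an Ultralytics checkpoint; the paths and directory layout are assumptions. Frames where the detector fires inside the band are copied over with empty label files, so YOLO training treats them as all-background images:

```python
from pathlib import Path
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # ASSUMED checkpoint path
img_dir = Path("dataset_gates_hardneg/images")
lbl_dir = Path("dataset_gates_hardneg/labels")
img_dir.mkdir(parents=True, exist_ok=True)
lbl_dir.mkdir(parents=True, exist_ok=True)

# Sweep frames known to contain no gates. Any detection in the 0.05-0.25
# band is a confident false positive -> a hard negative worth keeping.
for result in model.predict(source="captures/non_gate_frames/", conf=0.05, stream=True):
    confs = [] if result.boxes is None else result.boxes.conf.tolist()
    if any(0.05 <= c <= 0.25 for c in confs):
        src = Path(result.path)
        (img_dir / src.name).write_bytes(src.read_bytes())
        # An empty label file marks the image as background-only for training.
        (lbl_dir / src.name).with_suffix(".txt").touch()
```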