The detector has never seen a stone archway, an industrial fan, or a doorway — yet VQ2's 3D-scanned realistic environments are full of them. Unless we explicitly train against those distractors, the detector will produce false positives that send the drone chasing ghosts. mine_hardneg.py feeds the detector exactly the images it's going to fire incorrectly on, tags the high-confidence false positives as high-value training samples, and saves them with empty YOLO labels so the next training run learns "nope, NOT a gate."
| Subcommand | What it does |
|---|---|
harvest | Download ~N images for a keyword via Google + Bing image search (icrawler). Writes to harvested/<slug>/. |
mine | Run current detector on an input dir, save images as YOLO negatives (empty labels). Records FP triggers in false_positives.json. |
stats | Dataset counts + top FP sources + hardest negatives (highest-confidence wrong detections). |
```bash
# Download ~200 images (Google + Bing) for one visual concept
./aigp/Scripts/python.exe mine_hardneg.py harvest \
    --keyword "industrial scaffolding" --num 200
```
Output: harvested/industrial_scaffolding/ with 100-200 jpg/png files. Inspect the folder — if any images actually contain gates (rare but possible with generic queries), delete them manually before mining.
```bash
./aigp/Scripts/python.exe mine_hardneg.py mine \
    --input-dir harvested/industrial_scaffolding
```
The tool runs your current detector (models/apex_yolo11n.pt) at conf=0.25 and saves any image that triggered a detection. These are the "hard" cases — visually similar enough to gates that your detector fires incorrectly. They go to dataset_gates_hardneg_v2/ with empty labels.
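The selection logic behind mine can be sketched roughly as follows. This is a minimal sketch, not the tool's actual code: `collect_hard_negatives` and the `detections` mapping (image path → list of detection confidences on known gate-free images) are hypothetical, and the real internals may differ.

```python
import json
from pathlib import Path

CONF_THRESHOLD = 0.25  # matches the tool's default --conf

def collect_hard_negatives(detections, out_dir="dataset_gates_hardneg_v2"):
    """Keep only images where the detector fired; every hit is a false positive.

    `detections` is a hypothetical mapping of image path -> confidences
    the current detector produced on gate-free images.
    """
    out = Path(out_dir)
    (out / "labels").mkdir(parents=True, exist_ok=True)
    records = []
    for img, confs in detections.items():
        hits = [c for c in confs if c >= CONF_THRESHOLD]
        if not hits:
            continue  # detector stayed silent: easy negative, skip
        # An empty label file means "no objects here" in YOLO format
        (out / "labels" / f"{Path(img).stem}.txt").write_text("")
        records.append({"image": img, "max_conf": max(hits), "n_fp": len(hits)})
    (out / "false_positives.json").write_text(json.dumps(records, indent=2))
    return records
```

The key point is the empty label file: at training time YOLO treats any detection on such an image as a loss-bearing mistake.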
Each keyword adds diversity. Run harvest + mine for each class below (battle-tested to trigger gate FPs):
| Keyword | Why it confuses the detector |
|---|---|
stone archway | Rectangular opening, high contrast edges |
warehouse doorway | Frame-within-frame structure, often metal |
circular exhaust fan | Round opening with internal structure (VQ1 "highlighted" gates) |
picture frame wall | Pure rectangle prior → bbox false positives |
rectangular window frame | Same as above plus reflective glass artefacts |
tunnel entrance | Dark rectangular opening framed by light edges |
clock tower face | Circular high-contrast disk |
industrial scaffolding | Lots of rectangular framing elements |
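Running harvest + mine by hand for every keyword gets tedious; a small driver can batch it. This is a hypothetical sketch (the `run_all` helper and slug convention are assumptions, not part of mine_hardneg.py), and in practice you should still audit each harvested/ folder between harvest and mine:

```python
import subprocess

# The battle-tested distractor keywords from the table above
KEYWORDS = [
    "stone archway", "warehouse doorway", "circular exhaust fan",
    "picture frame wall", "rectangular window frame", "tunnel entrance",
    "clock tower face", "industrial scaffolding",
]

PY = "./aigp/Scripts/python.exe"

def run_all(keywords=KEYWORDS, dry_run=False):
    """Build (and optionally run) harvest + mine commands per keyword."""
    cmds = []
    for kw in keywords:
        slug = kw.replace(" ", "_")  # assumed to match the tool's slug scheme
        cmds.append([PY, "mine_hardneg.py", "harvest",
                     "--keyword", kw, "--num", "200"])
        cmds.append([PY, "mine_hardneg.py", "mine",
                     "--input-dir", f"harvested/{slug}"])
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```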
```bash
./aigp/Scripts/python.exe mine_hardneg.py stats
```
Shows the top-N highest-confidence false positives. If the detector is firing at conf ≥ 0.85 on something, that's exactly what you want in training — the hardest examples produce the biggest gradient signal.
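If you want to rank the hard cases yourself, the sort is simple. A hedged sketch, assuming each record in false_positives.json carries a max_conf field (the real schema may differ):

```python
import json
from pathlib import Path

def load_fp_records(path="dataset_gates_hardneg_v2/false_positives.json"):
    return json.loads(Path(path).read_text())

def hardest_negatives(records, top=10):
    """Highest-confidence false positives first: the biggest gradient signal."""
    return sorted(records, key=lambda r: r["max_conf"], reverse=True)[:top]
```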
```bash
./aigp/Scripts/python.exe train_apex.py detector \
    --dataset dataset_gates_hardneg_v2 --epochs 40
```
Start from your existing models/apex_yolo11n.pt (train_apex.py resumes from prior best by default) and fine-tune for 30-50 epochs. Anything more risks catastrophic forgetting on the positive data.
Better: merge positives + negatives into a combined training run. Edit dataset_gates_mega/data.yaml to include the hardneg dir, or use a mix config (pattern already in dataset_gates_domain/).
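A merged config could look like the sketch below. Ultralytics data.yaml accepts a list of train dirs, but the exact paths here (dataset_gates_hardneg_v2/images and the single gate class) are assumptions about this repo's layout, so adapt to what your dataset_gates_domain/ pattern actually uses:

```yaml
# hypothetical merged data.yaml: positives plus mined hard negatives
train:
  - dataset_gates_mega/images/train
  - dataset_gates_hardneg_v2/images   # empty-label images act as pure negatives
val: dataset_gates_mega/images/val
names:
  0: gate
```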
harvest flags:

| Flag | Default | What it does |
|---|---|---|
--keyword | required | Search term (quote multi-word phrases) |
--num | 200 | Total images across Google + Bing |
mine flags:

| Flag | Default | What it does |
|---|---|---|
--input-dir | required | Dir of non-gate images to process |
--model | models/apex_yolo11n.pt | Detector to use for FP discovery |
--conf | 0.25 | Conf threshold — anything above = false positive |
--imgsz | 640 | Detector input size |
--all-negatives | false | Include images with no detections (plain backgrounds) |
--max-images | 2000 | Cap total saved per run |
| Pitfall | Symptom | Fix |
|---|---|---|
| Query returns actual gates | e.g. "metal gate" → real racing gates in harvest | Audit the harvest dir; delete gate-containing images BEFORE running mine. Empty labels on real gates would hurt recall. |
| Too few FPs | fp_saved = 2 after 200 images | Your detector is already robust to that class. Either move on (good sign) or re-run mine with --conf 0.10 to surface lower-confidence FPs. |
| Thumbnail-sized harvest | Skipped count high | Tool rejects < 320 px min-dim. Try different keywords — some Google results are tiny. |
| Near-dup clustering | Same image saved in 5 colors | dHash dedup already handles most; if still happening, re-mine with a tighter conf or manually delete. |
| Catastrophic forgetting | After fine-tune, positive recall drops | Don't fine-tune on hardneg alone for many epochs. Mix with positives at 80:20 (positive:negative) ratio, or limit to 30 epochs. |
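The dHash dedup mentioned in the table works on perceptual bits, not raw pixels, which is why recolored near-duplicates usually collapse to the same hash. A minimal sketch of the idea (not the tool's actual implementation; real pipelines resize with PIL/OpenCV first, here `gray` is an already-resized 2D grayscale grid):

```python
def dhash_bits(gray):
    """Difference hash: one bit per pixel, set if it is brighter than
    its right-hand neighbour. Robust to global color/brightness shifts."""
    bits = 0
    for row in gray:
        for x in range(len(row) - 1):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distance = near-duplicate."""
    return bin(a ^ b).count("1")
```

Two harvested images would then be treated as duplicates when their hash distance falls under a small threshold (commonly single digits for 64-bit hashes).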