Gains · thresholds · timeouts

Tuning reference.

Source of truth for every numeric knob in the pipeline: PID gains, PPO hyperparameters, detection thresholds, state-machine timeouts, safety limits. Every value has a short "why" — so we know what to preserve and what to sweep.

Config file

race_config.py

single source

Serialized

YAML · snapshot in every log

reproducibility

Discipline

No magic numbers in pipeline code

all knobs in config

Sweep method

10+ repeats per value change

stat power

§ 01VQ1 PID controller

Parameter	Default	Why
`heading_pid.kp`	0.8	Responsive enough to carve through gates, soft enough not to oscillate at 6 m/s
`heading_pid.ki`	0.05	Small integral to kill steady-state yaw bias
`heading_pid.kd`	0.1	Damping — raise if overshoot, lower if phase-lag
`altitude_pid.kp`	0.6	Throttle proportional to altitude error
`altitude_pid.ki`	0.02	Small integral for throttle trim
`altitude_pid.kd`	0.15	Damping on vertical velocity
`pitch_pid.kp`	0.5	Gate-center-to-pitch mapping
`pitch_pid.kd`	0.1	Pitch-rate damping
`throttle_hover`	0.55	Calibrate per-sim — expect <0.5 on most configs
`throttle_gain_near_gate`	0.10	Small boost within 10 m to punch through
`target_speed_ms`	6.0	Conservative cruise for VQ1 completion
`max_speed_ms`	10.0	VQ1 hard cap — prefer reliability
`lost_detection_frames`	30	After 3 s of lost gate, enter yaw-sweep recovery

§ 02APEX PPO (Phase 3)

Parameter	Default	Source
Algorithm	PPO (clip 0.2)	Swift / MonoRace both
Architecture	MLP [256,256,256]	MonoRace G&CNet
Observation dim	28 (detector_telemetry) · 24 (privileged legacy)	see APEX §03
Action dim	4	Throttle · Roll · Pitch · Yaw
Total steps	10M	~4 hr RTX 5080 · 8 parallel envs
Learning rate	3e-4 → 0	Linear decay
Gamma	0.99	Standard discount
GAE lambda	0.95	Standard
n_steps (per update)	2048	MonoRace
Mini-batches	32	Per PPO epoch
PPO epochs	10	Per rollout
Parallel envs	8 (Windows) · 16 (Linux)	AIGP sim supports parallel

§ 03PPO reward weights

Term	Weight	Rationale
`r_progress` (gate-passed)	+100	Dominant signal; scales with speed
`r_perception` (Swift boresight · gate)	+0.3	Seek-attack behavior without state machine
`r_visible_centered` (detection near center)	+0.2	Extra incentive to hold gate in frame
`r_speed` (when visible, dist > 5m)	+0.15 · v/25	Reward speed only when safe
`r_smooth` (-Σ\|Δaction\|)	-0.02	Jerk penalty
`r_altitude`	-0.05 · \|Δalt\|	Stay near gate altitude
`r_crash`	-500	Genuinely crash-averse · playbook §02
`r_off_course` (>80m from any gate)	-150	Termination signal
`r_time_penalty`	-0.005 / step	Small urgency

§ 04Detector thresholds

Context	Threshold	Why
VQ1 pilot target-pick	0.25	Conservative · fall back to search when uncertain
VQ2 PPO observation confidence	0.15	Include low-conf for the signal itself in obs
Auto-label for dataset growth	0.70	Strong labels only; rest goes to human queue
Hard-negative mining	0.05–0.25 on non-gate frames	False positives become training targets
NMS IoU (for multi-detection scenes)	0.45	Avoid double-counting close gates

§ 05State-machine timeouts

Transition	Threshold	Why
APPROACH → TRANSIT	dist < 2.5 m	Inside gate opening
Transit commit	3 frames of closing	Reject PnP noise
TRANSIT → APPROACH (next)	dist increasing after commit	Gate behind us
APPROACH → SEEK	30 frames lost (3 s @ 10 fps)	Long enough to rule out temporary occlusion
SEEK yaw rate	0.15 → 0.225 rad/s	Sweep harder the longer we're blind
Run timeout (hard)	480 s	8-minute rules cap

§ 06Safety limits

Parameter	Limit	Why
max_tilt	70°	MonoRace uses 65+. Above this and recovery gets expensive
max_rate	720°/s	Aggressive but within motor authority
altitude_low	0.2 m	Ground-crash terminator
altitude_high	60 m	Ceiling terminator (course-dependent)
max_speed	30 m/s	Above this, policy gets reward penalty but not termination

Sweep discipline. Before changing any value: run 10+ attempts at the current value. Record. Change. Run 10+ attempts. Compare distributions, not single runs. A single improved attempt can be noise.

TUNING-REFERENCE · v2.0 2026-04-21 · ← Index · APEX