Gains · thresholds · timeouts

Tuning reference.

Source of truth for every numeric knob in the pipeline: PID gains, PPO hyperparameters, detection thresholds, state-machine timeouts, and safety limits. Every value carries a short "why", so we know what to preserve and what to sweep.

Config file: race_config.py (single source)
Serialized: YAML, snapshot in every log (reproducibility)
Discipline: no magic numbers in pipeline code (all knobs in config)
Sweep method: 10+ repeats per value change (statistical power)
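
As an illustration of the "single source, snapshot in every log" discipline, here is a minimal sketch of how a config module could hold the knobs and serialize them. The class names `PIDGains` and `RaceConfig` and the helper `to_yaml` are hypothetical, not the actual contents of race_config.py. The emitter is hand-rolled to keep the sketch dependency-free; a real pipeline would more likely use a YAML library.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class PIDGains:
    kp: float
    ki: float = 0.0  # assumption: gains not listed in the tables default to 0
    kd: float = 0.0


@dataclass(frozen=True)
class RaceConfig:
    # Hypothetical layout; values are the defaults listed in this reference.
    heading_pid: PIDGains = field(default_factory=lambda: PIDGains(0.8, 0.05, 0.1))
    altitude_pid: PIDGains = field(default_factory=lambda: PIDGains(0.6, 0.02, 0.15))
    pitch_pid: PIDGains = field(default_factory=lambda: PIDGains(0.5, kd=0.1))
    throttle_hover: float = 0.55
    target_speed_ms: float = 6.0
    max_speed_ms: float = 10.0
    lost_detection_frames: int = 30


def to_yaml(cfg: RaceConfig) -> str:
    """Emit a two-level YAML snapshot (sections of scalar values only)."""
    lines = []
    for name, value in asdict(cfg).items():
        if isinstance(value, dict):
            lines.append(f"{name}:")
            lines.extend(f"  {key}: {val}" for key, val in value.items())
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Write this string at the top of every run log so the run can be reproduced.
snapshot = to_yaml(RaceConfig())
```

Freezing the dataclasses means pipeline code can read a knob but cannot quietly change it mid-run, which keeps the snapshot honest.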

§ 01 VQ1 PID controller

Parameter                Default  Why
heading_pid.kp           0.8      Responsive enough to carve through gates, soft enough not to oscillate at 6 m/s
heading_pid.ki           0.05     Small integral to kill steady-state yaw bias
heading_pid.kd           0.1      Damping; raise if overshoot, lower if phase-lag
altitude_pid.kp          0.6      Throttle proportional to altitude error
altitude_pid.ki          0.02     Small integral for throttle trim
altitude_pid.kd          0.15     Damping on vertical velocity
pitch_pid.kp             0.5      Gate-center-to-pitch mapping
pitch_pid.kd             0.1      Pitch-rate damping
throttle_hover           0.55     Calibrate per-sim; expect <0.5 on most configs
throttle_gain_near_gate  0.10     Small boost within 10 m to punch through
target_speed_ms          6.0      Conservative cruise for VQ1 completion
max_speed_ms             10.0     VQ1 hard cap; prefer reliability
lost_detection_frames    30       After 3 s of lost gate, enter yaw-sweep recovery
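
To show how these gains enter a control step, here is a minimal textbook PID sketch. The `PID` class, the `throttle_command` helper, and the clamp of throttle to [0, 1] are assumptions for illustration; the pipeline's actual controller may differ (for example in anti-windup or derivative filtering).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float = 0.0
    kd: float = 0.0
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """One PID update; the derivative term is zero on the first call."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


THROTTLE_HOVER = 0.55
THROTTLE_GAIN_NEAR_GATE = 0.10
NEAR_GATE_M = 10.0


def throttle_command(alt_pid: PID, alt_error_m: float, dt: float, dist_to_gate_m: float) -> float:
    """Hover throttle plus altitude correction, with the near-gate boost."""
    boost = THROTTLE_GAIN_NEAR_GATE if dist_to_gate_m < NEAR_GATE_M else 0.0
    raw = THROTTLE_HOVER + alt_pid.step(alt_error_m, dt) + boost
    return max(0.0, min(1.0, raw))  # assumption: throttle is normalized to [0, 1]


heading = PID(kp=0.8, ki=0.05, kd=0.1)
first = heading.step(error=0.5, dt=0.1)   # P and I terms only
second = heading.step(error=0.4, dt=0.1)  # error shrinking, so D term pulls back
```

With dt = 0.1 s (10 fps), the second call returns less than the proportional term alone because the negative error slope engages the damping gain.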

§ 02 APEX PPO (Phase 3)

Parameter             Default                                           Source
Algorithm             PPO (clip 0.2)                                    Swift / MonoRace both
Architecture          MLP [256,256,256]                                 MonoRace G&CNet
Observation dim       28 (detector_telemetry) · 24 (privileged legacy)  see APEX §03
Action dim            4                                                 Throttle · Roll · Pitch · Yaw
Total steps           10M                                               ~4 hr RTX 5080 · 8 parallel envs
Learning rate         3e-4 → 0                                          Linear decay
Gamma                 0.99                                              Standard discount
GAE lambda            0.95                                              Standard
n_steps (per update)  2048                                              MonoRace
Mini-batches          32                                                Per PPO epoch
PPO epochs            10                                                Per rollout
Parallel envs         8 (Windows) · 16 (Linux)                          AIGP sim supports parallel
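
The hyperparameters above imply a batch geometry and a learning-rate schedule that are easy to get wrong by a factor of the env count. The sketch below works them out under one loud assumption: that n_steps is counted per environment, which this reference does not state. If n_steps is the total across envs, divide the rollout size by the env count. All names are illustrative.

```python
N_STEPS = 2048            # assumption: per environment, per update
N_ENVS = 8                # Windows default
N_MINIBATCHES = 32
N_EPOCHS = 10
TOTAL_STEPS = 10_000_000
LR_START = 3e-4

rollout_size = N_STEPS * N_ENVS                   # samples collected per update
minibatch_size = rollout_size // N_MINIBATCHES    # samples per gradient step
grad_steps_per_update = N_EPOCHS * N_MINIBATCHES  # gradient steps per rollout
n_updates = TOTAL_STEPS // rollout_size           # full updates in the budget


def learning_rate(steps_done: int) -> float:
    """Linear decay from LR_START to 0 over the total step budget."""
    remaining = max(0.0, 1.0 - steps_done / TOTAL_STEPS)
    return LR_START * remaining


print(rollout_size, minibatch_size, grad_steps_per_update, n_updates)
print(learning_rate(5_000_000))
```

Under that assumption, moving to 16 envs on Linux doubles the rollout and minibatch sizes and halves the number of updates for the same 10M steps, so a sweep across platforms is not comparing like with like unless one of the other knobs is adjusted.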

§ 03 PPO reward weights

Term                                        Weight          Rationale
r_progress (gate-passed)                    +100            Dominant signal; scales with speed
r_perception (Swift boresight · gate)       +0.3            Seek-attack behavior without state machine
r_visible_centered (detection near center)  +0.2            Extra incentive to hold gate in frame
r_speed (when visible, dist > 5 m)          +0.15 · v/25    Reward speed only when safe
r_smooth (-Σ|Δaction|)                      -0.02           Jerk penalty
r_altitude                                  -0.05 · |Δalt|  Stay near gate altitude
r_crash                                     -500            Genuinely crash-averse · playbook §02
r_off_course (>80 m from any gate)          -150            Termination signal
r_time_penalty                              -0.005 / step   Small urgency
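
The weights above could combine into a per-step reward roughly as follows. This is a sketch, not the training code: the `StepInfo` fields are hypothetical, and the perception and centering terms are assumed to be scores already normalized to [0, 1], since the table gives only their weights and not their functional form.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step signals; field names are illustrative."""
    gate_passed: bool = False
    boresight_alignment: float = 0.0  # assumption: 0..1, 1 = gate on camera axis
    center_score: float = 0.0         # assumption: 0..1, 1 = detection at image center
    gate_visible: bool = False
    dist_to_gate_m: float = 0.0
    speed_ms: float = 0.0
    action_delta_l1: float = 0.0      # sum of |delta action| over the 4 channels
    alt_error_m: float = 0.0          # altitude minus gate altitude
    crashed: bool = False
    off_course: bool = False          # more than 80 m from any gate


def step_reward(s: StepInfo) -> float:
    r = -0.005                                 # r_time_penalty
    if s.gate_passed:
        r += 100.0                             # r_progress
    r += 0.3 * s.boresight_alignment           # r_perception
    if s.gate_visible:
        r += 0.2 * s.center_score              # r_visible_centered
        if s.dist_to_gate_m > 5.0:
            r += 0.15 * s.speed_ms / 25.0      # r_speed, only when safe
    r -= 0.02 * s.action_delta_l1              # r_smooth
    r -= 0.05 * abs(s.alt_error_m)             # r_altitude
    if s.crashed:
        r -= 500.0                             # r_crash
    if s.off_course:
        r -= 150.0                             # r_off_course
    return r
```

Note the scale gap: a perfect shaping step is worth well under 1, a gate pass is worth 100, and a crash costs five gate passes, which is what makes the policy crash-averse.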

§ 04 Detector thresholds

Context                               Threshold                     Why
VQ1 pilot target-pick                 0.25                          Conservative · fall back to search when uncertain
VQ2 PPO observation confidence        0.15                          Include low-conf for the signal itself in obs
Auto-label for dataset growth         0.70                          Strong labels only; rest goes to human queue
Hard-negative mining                  0.05–0.25 on non-gate frames  False positives become training targets
NMS IoU (for multi-detection scenes)  0.45                          Avoid double-counting close gates
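
The same detector output is consumed at different confidence cutoffs depending on who reads it. A minimal sketch of three of those consumers follows; the `Detection` tuple and the function names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
from typing import List, NamedTuple, Optional

PILOT_PICK_CONF = 0.25
PPO_OBS_CONF = 0.15
AUTO_LABEL_CONF = 0.70


class Detection(NamedTuple):
    conf: float
    cx: float  # normalized image x of gate center
    cy: float  # normalized image y of gate center


def pick_target(detections: List[Detection]) -> Optional[Detection]:
    """VQ1 pilot: most confident gate, or None to fall back to search."""
    best = max(detections, key=lambda d: d.conf, default=None)
    if best is None or best.conf < PILOT_PICK_CONF:
        return None
    return best


def ppo_observation(detections: List[Detection]) -> List[Detection]:
    """VQ2 policy input: keep low-confidence detections as signal."""
    return [d for d in detections if d.conf >= PPO_OBS_CONF]


def label_route(det: Detection) -> str:
    """Dataset growth: strong detections auto-label, the rest go to a human."""
    return "auto_label" if det.conf >= AUTO_LABEL_CONF else "human_queue"
```

A detection at 0.20, for example, is visible to the PPO policy but is ignored by the VQ1 pilot and would be queued for human labeling.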

§ 05 State-machine timeouts

Transition                 Threshold                      Why
APPROACH → TRANSIT         dist < 2.5 m                   Inside gate opening
Transit commit             3 frames of closing            Reject PnP noise
TRANSIT → APPROACH (next)  dist increasing after commit   Gate behind us
APPROACH → SEEK            30 frames lost (3 s @ 10 fps)  Long enough to rule out temporary occlusion
SEEK yaw rate              0.15 → 0.225 rad/s             Sweep harder the longer we're blind
Run timeout (hard)         480 s                          8-minute rules cap
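
One way to read the transition table is as the small state machine sketched below. Several details are assumptions the table does not settle: that the 3 closing frames must be consecutive and are required together with dist < 2.5 m, that SEEK returns to APPROACH on the first reacquired detection, that a gate lost during TRANSIT simply holds the state, and that the yaw rate ramps linearly over a hypothetical `ramp_frames` window.

```python
from enum import Enum
from typing import Optional

TRANSIT_DIST_M = 2.5
COMMIT_FRAMES = 3
LOST_FRAMES = 30
SEEK_YAW_MIN, SEEK_YAW_MAX = 0.15, 0.225  # rad/s


class State(Enum):
    SEEK = "seek"
    APPROACH = "approach"
    TRANSIT = "transit"


class GateStateMachine:
    def __init__(self) -> None:
        self.state = State.APPROACH
        self.lost_frames = 0
        self.closing_frames = 0
        self.prev_dist: Optional[float] = None

    def update(self, dist_m: Optional[float]) -> State:
        """Advance one frame; dist_m is None when no gate is detected."""
        if dist_m is None:
            self.lost_frames += 1
            self.closing_frames = 0
            self.prev_dist = None
            if self.state is State.APPROACH and self.lost_frames >= LOST_FRAMES:
                self.state = State.SEEK
            return self.state

        closing = self.prev_dist is not None and dist_m < self.prev_dist
        opening = self.prev_dist is not None and dist_m > self.prev_dist
        self.prev_dist = dist_m
        self.lost_frames = 0

        if self.state is State.SEEK:
            self.state = State.APPROACH  # assumption: reacquire on first detection
            self.closing_frames = 0
        elif self.state is State.APPROACH:
            self.closing_frames = self.closing_frames + 1 if closing else 0
            if dist_m < TRANSIT_DIST_M and self.closing_frames >= COMMIT_FRAMES:
                self.state = State.TRANSIT
        elif opening:  # TRANSIT and distance growing: the gate is behind us
            self.state = State.APPROACH
            self.closing_frames = 0
        return self.state


def seek_yaw_rate(blind_frames: int, ramp_frames: int = 30) -> float:
    """Linear ramp from 0.15 to 0.225 rad/s; ramp_frames is hypothetical."""
    frac = min(1.0, max(0.0, blind_frames / ramp_frames))
    return SEEK_YAW_MIN + frac * (SEEK_YAW_MAX - SEEK_YAW_MIN)
```

Requiring consecutive closing frames means a single noisy PnP distance inside 2.5 m cannot trigger TRANSIT on its own.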

§ 06 Safety limits

Parameter      Limit   Why
max_tilt       70°     MonoRace uses 65+. Above this, recovery gets expensive
max_rate       720°/s  Aggressive but within motor authority
altitude_low   0.2 m   Ground-crash terminator
altitude_high  60 m    Ceiling terminator (course-dependent)
max_speed      30 m/s  Above this, policy gets reward penalty but not termination
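
The limits above fall into different enforcement classes, which a short sketch makes explicit. The altitude limits terminate the episode and the speed limit only flags a penalty, as the table says. Treating max_tilt and max_rate as symmetric clamps on commands is an assumption, since the table does not say how they are enforced. All function and type names are illustrative.

```python
from typing import NamedTuple, Optional

MAX_TILT_DEG = 70.0
MAX_RATE_DPS = 720.0
ALTITUDE_LOW_M = 0.2
ALTITUDE_HIGH_M = 60.0
MAX_SPEED_MS = 30.0


class SafetyResult(NamedTuple):
    terminated: bool
    reason: Optional[str]
    over_speed: bool  # triggers a reward penalty, never termination


def check_safety(altitude_m: float, speed_ms: float) -> SafetyResult:
    over_speed = speed_ms > MAX_SPEED_MS
    if altitude_m < ALTITUDE_LOW_M:
        return SafetyResult(True, "ground", over_speed)
    if altitude_m > ALTITUDE_HIGH_M:
        return SafetyResult(True, "ceiling", over_speed)
    return SafetyResult(False, None, over_speed)


def clamp(value: float, limit: float) -> float:
    """Symmetric clamp to [-limit, +limit]."""
    return max(-limit, min(limit, value))


def limit_command(tilt_deg: float, rate_dps: float) -> tuple:
    # Assumption: tilt and body-rate limits are applied to commands as clamps.
    return clamp(tilt_deg, MAX_TILT_DEG), clamp(rate_dps, MAX_RATE_DPS)
```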
Sweep discipline. Before changing any value: run 10+ attempts at the current value. Record. Change. Run 10+ attempts. Compare distributions, not single runs. A single improved attempt can be noise.
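
"Compare distributions, not single runs" can be made concrete with a permutation test on the two batches of attempts. The sketch below is one reasonable choice, not a prescribed method, and the function name `permutation_p_value` is hypothetical. It asks how often a random relabeling of the pooled runs produces a mean difference at least as large as the one observed. The lap times in the usage lines are made-up numbers for illustration only.

```python
import random
from typing import Sequence


def permutation_p_value(
    baseline: Sequence[float],
    candidate: Sequence[float],
    n_resamples: int = 5000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    n = len(baseline)
    pooled = list(baseline) + list(candidate)

    def mean_gap(values):
        first, second = values[:n], values[n:]
        return abs(sum(second) / len(second) - sum(first) / len(first))

    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if mean_gap(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Illustrative only: 10 attempts at the old value, 10 at the new one.
old_runs = [60.0 + 0.1 * i for i in range(10)]
new_runs = [50.0 + 0.1 * i for i in range(10)]
p = permutation_p_value(old_runs, new_runs)
```

A small p suggests the change moved the distribution; a large p means the "improvement" is within run-to-run noise and the old value should stay.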
TUNING-REFERENCE · v2.0 · 2026-04-21