Gains · thresholds · timeouts
Tuning reference.
Source of truth for every numeric knob in the pipeline: PID gains, PPO hyperparameters, detection thresholds, state-machine timeouts, safety limits. Every value has a short "why" — so we know what to preserve and what to sweep.
Config file
race_config.py
single source
Serialized
YAML · snapshot in every log
reproducibility
Discipline
No magic numbers in pipeline code
all knobs in config
Sweep method
10+ repeats per value change
stat power
§ 01VQ1 PID controller
| Parameter | Default | Why |
heading_pid.kp | 0.8 | Responsive enough to carve through gates, soft enough not to oscillate at 6 m/s |
heading_pid.ki | 0.05 | Small integral to kill steady-state yaw bias |
heading_pid.kd | 0.1 | Damping — raise if overshoot, lower if phase-lag |
altitude_pid.kp | 0.6 | Throttle proportional to altitude error |
altitude_pid.ki | 0.02 | Small integral for throttle trim |
altitude_pid.kd | 0.15 | Damping on vertical velocity |
pitch_pid.kp | 0.5 | Gate-center-to-pitch mapping |
pitch_pid.kd | 0.1 | Pitch-rate damping |
throttle_hover | 0.55 | Calibrate per-sim — expect <0.5 on most configs |
throttle_gain_near_gate | 0.10 | Small boost within 10 m to punch through |
target_speed_ms | 6.0 | Conservative cruise for VQ1 completion |
max_speed_ms | 10.0 | VQ1 hard cap — prefer reliability |
lost_detection_frames | 30 | After 3 s of lost gate, enter yaw-sweep recovery |
§ 02APEX PPO (Phase 3)
| Parameter | Default | Source |
| Algorithm | PPO (clip 0.2) | Swift / MonoRace both |
| Architecture | MLP [256,256,256] | MonoRace G&CNet |
| Observation dim | 28 (detector_telemetry) · 24 (privileged legacy) | see APEX §03 |
| Action dim | 4 | Throttle · Roll · Pitch · Yaw |
| Total steps | 10M | ~4 hr RTX 5080 · 8 parallel envs |
| Learning rate | 3e-4 → 0 | Linear decay |
| Gamma | 0.99 | Standard discount |
| GAE lambda | 0.95 | Standard |
| n_steps (per update) | 2048 | MonoRace |
| Mini-batches | 32 | Per PPO epoch |
| PPO epochs | 10 | Per rollout |
| Parallel envs | 8 (Windows) · 16 (Linux) | AIGP sim supports parallel |
§ 03PPO reward weights
| Term | Weight | Rationale |
r_progress (gate-passed) | +100 | Dominant signal; scales with speed |
r_perception (Swift boresight · gate) | +0.3 | Seek-attack behavior without state machine |
r_visible_centered (detection near center) | +0.2 | Extra incentive to hold gate in frame |
r_speed (when visible, dist > 5m) | +0.15 · v/25 | Reward speed only when safe |
r_smooth (-Σ|Δaction|) | -0.02 | Jerk penalty |
r_altitude | -0.05 · |Δalt| | Stay near gate altitude |
r_crash | -500 | Genuinely crash-averse · playbook §02 |
r_off_course (>80m from any gate) | -150 | Termination signal |
r_time_penalty | -0.005 / step | Small urgency |
§ 04Detector thresholds
| Context | Threshold | Why |
| VQ1 pilot target-pick | 0.25 | Conservative · fall back to search when uncertain |
| VQ2 PPO observation confidence | 0.15 | Include low-conf for the signal itself in obs |
| Auto-label for dataset growth | 0.70 | Strong labels only; rest goes to human queue |
| Hard-negative mining | 0.05–0.25 on non-gate frames | False positives become training targets |
| NMS IoU (for multi-detection scenes) | 0.45 | Avoid double-counting close gates |
§ 05State-machine timeouts
| Transition | Threshold | Why |
| APPROACH → TRANSIT | dist < 2.5 m | Inside gate opening |
| Transit commit | 3 frames of closing | Reject PnP noise |
| TRANSIT → APPROACH (next) | dist increasing after commit | Gate behind us |
| APPROACH → SEEK | 30 frames lost (3 s @ 10 fps) | Long enough to rule out temporary occlusion |
| SEEK yaw rate | 0.15 → 0.225 rad/s | Sweep harder the longer we're blind |
| Run timeout (hard) | 480 s | 8-minute rules cap |
§ 06Safety limits
| Parameter | Limit | Why |
| max_tilt | 70° | MonoRace uses 65+. Above this and recovery gets expensive |
| max_rate | 720°/s | Aggressive but within motor authority |
| altitude_low | 0.2 m | Ground-crash terminator |
| altitude_high | 60 m | Ceiling terminator (course-dependent) |
| max_speed | 30 m/s | Above this, policy gets reward penalty but not termination |
Sweep discipline. Before changing any value: run 10+ attempts at the current value. Record. Change. Run 10+ attempts. Compare distributions, not single runs. A single improved attempt can be noise.