Before diving into detailed debugging, run through these five checks first. Most issues stem from one of these being wrong.
| # | Check | How to Verify | Expected Result |
|---|---|---|---|
| 1 | PX4 SITL running? | `make px4_sitl gz_x500` | Console prints `Ready for takeoff!` |
| 2 | MAVSDK connection alive? | Look for HEARTBEAT messages in console | Heartbeat every 1s, no timeout warnings |
| 3 | Camera producing frames? | `python camera_adapter.py` | Live preview window or saved test frame |
| 4 | Correct vision mode set? | Check `race_config.py` → `VisionSettings.mode` | One of: `color`, `yolo`, `unet` |
| 5 | `hover_thrust` calibrated? | Hover drone with no gate targets | Drone holds altitude ±0.3m |
The MAVSDK bridge cannot reach PX4. This is the most common startup issue.
Is `sim_url` correct in your config?

Fix 1: Start PX4 SITL first, then connect:
```bash
make px4_sitl gz_x500
```
Fix 2: Verify the connection URL is correct. The bridge expects:
```
udpin://0.0.0.0:14540
```
A common mistake is using `udp://localhost:14540` instead of `udpin://0.0.0.0:14540`. The `udpin://` scheme and `0.0.0.0` bind address are both required for MAVSDK to receive PX4's UDP stream.

Fix 3: Wait at least 5 seconds after PX4 starts before attempting MAVSDK connection. PX4 needs time to initialize its MAVLink endpoints.
PX4 requires a continuous setpoint stream before it will accept an offboard mode switch. This is a safety feature.
Fix 1: The bridge sends setpoints for 1 second before calling `start_offboard()`. If it still fails, increase the pre-send duration to 2–3 seconds.
Fix 2: Ensure the drone is armed before requesting offboard mode. Arming must succeed first.
Fix 3: Check QGroundControl for specific arm/mode rejection reasons — PX4 often provides a detailed error string.
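The pre-send pattern in Fix 1 can be sketched generically. `send_setpoint` and `start_offboard` are injected placeholder callables standing in for the bridge's real calls (assumed names, not a specific MAVSDK API), so only the timing logic is shown:

```python
import time

def prime_offboard(send_setpoint, start_offboard, duration_s=2.0, rate_hz=20):
    """Stream neutral setpoints before requesting offboard mode.

    PX4 rejects the offboard switch unless setpoints are already flowing,
    so we send zeros for duration_s seconds at rate_hz, then request the
    switch. Both callables are injected placeholders so this works with
    any bridge implementation.
    """
    n = max(1, int(duration_s * rate_hz))
    for _ in range(n):
        send_setpoint(0.0, 0.0, 0.0, 0.0)  # vx, vy, vz, yaw_rate: hold still
        time.sleep(1.0 / rate_hz)
    return start_offboard()
```

If the mode switch still fails after 2–3 seconds of streaming, the problem is usually arming state or a rejection reason visible in QGroundControl, not the stream itself.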
You can see position/attitude data but the drone ignores commands.
Cause: Offboard mode is not active, or the wrong control mode is being used.
Fix: Verify `bridge.start_offboard()` was called with the correct `ControlMode`.

No gates are being detected at all. The cause depends on which vision mode you're using.
Checks by mode:

- Color mode: Save a debug frame (`vision_debug.png`) and check whether the HSV mask captures any gate pixels. Make sure the active preset matches the gate color (`green`, `cyan`, `magenta`).
- YOLO mode: Confirm `YOLO model loaded: ...` appears in console output and that `gate_detector.pt` exists at the expected path.
- U-Net mode: Confirm `gate_seg_best.pt` exists; if missing, train it with `python gate_segmentation.py train --data dataset_gates_seg`.

Gate is detected on some frames but drops out on others, causing erratic behavior.
Cause: Confidence hovers near the threshold (0.3), or the detected area is close to `min_area`.
Fix 1: Lower `min_area` in `VisionSettings` — try 200 instead of 500.
Fix 2: Lower the confidence threshold — `GateDetection.is_valid` checks `> 0.3`; try 0.2.

The estimated distance to the gate doesn't match reality. This cascades into bad approach/transit behavior.
| Cause | Symptom | Fix |
|---|---|---|
| Gate dimensions in config don't match actual size | Consistent over/under-estimation at all distances | Measure actual gate, set `GateSettings.width` and `.height` |
| Corner detection inaccurate (YOLO bbox corners) | Distance jumps around frame-to-frame | Switch to U-Net mode for RANSAC-based corners |
| Camera FOV not calibrated | Distance error scales with position in frame | Measure actual FOV or run OpenCV checkerboard calibration |
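As a quick sanity check on the FOV row above, a single-axis pinhole model relates apparent gate width to distance. This is a simplification of the full PnP solve, useful only for spotting gross calibration errors; parameter names here are illustrative, not the project's API:

```python
import math

def gate_distance_m(gate_width_m, bbox_width_px, image_width_px, hfov_deg):
    """Estimate distance from apparent gate width (pinhole model).

    fx is derived from the horizontal FOV. An error in hfov_deg scales
    the estimate roughly linearly, which is why an uncalibrated FOV
    shows up as a systematic distance error.
    """
    fx = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return gate_width_m * fx / bbox_width_px
```

For example, with a 90° HFOV at 640px width, a 1.5m gate spanning 96px implies about 5m of distance; if PnP disagrees wildly with this, suspect the config dimensions or the FOV.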
Cause: Not enough contour points on one or more edges (fewer than 2 per edge).
Fix: The `minAreaRect` fallback is fine — it's only slightly less accurate. You'll see this message frequently at long range.

The four corner points drawn on the debug overlay are in the wrong positions or swapped.
Check the corner ordering convention:
```
TL(0) -------- TR(1)
  |              |
  |     GATE     |
  |              |
BL(3) -------- BR(2)
```
If corners are swapped, verify the ordering logic in your detector's corner-sorting code. A common bug is confusing screen coordinates (Y-down) with world coordinates (Y-up).
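A minimal ordering helper following the convention above, using the common sum/difference heuristic in screen coordinates (Y-down). This is a sketch, not the project's actual sorting code, and it assumes a roughly upright gate:

```python
def order_corners(pts):
    """Order four (x, y) image points as [TL, TR, BR, BL] (screen coords, Y-down).

    TL minimises x + y and BR maximises it; TR minimises y - x and BL
    maximises it. Heavily rotated gates would need an angular sort
    around the centroid instead.
    """
    pts = list(pts)
    s = [x + y for x, y in pts]
    d = [y - x for x, y in pts]
    tl = pts[s.index(min(s))]
    br = pts[s.index(max(s))]
    tr = pts[d.index(min(d))]
    bl = pts[d.index(max(d))]
    return [tl, tr, br, bl]
```

Note the Y-down convention: flipping to world coordinates (Y-up) without adjusting the heuristic swaps top and bottom, which is exactly the common bug described above.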
The drone rotates searching for a gate but never transitions to APPROACH.
| Cause | Diagnosis | Fix |
|---|---|---|
| No gates in camera FOV | Check camera angle — is it pointing forward? | Verify camera mount, not pointing at ground |
| `seek_yaw_rate` too high | Motion blur kills detection at high rotation rates | Lower to 90–120 deg/s |
| Vision pipeline returning 0 detections | Check debug frames and console output | See Section 3 above |
The drone approaches the gate but never registers as having passed through it.
Possible causes:

- `transit_distance` too small: The gate must be closer than 1.5m to trigger transit. Increase to 2.0m or 3.0m if the drone is fast.
- Control loop too slow to catch the transit window: raise `command_hz` or increase `transit_distance`.
- Transit logic itself: check how the pass-through is detected in `race_pipeline.py`.
Cause: Distance jitter near the threshold causes false positives.
Fix: The predictive transit system (distance_closing_count >= 3) should prevent this. If it still happens, increase the closing count threshold to 4 or 5. You can also add a minimum approach speed requirement — the drone should be moving forward when transit triggers.
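The debounce described above can be sketched as a small stateful class. The name mirrors the `distance_closing_count` idea from the text, but the class itself is illustrative, not the pipeline's real implementation:

```python
class TransitDetector:
    """Debounced transit trigger.

    Requires `need` consecutive frames in which the gate distance is
    both shrinking and below `transit_distance`, plus a minimum forward
    speed, before declaring a pass. Any jitter resets the count, which
    suppresses false positives near the threshold.
    """
    def __init__(self, transit_distance=1.5, need=3, min_speed=1.0):
        self.transit_distance = transit_distance
        self.need = need
        self.min_speed = min_speed
        self._closing = 0
        self._last = None

    def update(self, dist, forward_speed):
        closing = self._last is not None and dist < self._last
        self._last = dist
        if closing and dist < self.transit_distance and forward_speed >= self.min_speed:
            self._closing += 1
        else:
            self._closing = 0
        return self._closing >= self.need
```

Raising `need` from 3 to 4 or 5 trades a few frames of latency for robustness, which is usually the right trade when distance estimates are noisy.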
Cause: `seek_timeout` is too short (default 30s). If gates are far apart, the drone may legitimately take longer than 30s to find the next one.
Fix: Increase `seek_timeout`. Alternatively, increase `approach_distance` so the drone enters APPROACH sooner when a distant gate is first detected, keeping the SEEK phase short.
Cause: `no_gate_finish_timeout` expired (30s after last gate) before the drone found the next gate.
Fix: Increase `no_gate_finish_timeout`, or set `expected_gates` to the known gate count so the race doesn't end until all gates are passed.
Cause: `kp_yaw` is too high, causing overcorrection.
Fix: Reduce `kp_yaw` by 20–30% (e.g., 50 → 35). Also check `kp_roll` — roll coupling with yaw can amplify oscillations.
Cause: Too much forward speed, not enough deceleration before the gate.
Fix 1: Increase `approach_dist` — start slowing down earlier.
Fix 2: Reduce `cruise_pitch` — less aggressive forward tilt means lower top speed.
Cause: `hover_thrust` is too low for the vehicle weight.

Fix: Increase `hover_thrust` by 0.02–0.05 increments.

Cause: `hover_thrust` is too high.

Fix: Decrease `hover_thrust` by 0.02–0.05 increments until hover is stable.
Cause: `kp_yaw` is too low.
Fix: Increase `kp_yaw`. Also increase `kp_roll` — banking into turns significantly improves yaw responsiveness and allows tighter tracking at speed.
Cause: `cruise_pitch` is too aggressive (e.g., -35 degrees).
Fix 1: Reduce `cruise_pitch` to -20 or -15 degrees.
Fix 2: Ensure the `min_altitude` safety check is active — this should override pitch commands if the drone drops below a safe height.
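The altitude override in Fix 2 might look like this sketch: a linear fade of nose-down pitch as the drone nears `min_altitude`. The fade margin and function name are assumptions, not the pipeline's actual safety code:

```python
def safe_pitch_cmd(pitch_cmd_deg, altitude_m, min_altitude_m=1.0, margin_m=0.5):
    """Scale away nose-down pitch as the drone nears the floor.

    At or below min_altitude_m the forward-pitch command is zeroed;
    within margin_m above it, the command fades linearly. Nose-up
    commands pass through untouched so the drone can always climb.
    """
    if pitch_cmd_deg >= 0:               # nose-up: always allowed
        return pitch_cmd_deg
    if altitude_m <= min_altitude_m:
        return 0.0
    if altitude_m < min_altitude_m + margin_m:
        scale = (altitude_m - min_altitude_m) / margin_m
        return pitch_cmd_deg * scale
    return pitch_cmd_deg
```

A linear fade avoids the pitch command snapping between full and zero at the altitude boundary, which would itself cause oscillation.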
Low inference framerate means the control loop is working with stale data, causing sluggish or jerky flight.
| Fix | Expected Improvement |
|---|---|
| Ensure GPU is being used: `torch.cuda.is_available()` must return `True` | 5–10x over CPU |
| Export model to TensorRT `.engine` format | 2–3x over vanilla PyTorch |
| Reduce input resolution (320x240 is still usable) | 2–4x depending on original resolution |
| Use FP16 inference if GPU supports it | 1.5–2x |
Cause: Vision inference is blocking the main control loop.

Fix: Run inference in a background thread or separate process and have the control loop consume the most recent completed detection instead of waiting on each frame.
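One way to stop inference from blocking the control loop is a latest-wins worker thread: the camera loop submits frames, old unprocessed frames are dropped, and the control loop reads the newest completed result without waiting. A minimal sketch with a placeholder `detect_fn` (the real pipeline's detector interface may differ):

```python
import threading
import time

class AsyncDetector:
    """Run a (possibly slow) detector in a background thread.

    submit() overwrites any unprocessed frame (newest wins), and
    latest() is a non-blocking read of the most recent result, so the
    control loop never stalls on inference.
    """
    def __init__(self, detect_fn):
        self._detect = detect_fn
        self._lock = threading.Lock()
        self._frame = None
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame):
        with self._lock:
            self._frame = frame          # newest frame wins; stale ones drop

    def latest(self):
        with self._lock:
            return self._latest          # non-blocking read for control loop

    def _run(self):
        while not self._stop.is_set():
            with self._lock:
                frame, self._frame = self._frame, None
            if frame is None:
                time.sleep(0.001)
                continue
            result = self._detect(frame)  # slow inference off the control thread
            with self._lock:
                self._latest = result

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The cost is that control decisions are made on slightly stale detections, so pair this with the latency-aware gain reductions discussed in the sim-to-real section.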
Cause: Multiple PyTorch models loaded simultaneously (YOLO + U-Net + color detector all in GPU memory).
Fix: Only load the active detector mode. Unload unused models. A single YOLO model uses ~100MB GPU RAM; U-Net is similar. Loading both wastes memory for no benefit.
The sim-to-real gap is real. Here's what typically breaks and how to fix it.
| Domain | Sim vs. Real Difference | Fix |
|---|---|---|
| Camera | Real cameras have lens distortion, motion blur, auto-exposure variance, rolling shutter | Calibrate intrinsics with OpenCV checkerboard. Train/fine-tune detector on real camera footage. Lock exposure if possible. |
| Latency | Real hardware has USB/serial latency (5–20ms additional) | Measure actual end-to-end latency. Reduce control gains proportionally. Consider latency-compensating prediction. |
| Thrust | Real drone weight, battery voltage, and prop efficiency differ from sim | Re-calibrate `hover_thrust` on real hardware. Start conservative and increase. |
| Vibration | Real IMU picks up motor/prop vibration, degrading attitude estimates | Check vibration levels in QGroundControl. Use soft-mounted flight controller. Balance props. |
| Lighting | Real venues have inconsistent lighting, shadows, reflections | Use learned detectors (YOLO/U-Net) rather than color-based. Train on diverse lighting conditions. |
The race logger writes JSONL files to `race_logs/`. Each line is a timestamped snapshot of the race state:

```json
{"t": 1.234, "phase": "approach", "gate_dist": 5.2, "speed": 8.1, "bearing_x": 0.15, "det_count": 1}
```
Detection rate — count frames with zero detections:
```bash
grep -c '"det_count": 0' race.jsonl
```
Phase distribution — see how much time was spent in each state:
```bash
grep -o '"phase": "[^"]*"' race.jsonl | sort | uniq -c
```
Speed over time — extract for plotting:
```bash
jq -r '[.t, .speed] | @csv' race.jsonl > speed.csv
```
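The same metrics can be computed in Python if you'd rather avoid shelling out. A small sketch; the field names follow the sample record above:

```python
import json

def race_metrics(lines):
    """Compute detection rate and per-phase frame counts from JSONL lines.

    detection_rate is the fraction of frames with at least one gate
    detection; phase_counts maps each phase name to its frame count.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    total = len(records)
    detected = sum(1 for r in records if r.get("det_count", 0) > 0)
    phases = {}
    for r in records:
        phases[r["phase"]] = phases.get(r["phase"], 0) + 1
    return {
        "detection_rate": detected / total if total else 0.0,
        "phase_counts": phases,
    }
```

Feed it `open("race_logs/race.jsonl")` directly, since a file object iterates line by line.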
| Metric | Healthy Value | Problem Indicator |
|---|---|---|
| Detection rate | > 80% of frames | < 50% means vision pipeline is struggling |
| Time in SEEK | < 5s per gate | > 15s means gates are hard to find or seek rate is wrong |
| Speed at transit | 4–12 m/s | > 15 m/s risks overshoot; < 2 m/s is too conservative |
| Bearing error at transit | < 0.1 (centered) | > 0.3 means drone is clipping the gate edge |
Any PC with Python 3.8+ can run the sim with a synthetic camera. A GPU (NVIDIA, with CUDA) is only needed for YOLO/U-Net training and fast inference. For development and testing with the color detector, CPU-only is fine.
Yes. Update `CameraSettings` in `race_config.py` to match your camera's resolution and FOV. Run an OpenCV checkerboard calibration to get accurate intrinsic parameters. The vision pipeline is camera-agnostic as long as it receives frames in the expected format.
Add a preset in the `ColorGateDetector.presets` dictionary, or set custom `hsv_lower` / `hsv_upper` values in config. Use the debug frame output to verify your HSV ranges capture the gate reliably under different lighting.
Competition rules — GPS is disabled during races. Additionally, indoor racing venues have no GPS signal. Vision-only navigation forces robustness and is more representative of real-world autonomous flight in GPS-denied environments.
Not currently — `VisionPipeline` uses one mode at a time. However, you could implement a cascade: try U-Net first (most accurate), fall back to color detection if U-Net returns no detections. This adds latency for the fallback path but improves reliability.
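A cascade like the one just described is only a few lines. Here `detectors` holds callables in priority order; the wrapper names are hypothetical, standing in for whatever interface the real detectors expose:

```python
def cascade_detect(frame, detectors):
    """Try detectors in priority order; return the first non-empty result.

    detectors is a list of callables taking a frame and returning a
    (possibly empty) list of gate detections. Worst-case latency is the
    sum of all stages, but that only occurs when nothing is found.
    """
    for detect in detectors:
        gates = detect(frame)
        if gates:
            return gates
    return []
```

Order matters: put the most accurate (and usually slowest) detector first so the fallback only pays its cost on frames the primary misses.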
At 120Hz command rate, you have 8.33ms per control cycle. Typical breakdown:
| Stage | Budget |
|---|---|
| U-Net inference | ~5ms |
| PnP solve + corner extraction | ~1ms |
| Control law computation | ~0.5ms |
| MAVLink serialize + send | ~0.5ms |
| Margin | ~1.3ms |
Tight but achievable. If vision is slower, decouple it from the control loop (see Section 6).
Fly toward a gate at a known distance (use sim ground truth or a tape measure). Compare PnP output to ground truth at several distances. Error should be < 10% at distances under 20m. If error is consistently biased (always over or under), check gate dimensions in config. If error is noisy, check corner detection quality.
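To separate the two failure signatures (biased config dimensions vs. noisy corners), split the error over several test distances into its mean and standard deviation. A small helper sketch, with illustrative names:

```python
def distance_error_stats(estimates, ground_truth):
    """Split PnP distance error into bias (mean) and noise (std dev).

    A large bias with small noise points at wrong gate dimensions in
    the config; a small bias with large noise points at poor corner
    detection quality.
    """
    errs = [e - g for e, g in zip(estimates, ground_truth)]
    n = len(errs)
    mean = sum(errs) / n
    var = sum((x - mean) ** 2 for x in errs) / n
    return mean, var ** 0.5
```

Collect estimate/ground-truth pairs at, say, 5m, 10m, and 20m, then read off which term dominates.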
Common causes of mid-race failures:
- Battery sag: `hover_thrust` becomes insufficient as voltage drops. Monitor battery in telemetry.

In simulation, kill the race script and restart. The drone will fall (gravity). For a clean reset: land first, disarm, then restart the race pipeline. In PX4 SITL you can also use QGroundControl to switch to manual/position mode before restarting.