AI Grand Prix — Troubleshooting & FAQ

Common issues, debugging flowcharts, and performance optimization

1. Quick Diagnostic Checklist

Before diving into detailed debugging, run through these five checks first. Most issues stem from one of these being wrong.

| # | Check | How to Verify | Expected Result |
|---|-------|---------------|-----------------|
| 1 | PX4 SITL running? | make px4_sitl gz_x500 | Console prints "Ready for takeoff!" |
| 2 | MAVSDK connection alive? | Look for HEARTBEAT messages in console | Heartbeat every 1s, no timeout warnings |
| 3 | Camera producing frames? | python camera_adapter.py | Live preview window or saved test frame |
| 4 | Correct vision mode set? | Check VisionSettings.mode in race_config.py | One of: color, yolo, unet |
| 5 | hover_thrust calibrated? | Hover drone with no gate targets | Drone holds altitude ±0.3m |

Tip: If all five checks pass and you still have issues, proceed to the relevant section below.

2. Connection Issues

Problem: "Connection refused" or "No HEARTBEAT"

The MAVSDK bridge cannot reach PX4. This is the most common startup issue.

Diagnostic Steps:
  1. Is PX4 actually running? Check for the SITL process.
  2. Is sim_url correct in your config?
  3. Are you waiting long enough after PX4 starts?

Fix 1: Start PX4 SITL first, then connect:

make px4_sitl gz_x500

Fix 2: Verify the connection URL is correct. The bridge expects:

udpin://0.0.0.0:14540
Common mistake: Using udp://localhost:14540 instead of udpin://0.0.0.0:14540. The udpin:// scheme and 0.0.0.0 bind address are both required for MAVSDK to receive PX4's UDP stream.

Fix 3: Wait at least 5 seconds after PX4 starts before attempting MAVSDK connection. PX4 needs time to initialize its MAVLink endpoints.

Problem: "Offboard mode rejected"

PX4 requires a continuous setpoint stream before it will accept an offboard mode switch. This is a safety feature.

Why this happens: PX4 enforces that at least 2 Hz of setpoints are arriving before it allows offboard control. If the stream hasn't started or was interrupted, the mode switch is rejected.

Fix 1: The bridge sends setpoints for 1 second before calling start_offboard(). If it still fails, increase the pre-send duration to 2–3 seconds.

Fix 2: Ensure the drone is armed before requesting offboard mode. Arming must succeed first.

Fix 3: Check QGroundControl for specific arm/mode rejection reasons — PX4 often provides a detailed error string.
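
The pre-streaming described in Fix 1 can be sketched as a small helper. This is a generic sketch, not the bridge's actual API: `send_setpoint` stands in for whatever call your bridge uses to publish one setpoint, and `prestream_setpoints` is a hypothetical name.

```python
import time

def prestream_setpoints(send_setpoint, duration_s=1.0, rate_hz=20):
    """Stream neutral setpoints for duration_s before requesting offboard.

    PX4 rejects the offboard mode switch unless setpoints are already
    arriving, so we prime the stream first. `send_setpoint` should publish
    one hold-position / zero-velocity setpoint.
    """
    period = 1.0 / rate_hz
    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        send_setpoint()   # one neutral setpoint per tick
        sent += 1
        time.sleep(period)
    return sent
```

If offboard is still rejected, raising `duration_s` to 2–3 seconds (per Fix 1) is the first knob to turn.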

Problem: "Telemetry but no control"

You can see position/attitude data but the drone ignores commands.

Cause: Offboard mode is not active, or the wrong control mode is being used.

Fix:
  1. Verify bridge.start_offboard() was called with the correct ControlMode.
  2. Confirm arming succeeded (check QGroundControl for arm errors).
  3. Ensure no other GCS (like QGroundControl) is also sending commands — conflicting control sources will cause PX4 to reject offboard.

3. Vision & Detection Issues

Problem: "0 detections on every frame"

No gates are being detected at all. The cause depends on which vision mode you're using.

Color Mode: verify that the HSV ranges match the gate color under the current lighting (use the debug frame output).
YOLO Mode: confirm the correct weights file is loaded and the confidence threshold isn't set too high.
U-Net Mode: check that the segmentation mask is non-empty in the debug output before suspecting corner extraction.

Problem: "Detections flicker on/off"

Gate is detected on some frames but drops out on others, causing erratic behavior.

Cause: Confidence hovers near the threshold (0.3), or the detected area is close to min_area.

Fixes: Lower the confidence threshold or min_area slightly so marginal detections stay in, and add temporal smoothing (for example, require the gate to be absent for several consecutive frames before treating it as lost) so single-frame dropouts don't cause erratic behavior.

Problem: "PnP distance seems wrong"

The estimated distance to the gate doesn't match reality. This cascades into bad approach/transit behavior.

| Cause | Symptom | Fix |
|-------|---------|-----|
| Gate dimensions in config don't match actual size | Consistent over/under-estimation at all distances | Measure actual gate, set GateSettings.width and .height |
| Corner detection inaccurate (YOLO bbox corners) | Distance jumps around frame-to-frame | Switch to U-Net mode for RANSAC-based corners |
| Camera FOV not calibrated | Distance error scales with position in frame | Measure actual FOV or run OpenCV checkerboard calibration |

Problem: "RANSAC returns None — falling back to minAreaRect"

Cause: Not enough contour points on one or more edges (fewer than 2 per edge).

Don't panic: This happens with very small or very distant gates. The minAreaRect fallback is fine — it's only slightly less accurate. You'll see this message frequently at long range.

Problem: "Corners look wrong in debug frame"

The four corner points drawn on the debug overlay are in the wrong positions or swapped.

Critical: If corners are swapped, PnP will produce wildly wrong rotation and distance estimates. This will cause the drone to fly in the wrong direction.

Check the corner ordering convention:

TL(0) -------- TR(1)
  |              |
  |    GATE      |
  |              |
BL(3) -------- BR(2)

If corners are swapped, verify the ordering logic in your detector's corner-sorting code. A common bug is confusing screen coordinates (Y-down) with world coordinates (Y-up).
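
A corner-sorting routine that works directly in screen coordinates (Y-down) can be sketched as follows; the function name is illustrative, not the detector's own.

```python
def order_corners(points):
    """Sort 4 (x, y) screen-coordinate points into TL, TR, BR, BL order.

    Screen coordinates are Y-down, so the *top* corners have the smallest y.
    Using coordinate sums and differences sidesteps the Y-up/Y-down mixup:
      TL minimizes x + y, BR maximizes x + y,
      TR maximizes x - y, BL minimizes x - y.
    """
    tl = min(points, key=lambda p: p[0] + p[1])
    br = max(points, key=lambda p: p[0] + p[1])
    tr = max(points, key=lambda p: p[0] - p[1])
    bl = min(points, key=lambda p: p[0] - p[1])
    return [tl, tr, br, bl]
```

Note that for gates rotated far from upright in the image, the sum/difference heuristic can misassign corners; sorting by angle around the centroid is more robust in that case.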

4. State Machine Issues

Problem: "Stuck in SEEK_GATE forever"

The drone rotates searching for a gate but never transitions to APPROACH.

| Cause | Diagnosis | Fix |
|-------|-----------|-----|
| No gates in camera FOV | Check camera angle — is it pointing forward? | Verify camera mount, not pointing at ground |
| seek_yaw_rate too high | Motion blur kills detection at high rotation rates | Lower to 90–120 deg/s |
| Vision pipeline returning 0 detections | Check debug frames and console output | See Section 3 above |

Problem: "Never triggers TRANSIT_GATE"

The drone approaches the gate but never registers as having passed through it.

Cause 1 — transit_distance too small: The gate must be closer than 1.5m to trigger transit. Increase to 2.0m or 3.0m if the drone is fast.

Cause 2 — Not enough closing frames: Predictive transit requires 3+ consecutive frames where distance is decreasing. If flying too fast, the drone passes through in fewer than 3 frames. Lower command_hz or increase transit_distance.

Cause 3 — Cooldown still active: After a transit, there's a 0.3s cooldown before the next transit can trigger. If this is too long for closely-spaced gates, reduce cooldown in race_pipeline.py.

Problem: "False transit triggers (gate counted but not passed)"

Cause: Distance jitter near the threshold causes false positives.

Fix: The predictive transit system (distance_closing_count >= 3) should prevent this. If it still happens, increase the closing count threshold to 4 or 5. You can also add a minimum approach speed requirement — the drone should be moving forward when transit triggers.
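
The predictive transit rule described above (distance closing for 3+ consecutive frames, gate inside transit_distance, 0.3s cooldown) can be sketched as a small state holder. The class and attribute names here are illustrative, not the pipeline's own.

```python
class TransitTrigger:
    """Fire only when distance has been closing for `closing_frames`
    consecutive frames, the gate is inside `transit_distance`, and the
    post-transit cooldown has elapsed."""

    def __init__(self, transit_distance=1.5, closing_frames=3, cooldown_s=0.3):
        self.transit_distance = transit_distance
        self.closing_frames = closing_frames
        self.cooldown_s = cooldown_s
        self._prev_dist = None
        self._closing_count = 0
        self._last_fire_t = float("-inf")

    def update(self, dist, t):
        # Count consecutive frames where the gate is getting closer.
        if self._prev_dist is not None and dist < self._prev_dist:
            self._closing_count += 1
        else:
            self._closing_count = 0
        self._prev_dist = dist
        fire = (dist < self.transit_distance
                and self._closing_count >= self.closing_frames
                and (t - self._last_fire_t) > self.cooldown_s)
        if fire:
            self._last_fire_t = t
            self._closing_count = 0
        return fire
```

Raising `closing_frames` to 4 or 5 is exactly the false-positive fix described above; raising `transit_distance` compensates for fast approaches.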

Problem: "EMERGENCY triggered too early"

Cause: seek_timeout is too short (default 30s). If gates are far apart, the drone may legitimately take longer than 30s to find the next one.

Fix: Increase seek_timeout. Alternatively, increase approach_distance so the drone enters APPROACH sooner when a distant gate is first detected, keeping the SEEK phase short.

Problem: "FINISHED too early (not all gates done)"

Cause: no_gate_finish_timeout expired (30s after last gate) before the drone found the next gate.

Fix: Increase no_gate_finish_timeout, or set expected_gates to the known gate count so the race doesn't end until all gates are passed.

5. Control & Flight Issues

Problem: "Drone oscillates left-right"

Cause: kp_yaw is too high, causing overcorrection.

Fix: Lower kp_yaw by 20–30% (e.g., 50 → 35). Also check kp_roll — roll coupling with yaw can amplify oscillations.
Tuning tip: Always change one gain at a time. If you change kp_yaw and kp_roll simultaneously, you won't know which fix worked.
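
For context while tuning, the yaw law being adjusted is a simple proportional controller on the horizontal bearing. This is a generic sketch with illustrative defaults (kp_yaw of 50 matches the example above; the rate clamp and function name are assumptions, not the project's config).

```python
def yaw_rate_command(bearing_x, kp_yaw=50.0, max_rate=120.0):
    """Proportional yaw command from the gate's normalized horizontal
    offset in the image (-1 = far left, +1 = far right)."""
    rate = kp_yaw * bearing_x                    # deg/s, positive = turn right
    return max(-max_rate, min(max_rate, rate))   # clamp to limit motion blur
```

Lowering `kp_yaw` directly reduces the overcorrection that causes left-right oscillation.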

Problem: "Drone flies past the gate"

Cause: Too much forward speed, not enough deceleration before the gate.

Fix 1: Increase approach_dist — start slowing down earlier.

Fix 2: Reduce cruise_pitch — less aggressive forward tilt means lower top speed.

Problem: "Drone slowly descends"

Cause: hover_thrust is too low for the vehicle weight.

Fix: Increase hover_thrust by 0.02–0.05 increments.
Test: Put drone in SEEK with no gate visible. It should hold altitude within ±0.3m.

Problem: "Drone climbs uncontrollably"

Cause: hover_thrust is too high.

Fix: Decrease hover_thrust by 0.02–0.05 increments until hover is stable.

Problem: "Drone doesn't turn fast enough to track gate"

Cause: kp_yaw is too low.

Fix: Increase kp_yaw. Also increase kp_roll — banking into turns significantly improves yaw responsiveness and allows tighter tracking at speed.

Problem: "Drone nose-dives into ground"

Cause: cruise_pitch is too aggressive (e.g., -35 degrees).

Safety critical: This can destroy hardware on a real drone. Always test pitch changes in simulation first.

Fix 1: Reduce cruise_pitch to -20 or -15 degrees.

Fix 2: Ensure min_altitude safety check is active — this should override pitch commands if the drone drops below a safe height.
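
The min_altitude override in Fix 2 amounts to a clamp in the command path. This is a hypothetical sketch (function and parameter names are not from the project); it assumes negative pitch means nose-down, as in the cruise_pitch values above.

```python
def apply_altitude_floor(pitch_cmd, altitude, min_altitude=1.0, safe_pitch=0.0):
    """Override nose-down pitch commands when below the safety floor."""
    if altitude < min_altitude and pitch_cmd < safe_pitch:
        return safe_pitch  # level out instead of continuing the dive
    return pitch_cmd
```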

6. Performance Issues

Problem: "Vision FPS too low (< 30fps)"

Low inference framerate means the control loop is working with stale data, causing sluggish or jerky flight.

| Fix | Expected Improvement |
|-----|----------------------|
| Ensure GPU is being used: torch.cuda.is_available() must return True | 5–10x over CPU |
| Export model to TensorRT .engine format | 2–3x over vanilla PyTorch |
| Reduce input resolution (320x240 is still usable) | 2–4x depending on original resolution |
| Use FP16 inference if GPU supports it | 1.5–2x |

Problem: "Command rate below target (< 120Hz)"

Cause: Vision inference is blocking the main control loop.

Fixes: Run vision inference in a background thread or process and have the control loop consume the most recent detection instead of waiting for a fresh one. Keep the control loop on its own fixed timer so a slow inference never stalls a command cycle.

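
One way to decouple vision from the control loop is a worker thread that keeps only the latest result. This is a minimal sketch: `infer` and `get_frame` are stand-ins for your detector's frame-to-detections call and your camera adapter.

```python
import threading

class VisionWorker:
    """Run inference in a background thread; the control loop reads the
    most recent detection without ever blocking on inference."""

    def __init__(self, infer, get_frame):
        self._infer = infer
        self._get_frame = get_frame
        self._lock = threading.Lock()
        self._latest = None
        self._running = False

    def start(self):
        self._running = True
        threading.Thread(target=self._loop, daemon=True).start()

    def stop(self):
        self._running = False

    def _loop(self):
        while self._running:
            result = self._infer(self._get_frame())
            with self._lock:
                self._latest = result

    def latest(self):
        with self._lock:
            return self._latest
```

The control loop calls `latest()` each cycle; detections are at most one inference old, but the command rate no longer depends on inference time.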
Problem: "High memory usage"

Cause: Multiple PyTorch models loaded simultaneously (YOLO + U-Net + color detector all in GPU memory).

Fix: Only load the active detector mode. Unload unused models. A single YOLO model uses ~100MB GPU RAM; U-Net is similar. Loading both wastes memory for no benefit.

7. Sim-to-Real Transfer Issues

The sim-to-real gap is real. Here's what typically breaks and how to fix it.

Rule of thumb: If it barely works in sim, it won't work on real hardware. Get comfortable margins in simulation before transferring.
| Domain | Sim vs. Real Difference | Fix |
|--------|-------------------------|-----|
| Camera | Real cameras have lens distortion, motion blur, auto-exposure variance, rolling shutter | Calibrate intrinsics with OpenCV checkerboard. Train/fine-tune detector on real camera footage. Lock exposure if possible. |
| Latency | Real hardware has USB/serial latency (5–20ms additional) | Measure actual end-to-end latency. Reduce control gains proportionally. Consider latency-compensating prediction. |
| Thrust | Real drone weight, battery voltage, and prop efficiency differ from sim | Re-calibrate hover_thrust on real hardware. Start conservative and increase. |
| Vibration | Real IMU picks up motor/prop vibration, degrading attitude estimates | Check vibration levels in QGroundControl. Use soft-mounted flight controller. Balance props. |
| Lighting | Real venues have inconsistent lighting, shadows, reflections | Use learned detectors (YOLO/U-Net) rather than color-based. Train on diverse lighting conditions. |

8. Log Analysis

The race logger writes JSONL files to race_logs/. Each line is a timestamped snapshot of the race state:

{"t": 1.234, "phase": "approach", "gate_dist": 5.2, "speed": 8.1, "bearing_x": 0.15, "det_count": 1}

Quick Analysis Commands

Detection rate — count frames with zero detections:

grep -c '"det_count": 0' race.jsonl

Phase distribution — see how much time was spent in each state:

grep -o '"phase": "[^"]*"' race.jsonl | sort | uniq -c

Speed over time — extract for plotting:

jq -r '[.t, .speed] | @csv' race.jsonl > speed.csv
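
If you prefer Python over shell tools, the same summaries can be computed directly from the JSONL. This sketch assumes only the record fields shown above (t, phase, det_count); the function name is illustrative.

```python
import json

def summarize_race_log(path):
    """Compute detection rate and per-phase frame counts from a JSONL race log."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    total = len(records)
    detected = sum(1 for r in records if r.get("det_count", 0) > 0)
    phases = {}
    for r in records:
        phases[r["phase"]] = phases.get(r["phase"], 0) + 1
    return {
        "detection_rate": detected / total if total else 0.0,
        "phase_counts": phases,
    }
```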

What to Look For

| Metric | Healthy Value | Problem Indicator |
|--------|---------------|-------------------|
| Detection rate | > 80% of frames | < 50% means vision pipeline is struggling |
| Time in SEEK | < 5s per gate | > 15s means gates are hard to find or seek rate is wrong |
| Speed at transit | 4–12 m/s | > 15 m/s risks overshoot; < 2 m/s is too conservative |
| Bearing error at transit | < 0.1 (centered) | > 0.3 means drone is clipping the gate edge |

9. Frequently Asked Questions

Q: What's the minimum hardware for development?

Any PC with Python 3.8+ can run the sim with a synthetic camera. A GPU (NVIDIA, with CUDA) is only needed for YOLO/U-Net training and fast inference. For development and testing with the color detector, CPU-only is fine.

Q: Can I use a different camera?

Yes. Update CameraSettings in race_config.py to match your camera's resolution and FOV. Run an OpenCV checkerboard calibration to get accurate intrinsic parameters. The vision pipeline is camera-agnostic as long as it receives frames in the expected format.

Q: How do I add a new gate color?

Add a preset in the ColorGateDetector.presets dictionary, or set custom hsv_lower / hsv_upper values in config. Use the debug frame output to verify your HSV ranges capture the gate reliably under different lighting.
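
A preset entry might look like the following. This is a hypothetical example: the dictionary shape is assumed from the hsv_lower / hsv_upper config options mentioned above, and the hue values are a generic orange range, not a tested calibration.

```python
# Hypothetical preset for ColorGateDetector.presets.
# OpenCV-style HSV ranges: H in [0, 179], S and V in [0, 255].
ORANGE_GATE = {
    "hsv_lower": (5, 120, 120),   # low hue, saturated, reasonably bright
    "hsv_upper": (20, 255, 255),
}
```

Always confirm the range against the debug frame output before racing with it.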

Q: Why not use GPS?

Competition rules — GPS is disabled during races. Additionally, indoor racing venues have no GPS signal. Vision-only navigation forces robustness and is more representative of real-world autonomous flight in GPS-denied environments.

Q: Can I run multiple detectors simultaneously?

Not currently — VisionPipeline uses one mode at a time. However, you could implement a cascade: try U-Net first (most accurate), fall back to color detection if U-Net returns no detections. This adds latency for the fallback path but improves reliability.
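
The cascade idea can be sketched in a few lines; the detector callables here are placeholders for, e.g., a U-Net detect() and a color detect().

```python
def cascade_detect(frame, primary_detect, fallback_detect):
    """Try the primary (more accurate) detector first; only pay the
    fallback's cost when the primary returns no detections."""
    detections = primary_detect(frame)
    if detections:
        return detections, "primary"
    return fallback_detect(frame), "fallback"
```

The extra latency only applies on frames where the primary fails, which is exactly when the fallback's lower accuracy is still better than nothing.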

Q: What's the latency budget?

At 120Hz command rate, you have 8.33ms per control cycle. Typical breakdown:

| Stage | Budget |
|-------|--------|
| U-Net inference | ~5ms |
| PnP solve + corner extraction | ~1ms |
| Control law computation | ~0.5ms |
| MAVLink serialize + send | ~0.5ms |
| Margin | ~1.3ms |

Tight but achievable. If vision is slower, decouple it from the control loop (see Section 6).

Q: How do I know if my PnP depth is accurate?

Fly toward a gate at a known distance (use sim ground truth or a tape measure). Compare PnP output to ground truth at several distances. Error should be < 10% at distances under 20m. If error is consistently biased (always over or under), check gate dimensions in config. If error is noisy, check corner detection quality.
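
The bias-vs-noise distinction above can be made quantitative with a small helper (hypothetical name) that takes paired ground-truth and estimated distances.

```python
def depth_error_stats(truth, estimated):
    """Separate systematic bias (mean relative error) from noise (spread
    of relative error). Consistent bias points at wrong gate dimensions
    in config; high noise points at poor corner detection."""
    rel = [(e - t) / t for t, e in zip(truth, estimated)]
    n = len(rel)
    mean = sum(rel) / n
    var = sum((r - mean) ** 2 for r in rel) / n
    return {"bias": mean, "noise": var ** 0.5}
```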

Q: What if my drone works fine for the first few gates then crashes?

Common causes of mid-race failures:
  1. Battery sag: as voltage drops, the calibrated hover_thrust becomes too low and the drone starts descending (re-check Section 5).
  2. Transit cooldown or seek_timeout interacting badly with closely spaced gates, so one missed transit cascades into an EMERGENCY (see Section 4).
  3. Vision FPS degrading over the run (thermal throttling, memory growth), leaving the control loop with increasingly stale data (see Section 6).

Q: How do I reset the race mid-flight?

In simulation, kill the race script and restart. The drone will fall (gravity). For a clean reset: land first, disarm, then restart the race pipeline. In PX4 SITL you can also use QGroundControl to switch to manual/position mode before restarting.