Training · RTX 5080 · $0 cost

Local GPU training.

Train all three APEX phases on the RTX 5080 box. One command, ~7.5 hours overnight. The sim-testing environment is the same box (Windows-only AIGP sim), so local beats cloud for this year.

GPU: RTX 5080 · 16 GB GDDR7 · Blackwell (our dev box)
Command: python train_apex.py (runs all 3 phases)
Total time: ~7.5 hours (fits overnight)
Constraint: final validation on Windows (AIGP sim is Windows-only)
Training vs the AIGP sim. Training runs on this RTX 5080 box against the SimDrone proxy env. The actual AIGP sim (May 2026) is Windows-only and is a separate downloaded package — see submission guide. Training can run on Linux if you want, but the final submission has to be validated on Windows.
For unattended overnight runs use overnight_autotrainer.py nightly (backup · train · benchmark · auto-promote). See the training runbook for the exact commands, the config, and how to schedule it with Task Scheduler / cron.
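The backup · train · benchmark · auto-promote cycle can be sketched as a promote-if-better loop. This is a hypothetical illustration, not the actual contents of overnight_autotrainer.py (see the runbook for that); the callables stand in for its stages.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NightlyResult:
    benchmark_score: float
    promoted: bool

def nightly_run(backup: Callable[[], None],
                train: Callable[[], None],
                benchmark: Callable[[], float],
                promote: Callable[[], None],
                best_so_far: float) -> NightlyResult:
    """One backup -> train -> benchmark -> auto-promote cycle.

    Sketch only: the real overnight_autotrainer.py may differ in
    ordering and in how "better" is measured."""
    backup()                      # snapshot current weights first
    train()                       # run the training phases
    score = benchmark()           # evaluate the new checkpoint
    promoted = score > best_so_far
    if promoted:
        promote()                 # replace the live model only on improvement
    return NightlyResult(score, promoted)
```

The point of the backup-first ordering is that a bad overnight run can never clobber the last known-good weights.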

§ 01 · GPU compatibility

| GPU | VRAM | Phase 1 | Phase 2 | Phase 3 (PPO 10M) | Total |
|---|---|---|---|---|---|
| RTX 5080 (yours) | 16 GB GDDR7 | ~2 hr | ~1.5 hr | ~4 hr | ~7.5 hr |
| RTX 5090 | 32 GB | ~1.5 hr | ~1 hr | ~2.5 hr | ~5 hr |
| RTX 4090 | 24 GB | ~2.5 hr | ~1.7 hr | ~4.5 hr | ~8.7 hr |
| RTX 4080 | 16 GB | ~3 hr | ~2 hr | ~5 hr | ~10 hr |
| RTX 4070 Ti | 12 GB | ~4 hr | ~2.5 hr | ~6 hr | ~12.5 hr |
| RTX 3080 | 10 GB | ~5 hr | ~3.5 hr | ~8 hr | ~16.5 hr |

§ 02 · Install

Windows (target)

PRIMARY
python -m venv aigp
aigp\Scripts\activate

pip install torch torchvision \
  --index-url https://download.pytorch.org/whl/cu128

pip install opencv-python numpy scipy pyyaml \
  ultralytics stable-baselines3 gymnasium onnxruntime

Linux

TRAINING ONLY
python3 -m venv aigp
source aigp/bin/activate

pip install torch torchvision \
  --index-url https://download.pytorch.org/whl/cu128

pip install opencv-python numpy scipy pyyaml \
  ultralytics stable-baselines3 gymnasium onnxruntime
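After installing on either OS, a quick sanity check that every required package resolves in the venv. Note the pip-name vs import-name mismatches (opencv-python imports as cv2, pyyaml as yaml); find_spec checks availability without actually importing anything heavy like torch.

```python
from importlib.util import find_spec

# Import names, not pip names (opencv-python -> cv2, pyyaml -> yaml)
REQUIRED = ["torch", "torchvision", "cv2", "numpy", "scipy", "yaml",
            "ultralytics", "stable_baselines3", "gymnasium", "onnxruntime"]

def missing_packages(names=REQUIRED):
    """Return the import names that do not resolve in this environment."""
    return [n for n in names if find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("OK" if not missing else f"missing: {missing}")
```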

Transport is MAVSDK over UDP per VADR-TS-002 §4 (pip install mavsdk). Vision stream is UDP:5600 JPEG-chunked, handled by UDPVisionCamera in camera_adapter.py. Runtime: Python 3.14.2 is known-good per §5.1 (other versions allowed). The AIGP sim itself is Windows 11 only, with no Linux support; the Linux install above covers training only.
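The JPEG-chunked stream implies reassembling each frame from multiple datagrams. The actual wire format is defined by UDPVisionCamera in camera_adapter.py; the sketch below assumes one plausible scheme where each datagram carries (frame_id, chunk_idx, chunk_count, payload).

```python
from collections import defaultdict

class JpegReassembler:
    """Reassemble a JPEG frame from UDP chunks.

    Hypothetical chunk header: the real format lives in
    UDPVisionCamera (camera_adapter.py)."""

    def __init__(self):
        self._frames = defaultdict(dict)   # frame_id -> {chunk_idx: bytes}

    def add_chunk(self, frame_id, chunk_idx, chunk_count, payload):
        chunks = self._frames[frame_id]
        chunks[chunk_idx] = payload
        if len(chunks) == chunk_count:               # all pieces arrived
            data = b"".join(chunks[i] for i in range(chunk_count))
            del self._frames[frame_id]               # drop per-frame state
            return data                               # complete JPEG bytes
        return None                                   # still waiting
```

Keying by frame_id means out-of-order and interleaved chunks from consecutive frames still reassemble correctly; a production version would also time out frames with lost chunks.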

§ 03 · Run

# All three phases sequentially (~7.5 hr RTX 5080)
python train_apex.py

# Individual phases
python train_apex.py detector     # Phase 1: YOLO11n (~2 hr)
python train_apex.py keypoints    # Phase 2: YOLO11n-pose (~1.5 hr)
python train_apex.py policy       # Phase 3: PPO (~4 hr)

# Smoke tests
python train_apex.py detector --epochs 5
python train_apex.py policy --steps 500000
Phase 3 observation mode. The default --observation-mode=privileged is dev-only (a legacy 24D observation with NED bearings that does not transfer to the sim). For a submission-ready policy:
python train_apex.py policy --steps 10000000 \
  --observation-mode=detector_telemetry
28D obs matches the AIGP sim's input surface (FPV + telemetry, no GPS).
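The shape contract matters more than the exact contents: the policy only ever sees a fixed-width 28D vector. The real layout is defined by train_apex.py's detector_telemetry mode; the split below (detector outputs plus telemetry channels, zero-padded to width 28) is purely illustrative.

```python
import numpy as np

OBS_DIM = 28  # detector_telemetry width: FPV-derived + telemetry, no GPS

def build_observation(detections, telemetry):
    """Assemble a flat fixed-width observation vector.

    Illustrative only: the real 28D layout lives in the training code.
    Zero-padding keeps the width stable when the detector sees nothing."""
    vec = np.concatenate([np.asarray(detections, dtype=np.float32).ravel(),
                          np.asarray(telemetry, dtype=np.float32).ravel()])
    if vec.size > OBS_DIM:
        raise ValueError(f"observation too wide: {vec.size} > {OBS_DIM}")
    return np.pad(vec, (0, OBS_DIM - vec.size))  # pad with zeros to 28
```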

§ 04 · Outputs

| File | Model | Size | Purpose |
|---|---|---|---|
| models/apex_detector_best.pt | YOLO11n | ~6 MB | Phase 1 (VQ1 + VQ2) |
| models/apex_detector_best.onnx | YOLO11n | ~6 MB | ONNX for submission |
| models/apex_keypoints_best.pt | YOLO11n-pose | ~6 MB | Phase 2 (VQ1 + VQ2) |
| models/apex_keypoints_best.onnx | YOLO11n-pose | ~6 MB | ONNX for submission |
| output/apex_policy/apex_policy_best.zip | Perception-aware PPO | ~1 MB | Phase 3 (VQ2 only) |
| output/apex_policy/apex_policy.onnx | PPO | ~1 MB | ONNX-exported for inference |
Don't commit model weights to the repo. .gitignore already excludes *.pt, *.onnx, *.pth, rl_models/. Move weights via HF Hub, Git LFS, scp, or the submission package — not git.

§ 05 · Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| torch.cuda.is_available() == False | CPU-only PyTorch build | Reinstall with the cu128 index URL |
| CUDA OOM during PPO | Too many parallel envs for 16 GB | Drop n_envs from 4 to 2 in train_apex_policy |
| CUDA illegal-memory-access at frame ~43 | VRAM exhaustion (we hit this during lingbot) | Lower camera_num_iterations, close other GPU apps |
| FileNotFoundError: train_apex.py | Not in repo dir | cd grandprix |
| Training CPU-bound instead of GPU | DataLoader workers misconfigured | Set num_workers=8, pin_memory=True |
| GPU temp >85 °C | Sustained compute load | Check case airflow, drop batch size |
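The OOM fix (n_envs 4 → 2) generalizes into a rough VRAM heuristic for other cards in the compatibility table. Only the 16 GB tier is anchored to a value we actually run; the other thresholds are extrapolations, not measurements.

```python
def suggest_n_envs(vram_gb: float) -> int:
    """Rough parallel-env count for the PPO phase by VRAM.

    16 GB is anchored to our n_envs=4 default (dropping to 2 on OOM);
    the other tiers are guesses scaled from that, not benchmarks."""
    if vram_gb >= 24:
        return 8
    if vram_gb >= 16:
        return 4
    if vram_gb >= 10:
        return 2
    return 1
```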

§ 06 · Remote training from Mac

On the PC (one-time)

SETUP
# Enable OpenSSH Server (Windows)
# Settings → Apps → Optional Features
# → Add: OpenSSH Server

# Then (PowerShell admin):
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic

ipconfig  # find IPv4 (e.g. 192.168.1.42)

From Mac

USE
ssh user@192.168.1.42
cd grandprix
python train_apex.py

# Disconnect-safe:
nohup python train_apex.py > training.log 2>&1 &
tail -f training.log

§ 07 · Local vs cloud cost

| | RTX 5080 (yours) | RunPod A4000 | RunPod A100 |
|---|---|---|---|
| Cost per full run | $0 | ~$6 | ~$13 |
| Total time | ~7.5 hr | ~17.5 hr | ~9 hr |
| Setup | 10 min (first time) | 5 min | 5 min |
| Sim testing | Same box | Separate Windows box | Separate Windows box |
| Best for | Default | No local GPU | Multi-GPU distributed |
The RTX 5080 roughly matches an A100 on wall-clock time for single-GPU APEX and costs $0. The tiebreaker for AIGP: the final submission validates against the Windows-only sim, so you'll be on the 5080 box anyway. Cloud only helps for multi-GPU distributed training, which APEX doesn't need.
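The cost figures imply hourly rates of roughly $0.34/hr (A4000, $6 over 17.5 hr) and $1.44/hr (A100, $13 over 9 hr); these are back-calculated from the table, not current RunPod list prices. A one-liner re-estimates cost if run length changes:

```python
def run_cost(hours: float, usd_per_hour: float) -> float:
    """Cloud cost of one training run; local GPU is usd_per_hour=0."""
    return round(hours * usd_per_hour, 2)

# Rates implied by the table above (not quoted RunPod pricing):
A4000_RATE = 6 / 17.5    # ~$0.34/hr
A100_RATE = 13 / 9       # ~$1.44/hr
```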
LOCAL-GPU · v2.0 · 2026-04-21