Training · RTX 5080 · $0 cost
Local GPU training.
Train all three APEX phases on the RTX 5080 box. One command, ~7.5 hours overnight. The sim-testing environment is the same box (Windows-only AIGP sim), so local beats cloud for this year.
GPU: RTX 5080 · 16 GB GDDR7 · Blackwell (our dev box)
Command: python train_apex.py (runs all 3 phases)
Total time: ~7.5 hours (fits overnight)
Constraint: final validation on Windows (the AIGP sim is Windows-only)
Training vs the AIGP sim. Training runs on this RTX 5080 box against the SimDrone proxy env. The actual AIGP sim (May 2026) is Windows-only and ships as a separate downloaded package (see the submission guide). Training can run on Linux if you want, but the final submission has to be validated on Windows.

For unattended overnight runs, use overnight_autotrainer.py nightly (backup · train · benchmark · auto-promote). See the training runbook for the exact commands, the config, and how to schedule it with Task Scheduler / cron.
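A scheduling sketch, assuming overnight_autotrainer.py takes no required arguments (the task name and paths are placeholders; the runbook has the real invocation):

```
# Linux cron: start the nightly run at 23:30 (add via crontab -e)
30 23 * * * cd /path/to/grandprix && python overnight_autotrainer.py >> nightly.log 2>&1

# Windows Task Scheduler (PowerShell, admin); "APEX Nightly" is a hypothetical task name
schtasks /Create /TN "APEX Nightly" /TR "python C:\grandprix\overnight_autotrainer.py" /SC DAILY /ST 23:30
```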
§ 01 GPU compatibility

| GPU | VRAM | Phase 1 | Phase 2 | Phase 3 (PPO 10M) | Total |
|---|---|---|---|---|---|
| RTX 5080 (yours) | 16 GB GDDR7 | ~2 hr | ~1.5 hr | ~4 hr | ~7.5 hr |
| RTX 5090 | 32 GB | ~1.5 hr | ~1 hr | ~2.5 hr | ~5 hr |
| RTX 4090 | 24 GB | ~2.5 hr | ~1.7 hr | ~4.5 hr | ~8.7 hr |
| RTX 4080 | 16 GB | ~3 hr | ~2 hr | ~5 hr | ~10 hr |
| RTX 4070 Ti | 12 GB | ~4 hr | ~2.5 hr | ~6 hr | ~12.5 hr |
| RTX 3080 | 10 GB | ~5 hr | ~3.5 hr | ~8 hr | ~16.5 hr |
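The table can be approximated with one scaling knob. A rough sketch, assuming per-phase time scales linearly with a single GPU speed factor (a simplification; Phase 3 is also env-throughput-bound):

```python
# Per-phase RTX 5080 baseline hours from the table above.
BASELINE_HOURS = {"detector": 2.0, "keypoints": 1.5, "policy": 4.0}

def estimate_total(speed_factor: float) -> float:
    """Total wall time in hours; speed_factor > 1 means slower than the 5080
    (e.g. ~1.33 for a 4080, per the table's ~10 hr total)."""
    return sum(h * speed_factor for h in BASELINE_HOURS.values())

print(estimate_total(1.0))  # 7.5 (the 5080 baseline)
```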
§ 02 Install
Windows (target) · PRIMARY

python -m venv aigp
aigp\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install opencv-python numpy scipy pyyaml ultralytics stable-baselines3 gymnasium onnxruntime

(Single-line commands on purpose: backslash line continuations don't work in cmd or PowerShell.)
Linux
TRAINING ONLY
python3 -m venv aigp
source aigp/bin/activate
pip install torch torchvision \
--index-url https://download.pytorch.org/whl/cu128
pip install opencv-python numpy scipy pyyaml \
ultralytics stable-baselines3 gymnasium onnxruntime
Transport is MAVSDK over UDP per VADR-TS-002 §4 (pip install mavsdk). The vision stream is JPEG-chunked over UDP:5600, handled by UDPVisionCamera in camera_adapter.py. Runtime: Python 3.14.2 is known-good per §5.1 (other versions are allowed). The sim itself runs on Windows 11 only; there is no Linux build.
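A minimal sketch of JPEG chunk reassembly. The real wire format belongs to UDPVisionCamera in camera_adapter.py; the 2-byte big-endian chunk-index header used here is an assumption for illustration:

```python
import struct

def reassemble(datagrams: list[bytes]) -> bytes:
    """Reassemble one JPEG frame from out-of-order UDP chunks.

    Assumed datagram layout: 2-byte big-endian chunk index, then payload.
    Sorts by index and concatenates the payloads.
    """
    chunks = {}
    for dgram in datagrams:
        (idx,) = struct.unpack(">H", dgram[:2])
        chunks[idx] = dgram[2:]
    return b"".join(chunks[i] for i in sorted(chunks))

# Chunks can arrive in any order; index headers restore the sequence.
frame = reassemble([b"\x00\x01lo ", b"\x00\x00hel", b"\x00\x02world"])
```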
§ 03 Run
# All three phases sequentially (~7.5 hr RTX 5080)
python train_apex.py
# Individual phases
python train_apex.py detector # Phase 1: YOLO11n (~2 hr)
python train_apex.py keypoints # Phase 2: YOLO11n-pose (~1.5 hr)
python train_apex.py policy # Phase 3: PPO (~4 hr)
# Smoke tests
python train_apex.py detector --epochs 5
python train_apex.py policy --steps 500000
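The CLI surface implied by the commands above can be sketched with argparse. This is a hypothetical reconstruction, not train_apex.py's actual code; flag defaults are assumptions:

```python
import argparse

def parse_args(argv=None):
    """Hypothetical sketch of train_apex.py's CLI, inferred from the
    commands above: optional phase name plus per-phase overrides."""
    p = argparse.ArgumentParser(prog="train_apex.py")
    p.add_argument("phase", nargs="?", default="all",
                   choices=["all", "detector", "keypoints", "policy"])
    p.add_argument("--epochs", type=int, default=None)       # YOLO phases
    p.add_argument("--steps", type=int, default=10_000_000)  # PPO phase
    return p.parse_args(argv)

args = parse_args(["policy", "--steps", "500000"])  # the smoke-test run
```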
Phase 3 observation mode. The default --observation-mode=privileged is dev-only (legacy 24D obs with NED bearings; it does not transfer to the sim). For a submission-ready policy:

python train_apex.py policy --steps 10000000 \
--observation-mode=detector_telemetry

The 28D detector_telemetry observation matches the AIGP sim's input surface (FPV + telemetry, no GPS).
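A sketch of how a 28D FPV+telemetry observation might be packed. The 4×5 detection layout and the 8 telemetry fields are illustrative assumptions; the env defines the actual field order:

```python
def build_observation(detections, telemetry):
    """Pack a flat 28D observation (no GPS).

    Assumed layout: 4 gates x 5 values (bbox x, y, w, h + confidence)
    = 20 floats, followed by 8 telemetry floats (attitude, velocity, ...).
    """
    flat = [v for gate in detections for v in gate]
    obs = flat + list(telemetry)
    assert len(obs) == 28, f"expected 28D obs, got {len(obs)}"
    return obs

obs = build_observation([[0.0] * 5] * 4, [0.0] * 8)
```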
§ 04 Outputs

| File | Model | Size | Purpose |
|---|---|---|---|
| models/apex_detector_best.pt | YOLO11n | ~6 MB | Phase 1 (VQ1 + VQ2) |
| models/apex_detector_best.onnx | YOLO11n | ~6 MB | ONNX for submission |
| models/apex_keypoints_best.pt | YOLO11n-pose | ~6 MB | Phase 2 (VQ1 + VQ2) |
| models/apex_keypoints_best.onnx | YOLO11n-pose | ~6 MB | ONNX for submission |
| output/apex_policy/apex_policy_best.zip | Perception-aware PPO | ~1 MB | Phase 3 (VQ2 only) |
| output/apex_policy/apex_policy.onnx | PPO | ~1 MB | ONNX export for inference |
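A pre-flight sanity check against the table above; the paths are the table's, the helper name is ours:

```python
from pathlib import Path

# Submission-critical artifacts from the outputs table.
EXPECTED = [
    "models/apex_detector_best.onnx",
    "models/apex_keypoints_best.onnx",
    "output/apex_policy/apex_policy.onnx",
]

def missing_artifacts(paths, root="."):
    """Return the subset of expected paths that don't exist under root."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]

# Run from the repo root before packaging a submission.
print(missing_artifacts(EXPECTED))
```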
Don't commit model weights to the repo; .gitignore already excludes *.pt, *.onnx, *.pth, and rl_models/. Move weights via HF Hub, Git LFS, scp, or the submission package, not git.
§ 05 Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| torch.cuda.is_available() == False | CPU-only PyTorch build | Reinstall with the cu128 index URL |
| CUDA OOM during PPO | Too many parallel envs for 16 GB | Drop n_envs 4 → 2 in train_apex_policy |
| CUDA illegal-memory-access at frame ~43 | VRAM exhaustion (we hit this during lingbot) | Lower camera_num_iterations; close other GPU apps |
| FileNotFoundError: train_apex.py | Not in the repo dir | cd grandprix |
| Training is CPU-bound instead of GPU-bound | DataLoader workers misconfigured | Set num_workers=8 and pinned memory |
| GPU temp >85°C | Sustained compute load | Check case airflow; drop batch size |
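For the first table row, a guarded check distinguishes "torch not installed" from "CPU-only torch". A sketch that is safe to run in any venv (the helper name is ours):

```python
import importlib
import platform

def gpu_report() -> dict:
    """Report Python version, torch version, and CUDA availability.
    The torch import is guarded so this runs even in a bare venv."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        info["cuda"] = torch.cuda.is_available()  # False => CPU-only build
    except ImportError:
        pass  # torch not installed at all
    return info

print(gpu_report())
```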
§ 06 Remote training from Mac
On the PC (one-time setup)
# Enable OpenSSH Server (Windows)
# Settings → Apps → Optional Features
# → Add: OpenSSH Server
# Then (PowerShell admin):
Start-Service sshd
Set-Service -Name sshd -StartupType Automatic
ipconfig # find IPv4 (e.g. 192.168.1.42)
From the Mac
ssh user@192.168.1.42
cd grandprix
python train_apex.py
# Disconnect-safe:
nohup python train_apex.py > training.log 2>&1 &
tail -f training.log
§ 07 Local vs cloud cost

| | RTX 5080 (yours) | RunPod A4000 | RunPod A100 |
|---|---|---|---|
| Cost per full run | $0 | ~$6 | ~$13 |
| Total time | ~7.5 hr | ~17.5 hr | ~9 hr |
| Setup | 10 min (first time) | 5 min | 5 min |
| Sim testing | Same box | Separate Windows box | Separate Windows box |
| Best for | Default | No local GPU | Multi-GPU distributed |
RTX 5080 matches A100 on time for single-GPU APEX and costs $0. The tiebreaker for AIGP: final submission validates against the Windows-only sim, so you'll be on the 5080 box anyway. Cloud only helps for multi-GPU distributed training, which APEX doesn't need.
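The back-of-envelope arithmetic behind the table is just rate × wall time. The ~$0.34/hr (A4000) and ~$1.44/hr (A100) rates are assumptions implied by the totals above, not quoted prices:

```python
def run_cost(hourly_rate: float, hours: float) -> float:
    """Cloud cost for one full training run, rounded to cents."""
    return round(hourly_rate * hours, 2)

print(run_cost(0.34, 17.5))  # A4000 full run, ~$6
print(run_cost(1.44, 9.0))   # A100 full run, ~$13
```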