A local FastAPI dashboard for the RTX 5080 training box: live GPU telemetry (util, VRAM, temp, power), one-click control over `overnight_autotrainer.py`, a live log tail, run history, and rollback. Same F1 theme as the docs. Double-click to launch, watch from your phone.
App: `aigp_trainer_app.py` · Launcher: `launch_trainer_app.bat` · Phone access: run with `--host 0.0.0.0` and browse from a phone on the same wifi.

- **GPU telemetry:** utilization %, VRAM used/total, GPU temperature, power draw, clock speed. Streamed at 2 Hz via server-sent events, so there is no polling cost.
- **Training status:** detects running `train_apex.py` / `overnight_autotrainer.py` processes and shows the PID, current phase, and a live log tail.
- **Controls:** start the nightly pipeline, run a single phase, benchmark, stop, or roll back to the last backup. Each action launches the autotrainer in a detached process.
- **Run history:** every `overnight_runs/` entry with status, duration, and promotion decision. Click through for per-phase logs, metrics, and `summary.md`.
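The telemetry tiles above map directly onto `nvidia-smi`'s CSV query mode. A minimal sketch of gathering them (the field list and dict keys here are illustrative assumptions, not the app's actual code):

```python
import subprocess

# Fields queried from nvidia-smi; `--format=csv,noheader,nounits` returns
# one plain CSV line per GPU, e.g. "62, 9212, 16384, 71, 285.3, 2617"
FIELDS = ["utilization.gpu", "memory.used", "memory.total",
          "temperature.gpu", "power.draw", "clocks.sm"]

def parse_gpu_csv(line: str) -> dict:
    """Turn one nvidia-smi CSV line into a JSON-ready dict."""
    parts = [p.strip() for p in line.split(",")]
    return {
        "util_pct": float(parts[0]),
        "vram_used_mb": float(parts[1]),
        "vram_total_mb": float(parts[2]),
        "temp_c": float(parts[3]),
        "power_w": float(parts[4]),
        "clock_mhz": float(parts[5]),
    }

def query_gpu() -> dict:
    """One-shot query of GPU 0 (requires an NVIDIA driver install)."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"], text=True)
    return parse_gpu_csv(out.splitlines()[0])
```

Querying at 2 Hz from a background loop and pushing the dicts over SSE keeps the browser side to a single `EventSource`.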
```
# First time only: install FastAPI + uvicorn
pip install fastapi "uvicorn[standard]"

# Launch (double-click on Windows)
launch_trainer_app.bat

# Or from the command line
python aigp_trainer_app.py                 # localhost only
python aigp_trainer_app.py --host 0.0.0.0  # LAN access for phone
python aigp_trainer_app.py --port 9090     # custom port
python aigp_trainer_app.py --no-browser    # skip auto-open
```
The dashboard opens at http://localhost:8080 on launch. On Windows it's a single double-click; the app bootstraps FastAPI + uvicorn on first run if they're missing.
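That first-run bootstrap amounts to "check whether the packages import, pip-install what doesn't". A sketch of the idea (function names are hypothetical, not the app's actual code):

```python
import importlib.util
import subprocess
import sys

def missing_packages(modules: list[str]) -> list[str]:
    """Return the module names that are not importable in this venv."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

def bootstrap() -> None:
    """Install FastAPI + uvicorn on first run if they're absent."""
    missing = missing_packages(["fastapi", "uvicorn"])
    if missing:
        # Module name != pip requirement string for uvicorn's extras
        pip_names = {"fastapi": "fastapi", "uvicorn": "uvicorn[standard]"}
        subprocess.check_call([sys.executable, "-m", "pip", "install",
                               *(pip_names[m] for m in missing)])
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install inside whatever interpreter launched the app.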
Run on the training box with --host 0.0.0.0, then hit the server from any device on the same wifi:
```
# On the training PC
python aigp_trainer_app.py --host 0.0.0.0

# Find the PC's IP
ipconfig   # Windows
ip addr    # Linux

# Then on your phone's browser
http://192.168.1.42:8080
```
Windows Firewall will prompt on the first 0.0.0.0 bind; allow "Private networks" only. Don't expose the dashboard to the public internet.
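Instead of reading the IP out of `ipconfig`, the app could discover its own LAN address with the connected-UDP-socket trick (no packets are actually sent). A hypothetical helper, not part of the app:

```python
import socket

def lan_ip() -> str:
    """Best-effort LAN IP: 'connect' a UDP socket (no traffic is sent)
    and read back the local address the OS picked for that route."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.168.1.1", 80))  # any routable private address works
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route at all: fall back to loopback
    finally:
        s.close()

print(f"Dashboard URL: http://{lan_ip()}:8080")
```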
| Route | Purpose |
|---|---|
| `/` | Dashboard: GPU tiles, controls, live log, recent runs |
| `/runs` | All runs ever, sortable table |
| `/runs/<run_id>` | Run detail: per-phase logs, metrics, summary.md, promotion decision |
| `GET /api/gpu` | JSON GPU stats (one-shot) |
| `GET /api/disk` | JSON disk usage (free, models/, runs/, recordings/) |
| `GET /api/status` | JSON training state (pid, current run) |
| `GET /api/runs` | JSON list of recent runs |
| `GET /api/run/{id}` | JSON status.json for one run |
| `GET /api/log/{run}/{name}` | Plain-text log tail |
| `GET /api/stream/gpu` | Server-sent event stream of GPU stats at 2 Hz |
| `GET /api/stream/status` | SSE stream of training state at 0.33 Hz |
| `POST /api/start-nightly` | Launch the overnight_autotrainer.py nightly run, detached |
| `POST /api/start-phase/{name}` | Launch a single phase (detector/keypoints/policy/export/benchmark) |
| `POST /api/bench` | Benchmark only |
| `POST /api/stop` | Kill the running training process |
| `POST /api/rollback` | Restore the last backup into models/latest/ |
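The JSON routes are plain HTTP, so any client works. A minimal stdlib sketch (the route names follow the table above; the class and method names are illustrative):

```python
import json
import urllib.request

class TrainerClient:
    """Tiny wrapper over the dashboard's JSON API."""

    def __init__(self, base: str = "http://localhost:8080"):
        self.base = base.rstrip("/")

    def url_for(self, route: str) -> str:
        return f"{self.base}/api/{route}"

    def get(self, route: str) -> dict:
        with urllib.request.urlopen(self.url_for(route)) as resp:
            return json.load(resp)

    def post(self, route: str) -> dict:
        req = urllib.request.Request(self.url_for(route),
                                     method="POST", data=b"")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

# Usage (with the server running):
#   client = TrainerClient()
#   client.get("gpu")            # one-shot GPU stats
#   client.post("start-nightly") # kick off the nightly pipeline
```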
```
Browser (any device on LAN)
        │
        │ HTTP / SSE
        ▼
┌─────────────────────────────────────────┐
│  aigp_trainer_app.py (FastAPI)          │
│                                         │
│  Pages (inline HTML, _theme.css theme)  │
│  Routes (control + data JSON)           │
│  SSE streams (GPU, status)              │
└──────┬─────────────────┬──────┬─────────┘
       │                 │      │
       │ subprocess      │      │ read
       │ (detached)      │      │
       ▼                 ▼      ▼
overnight_autotrainer.py  nvidia-smi  output/overnight_runs/
       │                              (status.json, *.log)
       ▼
train_apex.py (detector / keypoints / policy)
```
Zero framework lock-in. Inline HTML + plain fetch + EventSource. No React, no Tailwind, no Webpack. Uses the same _theme.css as the docs so it feels native.
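The SSE side needs no framework support beyond a streaming response: each event is just a `data:` line followed by a blank line, which `EventSource` in the browser decodes for free. A sketch of the server half (the generator shown here is an assumption about shape, not the app's exact code):

```python
import asyncio
import json

def format_sse(payload: dict) -> str:
    """Encode one dict as a server-sent event frame."""
    return f"data: {json.dumps(payload)}\n\n"

async def gpu_events(read_stats, hz: float = 2.0):
    """Async generator yielding SSE frames at a fixed rate.

    `read_stats` is any zero-arg callable returning a dict; in the app it
    would wrap nvidia-smi. In FastAPI this generator would be wrapped in a
    StreamingResponse with media_type="text/event-stream".
    """
    while True:
        yield format_sse(read_stats())
        await asyncio.sleep(1.0 / hz)
```

On the page, `new EventSource("/api/stream/gpu")` plus an `onmessage` handler is the entire client.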
Bind to 0.0.0.0 only on trusted networks, and never expose the app to the internet without auth. The default localhost-only binding is the safe mode.
If you want the app running whenever you log in: press Win+R, type `shell:startup`, press Enter, and copy `launch_trainer_app.bat` into that folder. For always-on behavior (even when nobody's logged in), wrap it in a Windows Service via nssm or sc.exe; not needed unless the box is pulling multi-user training server duty.
- The CLI counterpart: `overnight_autotrainer.py` subcommands, config knobs, scheduling.
- Why we train what we train: effort budget, reliability math, data pipeline moat.
- RTX 5080 install, venv, manual `train_apex.py` usage, cloud vs local.
- What each phase does: observation schemas, reward weights.