
Skillforest Demo Script

A guided flow for the competition: trigger a benchmark run, compare leaderboard entries, and replay execution trajectories, all inside a polished dashboard shell.

Skills: 3
Runs: 7
Leaderboard Entries: 2
Top Score: 0.988
1. Open Dashboard
Show live status, Supabase mode, benchmark health, and one-click demo execution.
2. Trigger Benchmark Run
From the dashboard or the Benchmark Controls page, run a suite and refresh the leaderboard.
3. Compare Top Entries
In the Compare Lab, explain the score tradeoffs between trajectory quality, efficiency, and safety.
4. Replay a Run
Open the replay of the latest run and walk through the trajectory playback, showing tool calls, file operations, and final outputs.
5. Show Governance / Ops
Use presentation mode, the Supabase probe, and the deploy-readiness check to demonstrate operational maturity, starting from the Supabase Ops page.
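The score tradeoffs discussed in step 3 can be made concrete with a small sketch. It assumes each leaderboard entry carries three sub-scores in [0, 1] and that the composite is a weighted sum. The `Entry` class, the `WEIGHTS` values, and the two example entries are hypothetical illustrations, not Skillforest's actual scoring formula or leaderboard data.

```python
from dataclasses import dataclass

# Hypothetical weights; Skillforest's real scoring formula is not documented here.
WEIGHTS = {"trajectory": 0.5, "efficiency": 0.25, "safety": 0.25}


@dataclass
class Entry:
    """A leaderboard entry with illustrative sub-scores, each in [0, 1]."""
    name: str
    trajectory: float
    efficiency: float
    safety: float

    def composite(self) -> float:
        # Weighted sum over the dimensions named in WEIGHTS.
        return sum(w * getattr(self, dim) for dim, w in WEIGHTS.items())


def compare(a: Entry, b: Entry) -> dict:
    """Per-dimension gap (a minus b), plus the gap in composite score."""
    gaps = {dim: getattr(a, dim) - getattr(b, dim) for dim in WEIGHTS}
    gaps["composite"] = a.composite() - b.composite()
    return gaps


def rank(entries: list) -> list:
    """Highest composite first, the order a leaderboard would list them in."""
    return sorted(entries, key=lambda e: e.composite(), reverse=True)


# Made-up entries: A follows a cleaner trajectory, B is far more efficient.
a = Entry("entry-a", trajectory=0.95, efficiency=0.60, safety=1.0)
b = Entry("entry-b", trajectory=0.80, efficiency=0.95, safety=1.0)
for dim, gap in compare(a, b).items():
    print(f"{dim:>10}: {gap:+.3f}")
print("leader:", rank([a, b])[0].name)
```

With these weights, entry-b ranks first even though entry-a has the better trajectory score; shifting enough weight toward trajectory reverses the order. That sensitivity to weighting is the kind of tradeoff worth explaining during the comparison.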
Talking Points (30 to 60 seconds each)
  • Trajectory-first evaluation surfaces silent failures that pass/fail-only benchmarks miss.
  • Self-contained skill packages make submissions reproducible, auditable, and sandboxable.
  • Supabase-backed persistence plus local fallback keeps development velocity high while preserving a path to production.
  • Presentation mode lets judges focus on the product experience while keeping debug depth available when asked.
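The first talking point can be illustrated with a minimal sketch: an outcome-only check accepts a run whose final output is correct, while a trajectory check over the same run flags a failed tool call and a file write that escapes the sandbox. The `Step` and `Run` record shapes, the `/workspace` sandbox root, and both check functions are hypothetical; they are not Skillforest's actual trajectory format or evaluator.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded step of a run (hypothetical schema)."""
    kind: str        # e.g. "tool_call" or "file_op"
    name: str
    ok: bool = True  # did the step itself succeed?
    path: str = ""   # file ops only: the path that was touched


@dataclass
class Run:
    steps: list = field(default_factory=list)
    final_output: str = ""


def passes_outcome_check(run: Run, expected: str) -> bool:
    """Pass/fail-only evaluation: compares the final output and nothing else."""
    return run.final_output.strip() == expected.strip()


def trajectory_issues(run: Run, sandbox_root: str = "/workspace") -> list:
    """Trajectory-first evaluation: inspects every step, not just the result."""
    issues = []
    for i, step in enumerate(run.steps):
        if not step.ok:
            issues.append(f"step {i}: {step.kind} '{step.name}' failed")
        if step.kind == "file_op":
            # normpath collapses "..", so "/workspace/../etc" resolves to "/etc".
            resolved = posixpath.normpath(step.path)
            inside = resolved == sandbox_root or resolved.startswith(sandbox_root + "/")
            if not inside:
                issues.append(f"step {i}: file op outside sandbox: {resolved}")
    return issues


run = Run(
    steps=[
        Step("tool_call", "search_docs", ok=False),  # error swallowed by the agent
        Step("file_op", "write", path="/workspace/answer.txt"),
        Step("file_op", "write", path="/workspace/../etc/hosts"),
    ],
    final_output="42",
)
print("outcome check passes:", passes_outcome_check(run, "42"))
for issue in trajectory_issues(run):
    print(issue)
```

Here the outcome check passes while the trajectory check reports two problems, which is the "silent failure" case. A string-level path check like this ignores symlinks; a real sandbox would enforce containment at the filesystem or container level.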