Judge Demo
Skillforest Demo Script
A guided flow for the competition: trigger a benchmark run, compare leaderboard entries, and replay execution trajectories, all inside a polished dashboard shell.
Skills: 3
Runs: 7
Leaderboard Entries: 2
Top Score: 0.988
1. Open Dashboard
Show live status, Supabase mode, benchmark health, and one-click demo execution.
2. Trigger Benchmark Run
Use the dashboard or benchmark controls to run a suite and refresh the leaderboard.
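To make the "refresh the leaderboard" step concrete, here is a minimal sketch. It assumes each run record carries a submission name and a single score, and that the leaderboard keeps one best run per submission. `Run` and `refresh_leaderboard` are hypothetical names, not Skillforest's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical run record; field names are illustrative, not the real schema.
    run_id: str
    submission: str
    score: float


def refresh_leaderboard(runs: list[Run]) -> list[Run]:
    """Keep each submission's best-scoring run, then rank by score (highest first)."""
    best: dict[str, Run] = {}
    for run in runs:
        current = best.get(run.submission)
        if current is None or run.score > current.score:
            best[run.submission] = run
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


runs = [Run("r1", "alpha", 0.91), Run("r2", "alpha", 0.95), Run("r3", "beta", 0.88)]
board = refresh_leaderboard(runs)  # r2 (alpha, 0.95) first, then r3 (beta, 0.88)
```

Keeping only the best run per submission is one design choice; a dashboard could instead average repeated runs or show the latest one.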
3. Compare Top Entries
Use the comparison lab to explain score tradeoffs (trajectory vs efficiency vs safety).
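One way to frame the tradeoff discussion is a weighted composite plus per-dimension gaps. The sketch below assumes each entry exposes trajectory, efficiency, and safety scores in [0, 1]; the `WEIGHTS` values, `Entry`, `composite`, and `tradeoffs` are hypothetical and do not reflect the dashboard's actual scoring formula.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; the real scoring formula is not specified.
WEIGHTS = {"trajectory": 0.5, "efficiency": 0.2, "safety": 0.3}


@dataclass
class Entry:
    name: str
    scores: dict[str, float]  # per-dimension scores in [0, 1]


def composite(entry: Entry, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the entry's per-dimension scores."""
    return sum(weights[dim] * entry.scores[dim] for dim in weights)


def tradeoffs(a: Entry, b: Entry) -> dict[str, float]:
    """Per-dimension gap (a minus b); positive values favor entry a."""
    return {dim: round(a.scores[dim] - b.scores[dim], 3) for dim in WEIGHTS}


a = Entry("A", {"trajectory": 0.9, "efficiency": 0.6, "safety": 1.0})
b = Entry("B", {"trajectory": 0.8, "efficiency": 0.9, "safety": 0.95})
gaps = tradeoffs(a, b)  # {"trajectory": 0.1, "efficiency": -0.3, "safety": 0.05}
```

Here the two composites are close (about 0.87 vs 0.865), but the gaps show that A wins on trajectory and safety while B is far more efficient, which is exactly the kind of tradeoff the comparison lab is meant to surface.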
4. Replay a Run
Walk through the trajectory playback and show tool calls, file ops, and final outputs.
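Trajectory playback can be thought of as stepping through an ordered list of events. This sketch assumes a run's trajectory is stored as a list of steps tagged `tool_call`, `file_op`, or `output`; `Step`, `replay`, and those kind labels are hypothetical, not the dashboard's actual data model.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical event kinds: "tool_call", "file_op", or "output".
    kind: str
    detail: str


def replay(trajectory: list[Step]) -> Iterator[str]:
    """Yield one human-readable line per step, in execution order."""
    labels = {"tool_call": "TOOL", "file_op": "FILE", "output": "OUT"}
    for index, step in enumerate(trajectory, start=1):
        # Unknown kinds fall back to their upper-cased name.
        yield f"[{index:02d}] {labels.get(step.kind, step.kind.upper())}: {step.detail}"


trajectory = [
    Step("tool_call", "search(query='x')"),
    Step("file_op", "write report.md"),
    Step("output", "done"),
]
lines = list(replay(trajectory))
```

Using a generator lets a UI advance one step at a time (as in a playback scrubber) instead of rendering the whole run at once.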
5. Show Governance / Ops
Use presentation mode, the Supabase probe, and the deploy-readiness check to demonstrate operational maturity.
Talking Points (30-60s each)
- Trajectory-first evaluation surfaces silent failures that pass/fail-only benchmarks miss.
- Self-contained skill packages make submissions reproducible, auditable, and sandboxable.
- Supabase-backed persistence plus local fallback keeps development velocity high while preserving a path to production.
- Presentation mode lets judges focus on the product experience while keeping debug depth available when asked.
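The "Supabase-backed persistence plus local fallback" point can be illustrated with a small store wrapper. This is a minimal sketch under stated assumptions: the primary store stands in for a Supabase-backed client (no Supabase API is called here), and `RunStore`, `InMemoryStore`, `LocalJsonStore`, and `FallbackStore` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional, Protocol


class RunStore(Protocol):
    def save(self, run_id: str, record: dict) -> None: ...


class InMemoryStore:
    """Simplest local store; handy for tests and offline demos."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def save(self, run_id: str, record: dict) -> None:
        self.records[run_id] = record


class LocalJsonStore:
    """Local fallback that writes one JSON file per run under a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, run_id: str, record: dict) -> None:
        (self.root / f"{run_id}.json").write_text(json.dumps(record))


class FallbackStore:
    """Try the primary (e.g. Supabase-backed) store first; on failure, write locally."""

    def __init__(self, primary: Optional[RunStore], fallback: RunStore) -> None:
        self.primary = primary
        self.fallback = fallback
        self.last_error: Optional[Exception] = None

    def save(self, run_id: str, record: dict) -> str:
        """Persist the record and report which backend accepted it."""
        if self.primary is not None:
            try:
                self.primary.save(run_id, record)
                return "primary"
            except Exception as exc:  # e.g. network error or missing credentials
                self.last_error = exc
        self.fallback.save(run_id, record)
        return "fallback"
```

Returning the backend name makes it easy for a dashboard to display which persistence mode a run actually used.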