Skillforest Kiosk
Agent Skill Evaluation Arena
Trajectory-first benchmarking • self-contained skill packages • live leaderboard demo
Leaderboard Entries
Top Composite
Latest Run
Runs (total)
Composite Score Curve
Rank-ordered leaderboard scores
Composite by rank
Top Entry Score Shape
Outcome / trajectory / efficiency / reliability / safety
Top entry
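The page does not say how the five score dimensions above combine into the composite that orders the leaderboard. As a rough sketch only, the block below assumes each dimension is normalized to [0, 1] and that the composite is their unweighted mean. The names `Entry`, `composite`, and `rank_entries` are hypothetical and are not taken from Skillforest itself.

```python
from dataclasses import dataclass
from statistics import fmean

# The five dimensions named on the dashboard.
DIMENSIONS = ("outcome", "trajectory", "efficiency", "reliability", "safety")


@dataclass(frozen=True)
class Entry:
    """Hypothetical leaderboard entry; not the arena's real schema."""
    name: str
    scores: dict[str, float]  # assumed: one value in [0, 1] per dimension


def composite(entry: Entry) -> float:
    # Assumption: unweighted mean. The real arena may weight dimensions differently.
    return fmean(entry.scores[d] for d in DIMENSIONS)


def rank_entries(entries: list[Entry]) -> list[Entry]:
    # Highest composite first, matching a rank-ordered leaderboard.
    return sorted(entries, key=composite, reverse=True)


if __name__ == "__main__":
    demo = [
        Entry("skill-a", dict.fromkeys(DIMENSIONS, 0.5)),
        Entry("skill-b", dict(zip(DIMENSIONS, (1.0, 0.8, 0.6, 0.4, 0.2)))),
    ]
    for position, entry in enumerate(rank_entries(demo), start=1):
        print(position, entry.name, round(composite(entry), 3))
```

A weighted variant would only change `composite`; the ranking step stays the same.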
Judge Flow
1) Trigger run on dashboard → 2) Compare scores → 3) Replay trajectory → 4) Show sandbox and persistence operations