Skills: 3 (local catalog + repo)
Runs: 6 (5 success / 1 non-success)
Avg Score: 0.825 (82.5% average composite)
Composite score distribution (runs per band): <0.4: 1 | 0.4-0.7: 0 | 0.7-0.9: 0 | 0.9+: 5
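The Avg Score card and the score bands can be derived from the raw composite scores. A minimal Python sketch, assuming band edges at 0.4, 0.7 and 0.9 with the lower edge inclusive (the dashboard does not say which side of each edge is inclusive). `score_histogram` and `summarize` are hypothetical names, and the sample scores in the usage line are illustrative, not the runs on this page. Note that the band counts on the card (1 + 0 + 0 + 5) sum to the 6 runs reported above.

```python
from statistics import mean

# Assumed band edges, read off the card labels; lower edge inclusive.
BANDS = [
    ("<0.4", float("-inf"), 0.4),
    ("0.4-0.7", 0.4, 0.7),
    ("0.7-0.9", 0.7, 0.9),
    ("0.9+", 0.9, float("inf")),
]


def score_histogram(scores: list[float]) -> dict[str, int]:
    """Count how many composite scores fall into each band."""
    counts = {label: 0 for label, _, _ in BANDS}
    for s in scores:
        for label, lo, hi in BANDS:
            if lo <= s < hi:
                counts[label] += 1
                break
    return counts


def summarize(scores: list[float]) -> tuple[float, dict[str, int]]:
    """Return the average composite and the per-band counts."""
    return mean(scores), score_histogram(scores)


# Illustrative scores only.
avg, bands = summarize([0.1, 0.5, 0.75, 0.95, 0.99])
print(f"{avg:.3f}", bands)
```

Placing a score exactly on an edge (0.4, 0.7, 0.9) into the higher band is a design choice of this sketch; flip the comparison to `lo < s <= hi` if the real dashboard treats edges the other way.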
Builds: 5 (5 pass / 0 fail)
Live Activity Feed (B = build, R = run)
[B] build-seeded-ui • 2/22/2026, 11:20:42 AM (54d ago) • mode=local-validate • true
[B] build-d8cc58c1ba89 • 2/22/2026, 5:40:15 AM (54d ago) • mode=local-validate • true
[R] run-persist-suite-check-001auto • 2/22/2026, 5:38:19 AM (54d ago) • status=success • composite 0.988
[R] run-persist-check-001auto • 2/22/2026, 5:38:18 AM (54d ago) • status=success • composite 0.988
[B] build-2c9da4e4645b • 2/22/2026, 5:38:18 AM (54d ago) • mode=local-validate • true
[R] run-final-evaluator-suite-001auto • 2/22/2026, 5:32:37 AM (54d ago) • status=success • composite 0.988
[R] run-evaluator-suite-001auto • 2/22/2026, 5:30:58 AM (54d ago) • status=success • composite 0.988
[R] run-store-test-002auto • 2/22/2026, 5:26:45 AM (54d ago) • status=success • composite 0.988
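The activity feed interleaves builds and runs newest first and shows a relative age next to each timestamp. A minimal Python sketch of that rendering, assuming the timestamps are local times in the `M/D/YYYY, h:mm:ss AM/PM` form shown and that the age is whole elapsed days. `FeedEntry`, `render_feed` and the reference time `now` are hypothetical; the `now` used in the usage example is simply a date consistent with the displayed "54d ago", not the dashboard's actual clock.

```python
from dataclasses import dataclass
from datetime import datetime

# Timestamp format as displayed, e.g. "2/22/2026, 11:20:42 AM".
TS_FORMAT = "%m/%d/%Y, %I:%M:%S %p"


@dataclass
class FeedEntry:
    kind: str       # "B" for a build, "R" for a run (the feed's badge letter)
    name: str       # build or run identifier
    timestamp: str  # local time string in TS_FORMAT
    detail: str     # e.g. "mode=local-validate" or "status=success"

    def when(self) -> datetime:
        return datetime.strptime(self.timestamp, TS_FORMAT)


def days_ago(ts: datetime, now: datetime) -> int:
    """Whole days elapsed (timedelta.days, floored)."""
    return (now - ts).days


def render_feed(entries: list[FeedEntry], now: datetime) -> list[str]:
    """Sort entries newest first and format one line per entry."""
    ordered = sorted(entries, key=lambda e: e.when(), reverse=True)
    return [
        f"[{e.kind}] {e.name} • {e.timestamp} ({days_ago(e.when(), now)}d ago) • {e.detail}"
        for e in ordered
    ]


feed = [
    FeedEntry("R", "run-store-test-002auto", "2/22/2026, 5:26:45 AM", "status=success"),
    FeedEntry("B", "build-seeded-ui", "2/22/2026, 11:20:42 AM", "mode=local-validate"),
]
for line in render_feed(feed, now=datetime(2026, 4, 17, 12, 0, 0)):
    print(line)
```

Sorting on parsed `datetime` values rather than the display strings matters here: compared as plain strings, "5:26:45 AM" would sort after "11:20:42 AM".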
Score Composition
Sub-scores from the latest run's aggregate (or from averages across runs), illustrating the trajectory-first scoring model used in the demo.
Outcome: 1.000
Trajectory: 0.950
Efficiency: 1.000
Reliability: 1.000
Safety Penalty: 0.000
Composite: 0.988
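The dashboard does not state how the sub-scores combine into the composite. One plausible form is a weighted mean of the four sub-scores minus the safety penalty, clamped to [0, 1]; the Python sketch below implements exactly that, and both the formula and the `WEIGHTS` values are assumptions for illustration, not the demo's published model. With these made-up weights the breakdown above gives about 0.9875. Under any weighted mean whose weights sum to 1, the composite here equals 1 - 0.05 × (trajectory weight), so every trajectory weight between roughly 0.23 and 0.25 would display as 0.988; the match therefore does not pin down the real weights.

```python
from dataclasses import dataclass

# Hypothetical weights; the dashboard does not publish its real ones.
WEIGHTS = {"outcome": 0.35, "trajectory": 0.25, "efficiency": 0.20, "reliability": 0.20}


@dataclass
class ScoreBreakdown:
    outcome: float
    trajectory: float
    efficiency: float
    reliability: float
    safety_penalty: float = 0.0


def composite(b: ScoreBreakdown, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of the sub-scores minus the safety penalty, clamped to [0, 1]."""
    total_w = sum(weights.values())
    weighted = sum(w * getattr(b, name) for name, w in weights.items()) / total_w
    return min(1.0, max(0.0, weighted - b.safety_penalty))


latest = ScoreBreakdown(outcome=1.0, trajectory=0.95, efficiency=1.0, reliability=1.0)
print(f"{composite(latest):.4f}")
```

Normalizing by `total_w` keeps the result on the 0 to 1 scale even if the weights are edited and no longer sum to exactly 1.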