Leaderboard
Sample Benchmark v1 Rankings
Composite ranking with trajectory-first scoring, efficiency, and safety penalties visible per run.
Entries: 2 (ranked runs)
Top Score: 0.988 (best composite)
Average: 0.988 (mean composite)
Top Run Score Shape
Radar view of the current #1 run across scoring components (safety shown as inverse penalty).
Composite Score Curve
Rank-ordered composite scores
Top: 0.988 | Avg: 0.988 | Spread: 0.000
[Chart: composite by rank]
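The Top, Avg, and Spread figures above can be reproduced from the composite scores in the raw payload. The following is a minimal sketch, not the dashboard's actual code: the function name `summarize` is hypothetical, and "spread" is assumed to mean the highest composite minus the lowest, which is consistent with the 0.000 shown for two identical scores.

```python
import json

# Trimmed copy of the leaderboard payload: only the fields this sketch needs.
LEADERBOARD_JSON = """
{"items": [
  {"run_id": "run-store-test-002", "composite_score": 0.9875},
  {"run_id": "run-verify-suite-1771760806--task-002", "composite_score": 0.9875}
]}
"""


def summarize(leaderboard: dict) -> dict:
    """Compute entry count, top, mean, and spread of composite scores."""
    scores = [item["composite_score"] for item in leaderboard["items"]]
    if not scores:
        raise ValueError("leaderboard has no items")
    return {
        "entries": len(scores),
        "top": max(scores),
        "avg": sum(scores) / len(scores),
        # Assumption: spread = best composite minus worst composite.
        "spread": max(scores) - min(scores),
    }


summary = summarize(json.loads(LEADERBOARD_JSON))
for name in ("top", "avg", "spread"):
    # The page shows three decimal places, so format the same way here.
    print(f"{name}: {summary[name]:.3f}")
```

With only two entries that share the same composite score, the top and the average coincide and the spread is zero, which is why all three headline numbers look redundant on this sample benchmark.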
Raw Leaderboard JSON
{
  "benchmarkId": "sample",
  "benchmark_version_id": "benchmark-v1",
  "dedupe_mode": "best-per-skill-version",
  "item_count": 2,
  "items": [
    {
      "run_id": "run-store-test-002",
      "skill_version_id": "skillver-sample-v0.1.0",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/generated-run-store2/run-result.json",
      "rank": 1
    },
    {
      "run_id": "run-verify-suite-1771760806--task-002",
      "skill_version_id": "skillver-local",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/run-verify-suite-1771760806/tasks/sample-task/run-result.json",
      "rank": 2
    }
  ]
}
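The payload reports "dedupe_mode": "best-per-skill-version" and a "rank" per item. One plausible reading is that only the highest-scoring run for each skill_version_id is kept, and the survivors are then ordered by composite score. The sketch below illustrates that reading only. The function name `best_per_skill_version` is hypothetical, the sample runs are made up for illustration, and the tie-break (earlier input wins) is an assumption, since the source does not say how the two tied runs were ordered.

```python
from typing import Any

Run = dict[str, Any]


def best_per_skill_version(runs: list[Run]) -> list[Run]:
    """Keep the best run per skill version, then assign 1-based ranks."""
    best: dict[str, Run] = {}
    for run in runs:
        key = run["skill_version_id"]
        # Strict '>' means the first-seen run wins when scores are equal.
        if key not in best or run["composite_score"] > best[key]["composite_score"]:
            best[key] = run
    # sorted() is stable, so tied scores keep their first-seen order
    # (assumed tie-break, not confirmed by the source).
    ranked = sorted(best.values(), key=lambda r: r["composite_score"], reverse=True)
    return [{**run, "rank": position} for position, run in enumerate(ranked, start=1)]


# Hypothetical input: two runs share a skill version, so one is dropped.
example_runs = [
    {"run_id": "run-a", "skill_version_id": "skillver-x", "composite_score": 0.80},
    {"run_id": "run-b", "skill_version_id": "skillver-x", "composite_score": 0.90},
    {"run_id": "run-c", "skill_version_id": "skillver-y", "composite_score": 0.85},
]

leaderboard_items = best_per_skill_version(example_runs)
for item in leaderboard_items:
    print(item["rank"], item["run_id"], item["composite_score"])
```

In the payload above, each of the two runs has a distinct skill_version_id, so under this reading neither would be dropped and "item_count" stays at 2.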