Comparison Lab
Benchmark Entry Comparison
Side-by-side comparison of score profiles for the top leaderboard entries on the sample benchmark. Useful for judge demos and for discussing model tradeoffs.
Entries: 2
Top: 0.988
Second: 0.988
Delta: 0.000
Entry 1 (Entry A): run-store-test-002
Composite: 0.988
Entry 2 (Entry B): run-verify-suite-1771760806--task-002
Composite: 0.988
[Chart: Composite Score Distribution. Rank-ordered composite scores across leaderboard entries.]
Raw Comparison Source
{
  "benchmarkId": "sample",
  "benchmark_version_id": "benchmark-v1",
  "dedupe_mode": "best-per-skill-version",
  "item_count": 2,
  "items": [
    {
      "run_id": "run-store-test-002",
      "skill_version_id": "skillver-sample-v0.1.0",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/generated-run-store2/run-result.json",
      "rank": 1
    },
    {
      "run_id": "run-verify-suite-1771760806--task-002",
      "skill_version_id": "skillver-local",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/run-verify-suite-1771760806/tasks/sample-task/run-result.json",
      "rank": 2
    }
  ]
}
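The summary figures at the top of this page (Entries, Top, Second, Delta) can plausibly be derived from the raw source above. The following Python sketch shows one way to do it. The function names `summarize` and `fmt` are hypothetical, the payload is a trimmed copy of the raw source with only the fields the sketch needs, and the three-decimal display format is an assumption inferred from the raw value 0.9875 appearing as 0.988 in the summary. This is not necessarily how the page itself computes these values.

```python
import json

# Trimmed copy of the raw comparison source (only the fields used below).
RAW = """
{
  "benchmarkId": "sample",
  "item_count": 2,
  "items": [
    {"run_id": "run-store-test-002", "composite_score": 0.9875, "rank": 1},
    {"run_id": "run-verify-suite-1771760806--task-002", "composite_score": 0.9875, "rank": 2}
  ]
}
"""


def summarize(payload: dict) -> dict:
    """Hypothetical helper: derive the summary figures from the payload."""
    # Order entries by their reported rank rather than trusting list order.
    items = sorted(payload["items"], key=lambda item: item["rank"])
    top = items[0]["composite_score"] if items else None
    second = items[1]["composite_score"] if len(items) > 1 else None
    # Delta is only defined when there are at least two entries.
    delta = top - second if second is not None else None
    return {"entries": len(items), "top": top, "second": second, "delta": delta}


def fmt(value) -> str:
    """Assumed display format: three decimal places, or 'n/a' when missing."""
    return "n/a" if value is None else f"{value:.3f}"


summary = summarize(json.loads(RAW))
print("Entries", summary["entries"])
for label in ("top", "second", "delta"):
    print(label.capitalize(), fmt(summary[label]))
```

Both entries here share identical composite and component scores, so the delta is exactly zero and rank alone separates them. The raw source does not state how that tie is broken.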