Comparison Lab

Benchmark Entry Comparison

Side-by-side comparison of per-dimension score profiles for the top leaderboard entries on the sample benchmark. Useful for judge demos and for discussing model tradeoffs.

Entries: 2
Top: 0.988
Second: 0.988
Delta: 0.000
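The summary figures above can be reproduced from the leaderboard items listed in the raw comparison source. The following is a minimal sketch, not the page's actual implementation: the helper names `summarize` and `fmt` are hypothetical, and the three-decimal formatting is assumed from the values shown on the page.

```python
from typing import Optional


def summarize(items: list[dict]) -> dict:
    """Compute entry count, top score, second score, and their delta.

    Hypothetical helper; the field names follow the raw comparison source.
    """
    # Sort by rank so "top" and "second" do not depend on input order.
    ranked = sorted(items, key=lambda item: item["rank"])
    top = ranked[0]["composite_score"] if ranked else None
    second = ranked[1]["composite_score"] if len(ranked) > 1 else None
    delta = top - second if top is not None and second is not None else None
    return {"entries": len(ranked), "top": top, "second": second, "delta": delta}


def fmt(value: Optional[float]) -> str:
    # Three decimals is an assumption based on the displayed values.
    return "n/a" if value is None else f"{value:.3f}"


# Only the fields needed for the summary are copied from the source payload.
items = [
    {"run_id": "run-store-test-002", "composite_score": 0.9875, "rank": 1},
    {"run_id": "run-verify-suite-1771760806--task-002", "composite_score": 0.9875, "rank": 2},
]

summary = summarize(items)
print(summary["entries"], fmt(summary["top"]), fmt(summary["second"]), fmt(summary["delta"]))
```

Both composite scores are 0.9875 before rounding, so the delta here is exactly zero rather than a rounding artifact.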
Entry 1
Skill version: skillver-sample-v0.1.0
Run: run-store-test-002
Composite: 0.988
Score profile (chart "Entry A"):
Outcome: 1.000
Trajectory: 0.950
Efficiency: 1.000
Reliability: 1.000
Safety: 1.000
Entry 2
Skill version: skillver-local
Run: run-verify-suite-1771760806--task-002
Composite: 0.988
Score profile (chart "Entry B"):
Outcome: 1.000
Trajectory: 0.950
Efficiency: 1.000
Reliability: 1.000
Safety: 1.000
Composite Score Distribution
Rank-ordered scores across leaderboard entries
Point 1: 0.988
Point 2: 0.988
Composite by rank
Raw Comparison Source
{
  "benchmarkId": "sample",
  "benchmark_version_id": "benchmark-v1",
  "dedupe_mode": "best-per-skill-version",
  "item_count": 2,
  "items": [
    {
      "run_id": "run-store-test-002",
      "skill_version_id": "skillver-sample-v0.1.0",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/generated-run-store2/run-result.json",
      "rank": 1
    },
    {
      "run_id": "run-verify-suite-1771760806--task-002",
      "skill_version_id": "skillver-local",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/run-verify-suite-1771760806/tasks/sample-task/run-result.json",
      "rank": 2
    }
  ]
}
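The payload's `dedupe_mode` of `best-per-skill-version` and the per-item `rank` suggest that the leaderboard keeps one run per skill version and orders the survivors by composite score. The source does not show that logic, so the following is only a sketch under those assumptions. The function name `best_per_skill_version`, the tie-break (first-seen run wins), and the extra low-scoring run are all hypothetical.

```python
def best_per_skill_version(runs: list[dict]) -> list[dict]:
    """Keep the highest-scoring run per skill version, then assign ranks.

    Hypothetical sketch of a "best-per-skill-version" dedupe mode.
    """
    best: dict[str, dict] = {}
    for run in runs:
        key = run["skill_version_id"]
        # Strict ">" keeps the earliest run on ties (an assumed tie-break).
        if key not in best or run["composite_score"] > best[key]["composite_score"]:
            best[key] = run
    # sorted() is stable, so equal scores keep their first-seen order.
    ordered = sorted(best.values(), key=lambda run: -run["composite_score"])
    return [{**run, "rank": position} for position, run in enumerate(ordered, start=1)]


runs = [
    {"run_id": "run-store-test-002",
     "skill_version_id": "skillver-sample-v0.1.0",
     "composite_score": 0.9875},
    {"run_id": "run-verify-suite-1771760806--task-002",
     "skill_version_id": "skillver-local",
     "composite_score": 0.9875},
    # Hypothetical extra run, NOT in the source data, added to show deduplication.
    {"run_id": "hypothetical-run",
     "skill_version_id": "skillver-local",
     "composite_score": 0.5},
]

leaderboard = best_per_skill_version(runs)
for entry in leaderboard:
    print(entry["rank"], entry["skill_version_id"], entry["run_id"])
```

With these inputs the hypothetical lower-scoring run is dropped and the two remaining entries receive ranks 1 and 2, in the same order as the payload above. How the real system breaks the 0.9875 tie is not stated in the source.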