Leaderboard

Sample Benchmark v1 Rankings

Composite ranking with trajectory-first scoring, efficiency, and safety penalties visible per run.

Entries: 2 (ranked runs)
Top Score: 0.988 (best composite)
Average: 0.988 (mean composite)
Top Run Score Shape
Radar view of the current #1 run across scoring components (safety shown as inverse penalty).
Outcome: 1.000, Trajectory: 0.950, Efficiency: 1.000, Reliability: 1.000, Safety: 1.000
Composite Score Curve
Rank-ordered composite scores
Top: 0.988
Avg: 0.988
Spread: 0.000
Point 1: 0.988, Point 2: 0.988
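The Top, Avg, and Spread figures for the score curve can be reproduced from the composite scores of the ranked runs. The sketch below is a minimal illustration, not the dashboard's actual code: the function name `summarize_composites` is hypothetical, and it assumes Spread means the highest composite minus the lowest, which is consistent with the 0.000 shown but is not confirmed by the page.

```python
from typing import Dict, List


def summarize_composites(scores: List[float]) -> Dict[str, float]:
    """Return top, average, and spread for a list of composite scores.

    Assumption: spread is max minus min; the page does not define it.
    """
    if not scores:
        raise ValueError("at least one score is required")
    top = max(scores)
    return {
        "top": top,
        "avg": sum(scores) / len(scores),
        "spread": top - min(scores),
    }


# Composite scores of the two ranked runs on this leaderboard.
summary = summarize_composites([0.9875, 0.9875])
```

The page displays these values to three decimal places, so the raw composite of 0.9875 appears as 0.988.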
Raw Leaderboard JSON
{
  "benchmarkId": "sample",
  "benchmark_version_id": "benchmark-v1",
  "dedupe_mode": "best-per-skill-version",
  "item_count": 2,
  "items": [
    {
      "run_id": "run-store-test-002",
      "skill_version_id": "skillver-sample-v0.1.0",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/generated-run-store2/run-result.json",
      "rank": 1
    },
    {
      "run_id": "run-verify-suite-1771760806--task-002",
      "skill_version_id": "skillver-local",
      "benchmark_version_id": "benchmark-v1",
      "composite_score": 0.9875,
      "outcome_score": 1,
      "trajectory_score": 0.95,
      "efficiency_score": 1,
      "safety_penalty": 0,
      "artifact_path": "benchmarks/v1/artifacts/run-verify-suite-1771760806/tasks/sample-task/run-result.json",
      "rank": 2
    }
  ]
}
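The `dedupe_mode` value "best-per-skill-version" and the `rank` field suggest that the leaderboard keeps one run per `skill_version_id` and then orders the remaining runs by `composite_score`. The following is a minimal sketch of that reading, not the service's implementation: the function name `build_leaderboard` is hypothetical, and both the keep-the-highest-composite rule and the first-seen tie-break are assumptions.

```python
from typing import Any, Dict, List


def build_leaderboard(runs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the best run per skill version, then rank by composite score."""
    best: Dict[str, Dict[str, Any]] = {}
    for run in runs:
        key = run["skill_version_id"]
        # Strict ">" keeps the earlier run when composites tie (assumed rule).
        if key not in best or run["composite_score"] > best[key]["composite_score"]:
            best[key] = run
    # sorted() is stable, so runs with equal scores keep their first-seen order.
    ordered = sorted(best.values(), key=lambda r: r["composite_score"], reverse=True)
    # Return copies with a 1-based rank instead of mutating the input.
    return [{**run, "rank": i} for i, run in enumerate(ordered, start=1)]


# The two runs from the payload, reduced to the fields the sketch needs.
runs = [
    {
        "run_id": "run-store-test-002",
        "skill_version_id": "skillver-sample-v0.1.0",
        "composite_score": 0.9875,
    },
    {
        "run_id": "run-verify-suite-1771760806--task-002",
        "skill_version_id": "skillver-local",
        "composite_score": 0.9875,
    },
]
leaderboard = build_leaderboard(runs)
```

Under these assumptions the two tied runs keep their input order, which matches the ranks 1 and 2 in the payload. How the real service breaks ties is not stated on the page.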