Research Benchmark Interface
Coachbench
3,863 items · 25 sports · 11 models · 18 settings

Overview

Automatically parses the JSONL files and summarizes overall accuracy, subgroup performance, and high-miss examples.
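
A minimal sketch of that summary step in Python, under stated assumptions: the record fields `item_id`, `model`, `sport`, and `correct` are hypothetical (the actual JSONL schema is not shown here), and "high-miss" is taken to mean items whose miss rate across models reaches a chosen threshold.

```python
import json
from collections import defaultdict


def load_records(lines):
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def mean(values):
    return sum(values) / len(values)


def summarize(records, miss_threshold=0.75):
    """Return (accuracy per model, accuracy per sport, high-miss item ids).

    Field names are assumptions made for this sketch, not the real schema.
    """
    per_model = defaultdict(list)
    per_sport = defaultdict(list)
    per_item = defaultdict(list)
    for record in records:
        hit = 1 if record["correct"] else 0
        per_model[record["model"]].append(hit)
        per_sport[record["sport"]].append(hit)
        per_item[record["item_id"]].append(hit)

    overall = {model: mean(hits) for model, hits in per_model.items()}
    by_sport = {sport: mean(hits) for sport, hits in per_sport.items()}
    # An item is "high-miss" when its miss rate across models is at or above the threshold.
    high_miss = sorted(
        item for item, hits in per_item.items() if 1 - mean(hits) >= miss_threshold
    )
    return overall, by_sport, high_miss


# Illustrative data only; not taken from the benchmark.
sample = [
    '{"item_id": "q1", "model": "A", "sport": "tennis", "correct": true}',
    '{"item_id": "q1", "model": "B", "sport": "tennis", "correct": false}',
    '{"item_id": "q2", "model": "A", "sport": "golf", "correct": false}',
    '{"item_id": "q2", "model": "B", "sport": "golf", "correct": false}',
]
overall, by_sport, high_miss = summarize(load_records(sample))
```

With the illustrative sample, only `q2` is flagged, because every model misses it.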

Accuracy Curve

A horizontal comparison of overall model accuracy with a localized axis to make small gaps easier to read.
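
One plausible reading of "localized axis" is an axis whose range is narrowed to the observed scores instead of spanning 0 to 1. The sketch below computes such bounds in Python; the function name `localized_axis` and the padding rule are assumptions for illustration, not the interface's actual logic.

```python
def localized_axis(scores, pad_fraction=0.1, floor=0.0, ceiling=1.0):
    """Return (low, high) axis bounds that hug the observed scores.

    The range is padded by a fraction of the score spread and clamped
    to [floor, ceiling] so it never leaves the valid accuracy range.
    """
    lowest, highest = min(scores), max(scores)
    spread = highest - lowest
    # Fall back to a small fixed pad when all scores are identical.
    pad = spread * pad_fraction if spread > 0 else 0.01
    return max(floor, lowest - pad), min(ceiling, highest + pad)


# Illustrative scores only: the bounds come out near 0.79 and 0.91.
low, high = localized_axis([0.80, 0.84, 0.90])
```

On a full 0 to 1 axis these three scores would occupy a tenth of the chart; on the narrowed axis they fill most of it.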

3D Radar View

Shows a multidimensional comparison of leading models across overall accuracy and mid-level class aggregates.

Breakdown Heatmap

Switch between class and sport to locate where each model gains or loses accuracy.
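
The class/sport switch can be sketched as a single grouping function that takes the grouping key as a parameter. This is an illustrative Python sketch: the record fields `model`, `class`, `sport`, and `correct` and the sample values are hypothetical.

```python
from collections import defaultdict


def accuracy_matrix(records, group_key):
    """Build a model-by-group accuracy grid for a heatmap.

    group_key selects the breakdown, e.g. "class" or "sport".
    Cells with no items are None so they can be drawn as blanks.
    """
    hits = defaultdict(list)
    for record in records:
        hits[(record["model"], record[group_key])].append(1 if record["correct"] else 0)

    cells = {key: sum(values) / len(values) for key, values in hits.items()}
    models = sorted({model for model, _ in cells})
    groups = sorted({group for _, group in cells})
    matrix = [[cells.get((model, group)) for group in groups] for model in models]
    return models, groups, matrix


# Illustrative records only; not taken from the benchmark.
records = [
    {"model": "A", "class": "rules", "sport": "tennis", "correct": True},
    {"model": "A", "class": "tactics", "sport": "tennis", "correct": False},
    {"model": "B", "class": "rules", "sport": "golf", "correct": True},
]
by_class = accuracy_matrix(records, "class")
by_sport = accuracy_matrix(records, "sport")
```

Switching the view then amounts to calling the same function with a different key, and the low cells in each row show where a model loses accuracy.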

Leaderboard

Ranks models by overall accuracy and distinguishes direct, thinking, and search-augmented variants.
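
A minimal Python sketch of such a ranking, assuming each entry already carries a variant label. The `Entry` type, the label strings, and the tie-breaking rule (model name, then variant) are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed labels for the three variants named above.
VARIANTS = ("direct", "thinking", "search")


@dataclass(frozen=True)
class Entry:
    model: str
    variant: str
    accuracy: float


def leaderboard(entries):
    """Rank entries by accuracy, highest first, keeping the variant visible."""
    for entry in entries:
        if entry.variant not in VARIANTS:
            raise ValueError(f"unknown variant: {entry.variant}")
    # Ties are broken alphabetically by model, then variant, for a stable order.
    ordered = sorted(entries, key=lambda e: (-e.accuracy, e.model, e.variant))
    return [
        (rank, e.model, e.variant, e.accuracy)
        for rank, e in enumerate(ordered, start=1)
    ]


# Illustrative scores only; not benchmark results.
rows = leaderboard([
    Entry("m1", "direct", 0.61),
    Entry("m1", "thinking", 0.70),
    Entry("m2", "search", 0.66),
])
```

Keeping the variant as its own field lets the same base model appear more than once in the ranking without being merged.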
