
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

TL;DR

The paper introduces Basket, a large-scale, long-form basketball video dataset for fine-grained skill estimation, comprising 4,477 hours of video and 32,232 players across 21 leagues and six seasons. Given an 8–10 minute highlight video of a single player, the task is to predict the level of each of 20 fine-grained skills on a five-point scale, which demands long-range temporal understanding and implicit identification of the target player. Comprehensive experiments show that current state-of-the-art video models perform poorly (the best reaches only about 28.5% accuracy) compared with human experts (up to 72%), and that they generalize poorly across seasons and leagues. The work also provides extensive dataset details, ablations, and human studies, and argues that Basket enables the development of truly long-range, fine-grained skill models, with potential applications in fair scouting and personalized player development tools.

Abstract

We present BASKET, a large-scale basketball video dataset for fine-grained skill estimation. BASKET contains 4,477 hours of video capturing 32,232 basketball players from all over the world. Compared to prior skill estimation datasets, our dataset includes a massive number of skilled participants with unprecedented diversity in terms of gender, age, skill level, geographical location, etc. BASKET includes 20 fine-grained basketball skills, challenging modern video recognition models to capture the intricate nuances of player skill through in-depth video analysis. Given a long highlight video (8-10 minutes) of a particular player, the model needs to predict the skill level (i.e., excellent, good, average, fair, or poor) for each of the 20 basketball skills. Our empirical analysis reveals that current state-of-the-art video models struggle with this task, lagging significantly behind the human baseline. We believe that BASKET could be a useful resource for developing new video models with advanced long-range, fine-grained recognition capabilities. In addition, we hope that our dataset will be useful for domain-specific applications such as fair basketball scouting, personalized player development, and many others. Dataset and code are available at https://github.com/yulupan00/BASKET.
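
The task above amounts to 20 parallel five-way classification problems per player. The sketch below shows one way to score such predictions. It assumes that levels are encoded as integers from 0 (Poor) to 4 (Excellent) and that accuracy is the fraction of correctly predicted (player, skill) pairs; the paper's exact evaluation protocol is not spelled out here. `mean_skill_accuracy` is a hypothetical helper, not part of the BASKET release.

```python
from typing import List, Sequence

# Five-point scale from the paper; the integer encoding used below
# (0 = Poor ... 4 = Excellent) is an assumption made for this sketch.
LEVELS: List[str] = ["Poor", "Fair", "Average", "Good", "Excellent"]
NUM_SKILLS = 20  # BASKET rates 20 fine-grained skills per player


def mean_skill_accuracy(preds: Sequence[Sequence[int]],
                        labels: Sequence[Sequence[int]]) -> float:
    """Hypothetical metric: fraction of (player, skill) pairs predicted exactly.

    `preds` and `labels` hold one vector per player, each containing one level
    index in [0, len(LEVELS)) for every one of the NUM_SKILLS skills.
    """
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must cover the same, non-empty set of players")
    correct = total = 0
    for p_vec, y_vec in zip(preds, labels):
        if len(p_vec) != NUM_SKILLS or len(y_vec) != NUM_SKILLS:
            raise ValueError(f"each player needs exactly {NUM_SKILLS} skill levels")
        for p, y in zip(p_vec, y_vec):
            correct += int(p == y)  # exact match on the five-point scale
            total += 1
    return correct / total


if __name__ == "__main__":
    # Two toy players: every skill right for the first, every skill wrong for the second.
    preds = [[2] * NUM_SKILLS, [0] * NUM_SKILLS]
    labels = [[2] * NUM_SKILLS, [4] * NUM_SKILLS]
    print(mean_skill_accuracy(preds, labels))  # 20 of 40 pairs correct -> 0.5
```

When every player is rated on all 20 skills, as here, this pooled accuracy equals the average of the 20 per-skill accuracies.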

Paper Structure

This paper contains 18 sections, 7 figures, 12 tables.

Figures (7)

  • Figure 1: An illustration of our fine-grained skill estimation task. Given a long highlight video (8-10 minutes in length) that captures many plays of a particular player, the model needs to predict the skill level for 20 fine-grained basketball skills (e.g., three-point shooting, rebounding, passing, etc.). Each skill is rated on a 5-level scale, from "Poor" to "Excellent."
  • Figure 2: Basket is a large-scale video dataset containing 4,477 hours of video and capturing 32,232 basketball players from 21 basketball leagues worldwide. Here, we showcase the geographic diversity of our dataset: it captures basketball players from 4 continents and more than 30 countries. Each pin marks the approximate geographic location of the visualized basketball game, and the pin's color matches the border color of that game's frame.
  • Figure 3: Our Basket dataset covers five coarse basketball skill categories and twenty fine-grained skills, focusing on the evaluation of multi-faceted skill understanding of basketball players.
  • Figure 4: Visualizing some of the players from the Basket dataset. Our dataset offers unprecedented player diversity in terms of nationality, age, gender, race, experience, and skill. The left side of each profile card displays the season, player nationality, league, and club. The skill level shown for each coarse category is obtained by averaging the fine-grained skill levels within that category (as described in the Basket dataset section). SHO: Shooting, DEF: Defense, OFF: Offense, PLA: Playmaking, REB: Rebounding
  • Figure 5: We visualize our human study results on Basket, with subjects grouped by expertise level (i.e., novice, average, expert). For each group, we show the mean accuracy and the min/max range. The blue dashed line indicates the performance of our best model, VideoMamba (Li et al., 2024). To keep the study to a reasonable length, every subject is asked to watch videos of 5 uniformly selected players and classify 5 selected skills into 3 skill levels (i.e., "Poor," "Average," and "Excellent"). Our VideoMamba baseline, which was not trained on these players, is also tested in this exact setting. Our results highlight the gap between model and human performance, especially for human subjects with high expertise.
  • ...and 2 more figures
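
The coarse-category scores on the Figure 4 profile cards come from averaging fine-grained skill levels. A minimal sketch of that averaging follows, assuming levels are encoded as 1 (Poor) to 5 (Excellent). The grouping in `COARSE_GROUPS` and the fine-grained skill names are hypothetical placeholders (only three-point shooting, rebounding, and passing are named in the captions above), and `coarse_scores` is an illustrative helper, not code from the BASKET repository.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical grouping of fine-grained skills into coarse categories.
# The real taxonomy has 5 categories and 20 skills; only a few are shown here.
COARSE_GROUPS: Dict[str, List[str]] = {
    "SHO": ["three_point_shooting", "mid_range_shooting"],
    "REB": ["offensive_rebounding", "defensive_rebounding"],
    "PLA": ["passing"],
}


def coarse_scores(fine: Dict[str, int],
                  groups: Dict[str, List[str]] = COARSE_GROUPS) -> Dict[str, float]:
    """Average the fine-grained levels (1 = Poor ... 5 = Excellent) within each coarse category."""
    return {cat: mean(fine[s] for s in skills) for cat, skills in groups.items()}


if __name__ == "__main__":
    player = {
        "three_point_shooting": 5,
        "mid_range_shooting": 4,
        "offensive_rebounding": 2,
        "defensive_rebounding": 3,
        "passing": 4,
    }
    print(coarse_scores(player))  # SHO 4.5, REB 2.5, PLA 4
```

A plain unweighted mean treats every fine-grained skill in a category as equally important; the captions do not indicate whether any weighting is applied.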