
Tracking Skiers from the Top to the Bottom

Matteo Dunnhofer, Luca Sordi, Niki Martinel, Christian Micheloni

TL;DR

This paper tackles the challenge of tracking a skier throughout an entire performance in monocular, multi-camera broadcast videos, addressing the lack of skiing-focused benchmarks. It introduces SkiTB, a large, densely annotated dataset for per-frame skier localization and multi-camera analysis, along with a skier-optimized baseline tracker (STARK_SKI) and a fine-tuned STARK (STARK_FT). Extensive experiments comparing generic and skier-specific trackers show that domain-specific trackers yield substantial gains in localization accuracy and improve downstream 2D pose estimation on SkiPosePTZ, across diverse conditions and evaluation splits (new performances, unseen athletes, unseen courses). The results highlight the practical potential of vision-based skiing analytics for performance understanding and broadcasting, while identifying remaining challenges such as cross-camera continuity, occlusions, small target appearance, and computational efficiency. Overall, SkiTB enables robust evaluation of tracking methods in skiing and guides future work toward better generalization and real-time applicability in high-level analysis pipelines.

Abstract

Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind that in other sports due to limited studies and datasets. This paper takes a step toward filling these gaps. A thorough investigation is performed on the task of tracking a skier in a video capturing his/her complete performance. Obtaining continuous and accurate skier localization is a prerequisite for higher-level performance analyses. To enable the study, SkiTB, the largest and most extensively annotated dataset for computer vision in skiing, is introduced. Several visual object tracking algorithms, including both established methodologies and a newly introduced skier-optimized baseline algorithm, are evaluated on the dataset. The results provide valuable insights into the applicability of different tracking methods for vision-based skiing analysis. SkiTB, code, and results are available at https://machinelearning.uniud.it/datasets/skitb.
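To make "accurate skier localization" concrete, the sketch below shows one common way tracking benchmarks score per-frame bounding boxes: the intersection-over-union (IoU) between a predicted and a ground-truth box, aggregated into an OTB-style success score (area under the success curve). This is an illustrative assumption, not necessarily SkiTB's exact evaluation protocol. The (x, y, width, height) box format, the function names, and the 21 evenly spaced thresholds are hypothetical choices, and occluded or absent frames are not treated specially.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x, y, width, height) in pixels, top-left origin.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]  # bottom-right corner of a
    bx2, by2 = b[0] + b[2], b[1] + b[3]  # bottom-right corner of b
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(preds: Sequence[Box], gts: Sequence[Box], steps: int = 21) -> float:
    """OTB-style success score (assumed protocol): mean, over IoU thresholds
    evenly spaced in [0, 1], of the fraction of frames whose IoU exceeds it."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth frame")
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

A tracker that never overlaps the skier scores 0, while one that matches every frame exactly approaches 1; with the strict "exceeds" rule used here, the threshold at 1.0 is never met, so a perfect tracker scores 20/21 with the default 21 thresholds.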

Paper Structure

This paper contains 42 sections, 9 figures, 9 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Tracking a skier from the top to the bottom of the course. This paper focuses on applying visual object tracking algorithms to localize a skier in every frame (e.g., with the blue bounding boxes) of a video capturing his/her complete performance. Due to the large spatial extent of skiing courses, multiple cameras (typically pan-tilt-zoom) are placed sequentially along the slope to capture the whole performance, so multi-camera tracking is required for high-level performance analysis.
  • Figure 2: Frame and bounding-box samples from SkiTB. We showcase examples of video frames from our dataset for the different disciplines: alpine skiing (AL), ski jumping (JP), and freestyle skiing (FS). Each frame is accompanied by a manually annotated bounding box. A blue rectangle localizes the skier when visible, while a black rectangle localizes the skier when occluded. The camera that captured the frame and the elapsed time in seconds from the beginning of the performance are also reported.
  • Figure 3: Qualitative tracking performance. This figure shows bounding-box samples predicted by the top four trackers for frames of SkiTB's test set. STARK_FT and STARK_SKI exhibit high precision in localizing both the skier's body and equipment. Videos of the results are available at https://www.youtube.com/watch?v=Aos5iKrYM5o.
  • Figure 4: Fraction of consistent skier tracking starting from the top. This plot depicts the average fraction of consecutive frames in which the target skier is accurately localized before the track is lost, measured with the GSR score [dunnhofer2023visual]. Different time thresholds (in seconds) are used to assess the trackers' ability to recover from failures over time [VOT2020].
  • Figure 5: Waiting time to obtain skier localizations. The plot shows, at various fractions of a multi-camera (MC) sequence, the average time one must wait to obtain the bounding boxes from each tracker. YOLO-SORT is the most efficient, with minimal delay relative to the live skiing performance.
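The two timing-based analyses described for Figures 4 and 5 can be approximated with the simplified sketch below. It does not reproduce the paper's GSR definition or its latency measurement: the failure rule (IoU below a hypothetical `iou_thr` for longer than `max_gap_s` seconds), the sequential single-frame processing model, and all names are assumptions made for illustration. `fraction_before_loss` returns the fraction of a sequence tracked from the first frame before an unrecovered failure, and `waiting_times` returns, for each frame, the delay between when the frame is captured and when a tracker running at a fixed speed would output its box.

```python
from typing import List, Sequence


def fraction_before_loss(
    overlaps: Sequence[float], fps: float, max_gap_s: float, iou_thr: float = 0.5
) -> float:
    """Fraction of frames tracked from the first frame until the first failure
    lasting longer than max_gap_s seconds (simplified, hypothetical rule).

    A frame counts as correct if its IoU with the ground truth is >= iou_thr;
    shorter failures are tolerated, modelling recovery over time.
    """
    if not overlaps:
        raise ValueError("overlaps must be non-empty")
    max_gap = int(round(max_gap_s * fps))  # tolerated failure length in frames
    failed_run = 0  # length of the current run of incorrect frames
    last_good = -1  # index of the last correctly localized frame
    for k, o in enumerate(overlaps):
        if o >= iou_thr:
            failed_run = 0
            last_good = k
        else:
            failed_run += 1
            if failed_run > max_gap:  # failure too long: track considered lost
                break
    return (last_good + 1) / len(overlaps)


def waiting_times(n_frames: int, video_fps: float, tracker_fps: float) -> List[float]:
    """Per-frame delay (seconds) between frame capture and box output for a
    tracker that processes frames one at a time at a constant tracker_fps."""
    proc = 1.0 / tracker_fps  # assumed constant processing time per frame
    done = 0.0  # time at which the tracker finished the previous frame
    delays = []
    for k in range(n_frames):
        arrival = k / video_fps  # frame k is captured at this time
        # Frame k starts once it is captured and the previous frame is done.
        done = max(done, arrival) + proc
        delays.append(done - arrival)
    return delays
```

The waiting time at a fraction p of a sequence can then be read as `delays[int(p * (len(delays) - 1))]`. Under this model, a tracker faster than the video frame rate keeps a constant delay of one processing step, whereas a slower one accumulates a backlog, so its delay grows along the sequence.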