Enhanced Transformer-Based Tracking for Skiing Events: Overcoming Multi-Camera Challenges, Scale Variations and Rapid Motion -- SkiTB Visual Tracking Challenge 2025

Akhil Penta, Vaibhav Adwani, Ankush Chopra

TL;DR

This paper addresses robust skier tracking in multi-camera skiing scenarios by adapting the STARK transformer-based tracker to SkiTB, a dense single object tracking (SOT) benchmark covering diverse disciplines. The authors introduce two key modifications, a dynamic search factor and Incremental Template Update (ITU), to handle camera transitions, rapid motion, and occlusions, along with a hierarchical score head and refined inference settings. Fine-tuning on SkiTB substantially improves performance over a GOT-10k pretrained baseline, and the proposed adaptive approach achieves the best overall F1, precision, and recall across disciplines, though Freestyle sequences remain challenging due to identity switches. The work demonstrates the value of domain-specific adaptation for transformer-based trackers in sports analytics and points to future directions in computational efficiency and alternative tracking paradigms for real-time deployment.

Abstract

Accurate skier tracking is essential for performance analysis, injury prevention, and optimizing training strategies in alpine sports. Traditional tracking methods often struggle with occlusions, dynamic movements, and varying environmental conditions, limiting their effectiveness. In this work, we used STARK (Spatio-Temporal Transformer Network for Visual Tracking), a transformer-based model, to track skiers. We adapted STARK to domain-specific challenges such as camera movement, camera changes, and occlusions by optimizing the model's architecture and hyperparameters to better suit the dataset.

Paper Structure

This paper contains 16 sections, 1 equation, 6 tables.