Towards Active Learning for Action Spotting in Association Football Videos
Silvio Giancola, Anthony Cioppa, Julia Georgieva, Johsan Billingham, Andreas Serner, Kerry Peek, Bernard Ghanem, Marc Van Droogenbroeck
TL;DR
The paper presents an active learning framework for action spotting in football videos that reduces annotation cost and accelerates training. Using uncertainty sampling (Uncertainty Measure and Entropy Measure), the method selects the most informative clips for annotation and iteratively trains action spotting models (e.g., NetVLAD++ and PTS) on the progressively enriched data, achieving data-efficient performance on SoccerNet-v2 and two additional datasets. Key contributions include formalizing the first active learning workflow for action spotting, comparing sampling strategies, and introducing accelerations (adaptive scheduling, faster training, continual fine-tuning) that maintain performance. In practice, the approach reduces annotation labor and speeds up the deployment of robust action spotting systems in sports analytics.
Abstract
Association football is a complex and dynamic sport, with numerous actions occurring simultaneously in each game. Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns. Despite recent advances in computer vision, current algorithms still struggle to learn from limited annotated data, which lowers their performance in detecting these patterns. In this paper, we propose an active learning framework that selects the most informative video samples to be annotated next, thus drastically reducing the annotation effort and accelerating the training of action spotting models so that they reach the highest accuracy at a faster pace. Our approach leverages the notion of uncertainty sampling to select the most challenging video clips to train on next, hastening the learning process of the algorithm. We demonstrate that our proposed active learning framework effectively reduces the training data required for accurate action spotting in football videos. With NetVLAD++ on SoccerNet-v2, we achieve similar action spotting performance using only one-third of the dataset, indicating significant potential for reducing annotation time and improving data efficiency. We further validate our approach on two new datasets that focus on temporally localizing headers and passes, demonstrating its effectiveness across different action semantics in football. We believe our active learning framework will support further applications of action spotting algorithms and accelerate annotation campaigns in the sports domain.
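The selection step described above can be illustrated with a minimal sketch of entropy-based uncertainty sampling. It is not the paper's implementation: the abstract does not give the exact form of the uncertainty scores, so the sketch assumes the model outputs a class-probability vector per unlabeled clip and ranks clips by Shannon entropy. The names `entropy`, `select_most_uncertain`, `budget`, and the clip identifiers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the ids of the `budget` unlabeled clips with the highest entropy."""
    ranked = sorted(
        predictions, key=lambda clip_id: entropy(predictions[clip_id]), reverse=True
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled clips over three action classes
# (assumption: one probability vector per clip; values are made up).
unlabeled_predictions = {
    "clip_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "clip_b": [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    "clip_c": [0.70, 0.20, 0.10],
    "clip_d": [0.50, 0.45, 0.05],
}

# Clips to send to annotators in the next active learning round.
to_annotate = select_most_uncertain(unlabeled_predictions, budget=2)
print(to_annotate)  # ['clip_b', 'clip_d']
```

In a full active learning loop, the selected clips would be annotated, moved to the labeled pool, and the model retrained or fine-tuned before the next selection round.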
