Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
Jeremie Ochin, Guillaume Devineau, Bogdan Stanciulescu, Sotiris Manitsaris
TL;DR
This paper tackles spatio-temporal action detection in soccer by integrating explicit game-state information with visual cues. It introduces a Graph Neural Network (GNN) to encode the local game state and fuses its embeddings with features from a state-of-the-art 3D CNN (X3D) within a TAAD-based framework to improve event detection. Experiments on the Footovision ball-events dataset show that TAAD+GNN improves detection performance, particularly in high-recall regimes, and reduces false positives by leveraging inter-player relationships and team dynamics. The work highlights the practical value of game-state integration for soccer analytics and points to future directions, namely longer-term prediction and richer game-state features, to further improve performance under challenging conditions such as occlusion.
Abstract
Soccer analytics rely on two data sources: player positions on the pitch and the sequences of events players perform. With around 2000 ball events per game, annotating them precisely and exhaustively from a monocular video stream remains a tedious and costly manual task. While state-of-the-art spatio-temporal action detection methods show promise for automating this task, they lack contextual understanding of the game. Assuming professional players' behaviors are interdependent, we hypothesize that incorporating information about surrounding players, such as their positions, velocities, and team membership, can enhance purely visual predictions. We propose a spatio-temporal action detection approach that combines visual and game-state information via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs, and show that integrating game state improves detection metrics.
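The abstract does not specify the graph construction, the GNN layers, or the fusion operator, so the following is only a minimal sketch of the general idea of combining per-player game state with a visual feature. It assumes a radius-based player graph, actor-relative node features (position offset, velocity, same-team flag), a single mean-aggregation message-passing layer, and fusion by simple concatenation. The names `Player`, `build_graph`, `gnn_layer`, and `fuse`, the 15 m radius, the feature layout, and the randomly initialised weights are all hypothetical; the `visual` vector is a placeholder standing in for an X3D actor feature, and in the described method the GNN is trained end-to-end rather than left with random weights.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Player:
    x: float   # pitch coordinates in metres (assumed units)
    y: float
    vx: float  # velocity in m/s (assumed units)
    vy: float
    team: int  # team membership: 0 or 1


def build_graph(players, radius=15.0):
    """Connect every pair of players closer than `radius` (hypothetical rule)."""
    n = len(players)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(players[i].x - players[j].x, players[i].y - players[j].y)
            if d < radius:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


def node_features(players, actor):
    """Actor-relative features: position offset, velocity, same-team flag."""
    a = players[actor]
    return [
        [p.x - a.x, p.y - a.y, p.vx, p.vy, 1.0 if p.team == a.team else 0.0]
        for p in players
    ]


def matvec(w, v):
    """Multiply matrix `w` (list of rows) by vector `v`."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gnn_layer(h, neighbours, w_self, w_nei):
    """One mean-aggregation message-passing step followed by ReLU."""
    out = []
    for i, hi in enumerate(h):
        nbrs = neighbours[i]
        if nbrs:
            agg = [sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(len(hi))]
        else:
            agg = [0.0] * len(hi)  # isolated player: no incoming messages
        s = matvec(w_self, hi)
        m = matvec(w_nei, agg)
        out.append([max(0.0, a + b) for a, b in zip(s, m)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse(visual_embedding, players, actor, hidden=8, seed=0):
    """Concatenate a visual embedding with the actor's graph embedding."""
    rng = random.Random(seed)
    h = node_features(players, actor)
    neighbours = build_graph(players)
    w_self = random_matrix(hidden, len(h[0]), rng)  # untrained placeholder weights
    w_nei = random_matrix(hidden, len(h[0]), rng)
    h = gnn_layer(h, neighbours, w_self, w_nei)
    return list(visual_embedding) + h[actor]


players = [
    Player(50.0, 30.0, 2.0, 0.5, team=0),   # actor, e.g. the ball carrier
    Player(55.0, 32.0, -1.0, 0.0, team=1),  # nearby opponent
    Player(45.0, 28.0, 1.5, 0.2, team=0),   # nearby teammate
    Player(90.0, 10.0, 0.0, 0.0, team=1),   # distant player, no edges
]
visual = [0.1] * 16  # stand-in for an X3D actor feature (hypothetical size)
fused = fuse(visual, players, actor=0)
print(len(fused))  # 24
```

Concatenation is simply the most basic way to join the two modalities; the fused vector would then feed a classification head, and the actual paper may use a different fusion scheme.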
