Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks

Jeremie Ochin, Guillaume Devineau, Bogdan Stanciulescu, Sotiris Manitsaris

TL;DR

This paper tackles spatio-temporal action detection in soccer by integrating explicit game-state information with visual cues. It introduces a Graph Neural Network to encode the local game state and fuses its embeddings with a state-of-the-art 3D CNN (X3D) within a TAAD-based framework to enhance event detection. Experiments on the Footovision ball-events dataset show that TAAD+GNN improves detection performance, particularly in high-recall regimes, and reduces false positives by leveraging inter-player relationships and team dynamics. The work highlights the practical value for soccer analytics and points to future directions for longer-term prediction and richer game-state features to further boost performance in challenging conditions like occlusion.

Abstract

Soccer analytics rely on two data sources: the player positions on the pitch and the sequences of events they perform. With around 2000 ball events per game, their precise and exhaustive annotation based on a monocular video stream remains a tedious and costly manual task. While state-of-the-art spatio-temporal action detection methods show promise for automating this task, they lack contextual understanding of the game. Assuming professional players' behaviors are interdependent, we hypothesize that incorporating surrounding players' information such as positions, velocity and team membership can enhance purely visual predictions. We propose a spatio-temporal action detection approach that combines visual and game state information via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs, demonstrating improved metrics through game state integration.
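As a rough illustration of the idea in the abstract, the sketch below builds a small local game-state graph from player positions, velocities and team membership, runs one round of mean-aggregation message passing, and concatenates the result with per-player visual features. It is a minimal NumPy sketch under stated assumptions, not the authors' architecture: the function names, the 30 m neighbourhood radius, the feature sizes, the random weights and the random stand-in for 3D CNN features are all hypothetical.

```python
import numpy as np

def build_local_graph(positions, radius):
    """Adjacency matrix (no self-loops) linking players closer than `radius` metres."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def message_passing(node_feats, adj, w_self, w_neigh):
    """One round of mean-aggregation message passing followed by a ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated players have degree 0; clamp to 1 so their neighbour mean is zero.
    neigh_mean = adj @ node_feats / np.maximum(deg, 1.0)
    return np.maximum(node_feats @ w_self + neigh_mean @ w_neigh, 0.0)

rng = np.random.default_rng(0)
n_players = 6

# Hypothetical game state: pitch coordinates (m), velocities (m/s), team id.
positions = rng.uniform([0, 0], [105, 68], size=(n_players, 2))
velocities = rng.normal(0, 2, size=(n_players, 2))
team = np.array([0, 0, 0, 1, 1, 1], dtype=float)[:, None]
node_feats = np.concatenate([positions, velocities, team], axis=1)  # (6, 5)

adj = build_local_graph(positions, radius=30.0)  # radius is an assumption
w_self = rng.normal(size=(5, 8))   # untrained placeholder weights
w_neigh = rng.normal(size=(5, 8))
game_state_emb = message_passing(node_feats, adj, w_self, w_neigh)  # (6, 8)

# Random stand-in for per-player features from a 3D CNN.
visual_feats = rng.normal(size=(n_players, 16))

# Simple fusion by concatenation; a classifier head would consume `fused`.
fused = np.concatenate([visual_feats, game_state_emb], axis=1)  # (6, 24)
```

Concatenation is only one possible fusion choice; in an end-to-end setup the graph weights and the visual backbone would be trained jointly rather than sampled at random as here.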

Paper Structure

This paper contains 28 sections, 3 equations, 6 figures.

Figures (6)

  • Figure 1: Spatio-temporal Action Detection in soccer, using a Graph Neural Network to encode the Local Game State and a Track-Aware Action Detector (Singh et al., 2022) to extract relevant visual features. Our method demonstrates the complementarity of game state information and visual features and leads to improved performance.
  • Figure 2: Comparison of the characteristics of the MultiSports and Footovision datasets
  • Figure 3: Comparison of TAAD and our method, TAAD + GNN: integrating structured game state information into the detection process improves performance.
  • Figure 4: Comparison of the Precision-Recall curves for TAAD and TAAD + GNN at IoU thresholds of 0.2 and 0.5
  • Figure 5: When working in a high-recall, low-precision regime, our method significantly reduces the number of False Positive detections
  • ...and 1 more figure
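
The IoU thresholds mentioned in the Figure 4 caption decide whether a predicted detection counts as a match to a ground-truth annotation. The sketch below shows this for a single pair of per-frame bounding boxes; it is an illustrative assumption only, since the paper's evaluation protocol (for example, how IoU is aggregated over time) is not described in this section. The function name and the box coordinates are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and ground-truth player boxes, in pixels.
pred = (10.0, 10.0, 50.0, 90.0)
gt = (30.0, 10.0, 70.0, 90.0)

iou = box_iou(pred, gt)  # intersection 1600 / union 4800 = 1/3

# The same detection is a match at the loose threshold but not at the strict one.
matches = {thr: iou >= thr for thr in (0.2, 0.5)}
```

With these boxes the detection counts as a true positive at IoU 0.2 and as a false positive at IoU 0.5, which is why precision-recall curves are typically reported per threshold.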