AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements

Calvin Yeung, Kenjiro Ide, Keisuke Fujii

TL;DR

AutoSoccerPose tackles the shortage of soccer-specific pose-sequence data and the limitations of linear posture models by introducing the 3DSP dataset and a non-linear graph-recurrent approach (3DSP-GRAE) for posture analysis. The pipeline semi-automates 2D and 3D pose estimation from broadcast soccer videos, leveraging RTMPose for 2D poses and MotionAGFormer for 3D lifting, and encodes spatiotemporal posture dynamics with a Graph ConvNet + LSTM autoencoder. Empirical results across SoccerNet and 3DSP demonstrate strong tracking performance, effective shooter tracklet selection, solid 2D pose estimation in zero-shot settings, and meaningful clustering of shooting styles in 3D posture space, highlighting practical pathways for automated posture analysis in soccer and beyond.
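To make the "Graph ConvNet + LSTM" idea concrete, the following is a minimal NumPy sketch of the encoder half only: a graph convolution is applied to each frame's joints, and the per-frame features are fed through an LSTM whose final hidden state serves as the sequence embedding. It is an illustration, not the authors' 3DSP-GRAE implementation. The 17-joint chain skeleton, the layer sizes, the random (untrained) weights, and all function names are assumptions made for this example, and the decoder is omitted.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One graph-convolution layer for a single frame: ReLU(A_hat X W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w, u, b):
    """One LSTM step; gates are stacked as [input, forget, output, candidate]."""
    n = h_prev.shape[0]
    z = w @ x_t + u @ h_prev + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def init_params(num_joints, gc_dim, hidden, rng):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    in_dim = num_joints * gc_dim
    return {
        "w_gc": rng.normal(scale=0.1, size=(3, gc_dim)),
        "w": rng.normal(scale=0.1, size=(4 * hidden, in_dim)),
        "u": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
    }


def encode_sequence(poses, a_hat, params):
    """Map a pose sequence of shape (T, J, 3) to a fixed-size embedding."""
    hidden = params["u"].shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for frame in poses:
        # Spatial step: mix information between neighbouring joints.
        feat = graph_conv(frame, a_hat, params["w_gc"]).reshape(-1)
        # Temporal step: update the hidden state H and cell state C.
        h, c = lstm_step(feat, h, c, params["w"], params["u"], params["b"])
    return h


rng = np.random.default_rng(0)
num_joints = 17                                       # assumed joint count
edges = [(j, j + 1) for j in range(num_joints - 1)]   # toy chain, not a real skeleton
a_hat = normalized_adjacency(edges, num_joints)
params = init_params(num_joints, gc_dim=8, hidden=16, rng=rng)
poses = rng.normal(size=(20, num_joints, 3))          # synthetic 20-frame 3D sequence
embedding = encode_sequence(poses, a_hat, params)
print(embedding.shape)
```

In a full autoencoder, a decoder would reconstruct the pose sequence from `embedding`, and the weights would be trained to minimize the reconstruction error.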

Abstract

Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), which limits their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset, built from soccer broadcast videos, which is, to our knowledge, the most extensive sports image dataset with 2D pose annotations. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline that extends the pipeline's utility beyond the scope of annotated data. We validate AutoSoccerPose on the SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.

Paper Structure

This paper contains 19 sections, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Overview of AutoSoccerPose.
  • Figure 2: Overview of the 3DSP-GRAE model. The variables $X$, $H$, $C$, and $t$ in the LSTM layer denote the input features, hidden state, cell state, and time step, respectively.
  • Figure 3: AutoSoccerPose qualitative results. Each row shows the AutoSoccerPose 2D and 3D pose estimates for one broadcast video, and each column corresponds to a frame of that video. Within each cell, the top image is the broadcast video frame, while the bottom-left and bottom-right images are the 2D and 3D pose estimates, respectively.
  • Figure 4: K-means clustering visualization.
  • Figure 5: Shot movement closest to the cluster centroid. Rows and columns denote clusters and frames, respectively.
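
The clustering step behind Figures 4 and 5 can be sketched as follows: run K-means on the sequence embeddings, then pick, for each cluster, the member closest to its centroid as a representative shot. This is a minimal NumPy sketch under stated assumptions, not the authors' code. The 2D toy "embeddings", the choice of `k=3`, the deterministic farthest-first initialization, and all function names are hypothetical and chosen only to keep the example small and reproducible.

```python
import numpy as np


def farthest_first_init(x, k):
    """Deterministic initialization: start at x[0], then repeatedly add the
    point farthest from the centroids chosen so far (an assumption made for
    reproducibility; other initializations are equally valid)."""
    idx = [0]
    d = np.linalg.norm(x - x[0], axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(x - x[nxt], axis=1))
    return x[idx].copy()


def assign(x, centroids):
    """Label each point with the index of its nearest centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(x, k, num_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first_init(x, k)
    for _ in range(num_iters):
        labels = assign(x, centroids)
        new_centroids = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(x, centroids)


def closest_members(x, centroids, labels):
    """For each cluster, return the index of the member nearest its centroid
    (-1 if the cluster is empty)."""
    reps = []
    for c, centre in enumerate(centroids):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            reps.append(-1)
            continue
        d = np.linalg.norm(x[members] - centre, axis=1)
        reps.append(int(members[d.argmin()]))
    return reps


# Toy stand-in for learned embeddings: three well-separated 2D blobs.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
embeddings = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

centroids, labels = kmeans(embeddings, k=3)
representatives = closest_members(embeddings, centroids, labels)
print(np.bincount(labels, minlength=3), representatives)
```

With real data, `embeddings` would hold one learned vector per shot sequence, and each entry of `representatives` would index the shot to visualize for that cluster.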