Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory

Ali K. AlShami, Terrance Boult, Jugal Kalita

TL;DR

A novel method called Pose2Trajectory is proposed, which predicts a tennis player's future trajectory as a sequence derived from their body joints' data and ball position.

Abstract

Tracking the trajectory of tennis players can help camera operators in production. Predicting future movement lets cameras follow a player automatically, without human intervention. Predicting future human movement in the context of complex physical tasks is also intellectually satisfying. Rapid advances in sports analytics and the wide availability of tennis videos have inspired us to propose a novel method called Pose2Trajectory, which predicts a tennis player's future trajectory as a sequence derived from their body joints' data and ball position. Our approach uses body joint information to capture the human body's geometry and motion, thereby improving the prediction of the player's trajectory. We use an encoder-decoder Transformer architecture trained on the joints and trajectory information of the players together with ball positions. The predicted sequence of centroid coordinates can help close-up cameras keep tracking the tennis player. We generate a high-quality dataset from multiple videos, using object detection and human pose estimation methods, to support tennis player movement prediction. It contains bounding boxes and joint information for tennis players and ball positions in singles tennis games. Our method shows promising results in predicting the tennis player's movement trajectory at different prediction sequence lengths using the joints and trajectory information with the ball position.
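
To make the sequence-to-sequence framing concrete, the sketch below shows one way per-frame features could be cut into pairs of (input window, future centroid window). It is a minimal illustration under assumptions: the function name `make_windows`, the window lengths, and the convention that the tracked player's centroid occupies the first two feature slots are all hypothetical and are not taken from the paper.

```python
def make_windows(frames, input_len, pred_len, centroid_slice=slice(0, 2)):
    """Split per-frame feature lists into (inputs, targets) pairs.

    frames:  list of per-frame feature lists, ordered in time.
    inputs:  input_len consecutive feature lists (what an encoder would see).
    targets: the (x, y) centroid of the tracked player in each of the next
             pred_len frames. The centroid position inside a feature list
             is an assumption, controlled by centroid_slice.
    """
    pairs = []
    last_start = len(frames) - input_len - pred_len
    # range() is empty when the clip is shorter than one full window.
    for start in range(last_start + 1):
        split = start + input_len
        past = frames[start:split]
        future = frames[split:split + pred_len]
        targets = [tuple(f[centroid_slice]) for f in future]
        pairs.append((past, targets))
    return pairs
```

Under these assumptions, changing `pred_len` (for example 15, 30, or 60 frames) is all that is needed to build training pairs for different prediction lengths.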

Paper Structure

This paper contains 11 sections, 2 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Predicted future movement trajectory of a tennis player fifteen (250 ms), thirty (500 ms), and sixty (1 s) frames ahead (until reaching the ball). The trajectory is predicted as the second player's centroid, around which we show a $224 \times 224$ bounding box in future frames. The player appears lighter in future frames. The original image has been taken from videos on the https://www.youtube.com/@tennistv YouTube channel.
  • Figure 2: Examples of tennis player movements that Faster R-CNN could not detect. This is likely because Faster R-CNN was pre-trained on non-sports images.
  • Figure 3: Detecting the tennis player bounding boxes, joints, and ball position using Faster R-CNN, ViTPose, and TrackNet, respectively. The image has been taken from videos on the https://www.youtube.com/@tennistv YouTube channel.
  • Figure 4: Our encoder-decoder Transformer model to predict a tennis player's trajectory. The encoder encodes players' centroids, joint positions, and ball positions with time information. The decoder takes part of the encoder information along with time information and predicts future centroids of the player at several future time points. $S_i$ represents a feature set of 74 values at time $i$, including player centroids, player joint positions, and ball positions. $S^{'}_i$ represents a set of two values indicating $X$ and $Y$ positions of player centroid that we want to predict at time $i$.
  • Figure 5: The prediction results for $X$ and $Y$ coordinates with and without using the LSTM in our architecture.
  • ...and 1 more figure
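
The captions for Figures 1 and 4 imply some simple arithmetic that can be sketched in code. The Python below is illustrative only. The 60 fps frame rate is inferred from the caption pairing fifteen frames with 250 ms. The breakdown of the 74 features (two players, each with a centroid and 17 two-dimensional joints, plus one ball position) is an assumption that happens to sum to 74, not a layout confirmed by the source. All function and constant names are hypothetical.

```python
# Illustrative sketch; the names and the feature layout are assumptions,
# not the authors' code.

FPS = 60          # assumed: inferred from "fifteen (250 ms)" frames in Figure 1
NUM_JOINTS = 17   # assumed: COCO-style keypoints per player
CROP = 224        # side of the box drawn around the predicted centroid


def frames_to_ms(frames, fps=FPS):
    """Convert a prediction horizon in frames to milliseconds."""
    return frames * 1000.0 / fps


def build_feature_vector(players, ball_xy):
    """Flatten one frame into a single feature list S_i.

    players: list of (centroid_xy, joints), where joints is a list of (x, y).
    ball_xy: (x, y) position of the ball.
    """
    features = []
    for centroid_xy, joints in players:
        features.extend(centroid_xy)      # 2 values per player
        for joint_xy in joints:
            features.extend(joint_xy)     # 2 values per joint
    features.extend(ball_xy)              # 2 values for the ball
    return features


def box_around_centroid(cx, cy, size=CROP):
    """Return (x_min, y_min, x_max, y_max) of a size x size box centred on (cx, cy)."""
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

With two players and 17 joints each, `build_feature_vector` returns 2 × (2 + 34) + 2 = 74 values, matching the size of $S_i$ in Figure 4. A predicted centroid $S^{'}_i$ can then be passed to `box_around_centroid` to obtain the kind of $224 \times 224$ box shown in Figure 1.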