Multi-Camera Asynchronous Ball Localization and Trajectory Prediction with Factor Graphs and Human Poses

Qingyu Xiao, Zulfiqar Zaidi, Matthew Gombolay

TL;DR

This work tackles rapid localization and trajectory prediction of tennis balls under the Magnus effect using an asynchronous, multi-camera factor-graph framework. It fuses camera observations with physics-based factors (projection, motion, aerodynamics, and bounce) and incorporates spin priors computed from human poses via a temporal convolutional network (TCN) to improve early-state inference. The approach, implemented with GTSAM and ISAM2 for incremental inference, reduces the landing-position prediction error by over 63.6% compared to a baseline adaptive extended Kalman filter (EKF), and the TCN predicts spin priors with an RMSE of 5.27 Hz on the validation dataset. These results indicate practical potential for real-time, pose-informed ball tracking in robotic tennis and more reliable planning for ball-returning systems, while leaving room for further improvement in spin estimation and bounce modeling.
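
To make the role of the aerodynamic terms concrete, the following is a minimal sketch of a landing-point prediction under gravity, quadratic drag, and a Magnus term. It is not the paper's model: the lumped coefficients `K_DRAG` and `K_MAGNUS`, the simplified Magnus acceleration proportional to the cross product of spin and velocity, the explicit Euler integrator, and the function names are all assumptions chosen for illustration. Spin is given in Hz to match the units used in the summary above.

```python
import math

G = 9.81           # gravitational acceleration, m/s^2
K_DRAG = 0.02      # assumed lumped drag coefficient, 1/m (roughly tennis-ball scale)
K_MAGNUS = 7e-4    # assumed lumped Magnus coefficient (hypothetical value)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def acceleration(v, spin_hz):
    """Gravity + quadratic drag + simplified Magnus term in a z-up world frame."""
    speed = math.sqrt(sum(c * c for c in v))
    omega = tuple(2.0 * math.pi * s for s in spin_hz)  # Hz -> rad/s
    magnus = cross(omega, v)
    return (-K_DRAG * speed * v[0] + K_MAGNUS * magnus[0],
            -K_DRAG * speed * v[1] + K_MAGNUS * magnus[1],
            -G - K_DRAG * speed * v[2] + K_MAGNUS * magnus[2])


def predict_landing(p, v, spin_hz, dt=1e-3, max_time=10.0):
    """Integrate until the ball reaches z = 0 and return the landing (x, y).

    Assumes the ball starts above the ground (p[2] > 0).
    """
    p, v = list(p), list(v)
    t = 0.0
    while t < max_time:
        a = acceleration(v, spin_hz)
        new_v = [v[i] + a[i] * dt for i in range(3)]
        new_p = [p[i] + new_v[i] * dt for i in range(3)]
        if new_p[2] <= 0.0:
            # Linearly interpolate the ground crossing inside the last step.
            frac = p[2] / (p[2] - new_p[2])
            return [p[i] + frac * (new_p[i] - p[i]) for i in range(2)]
        p, v, t = new_p, new_v, t + dt
    raise ValueError("ball did not land within max_time")


start, velocity = (0.0, 0.0, 1.0), (20.0, 0.0, 3.0)
flat = predict_landing(start, velocity, (0.0, 0.0, 0.0))
topspin = predict_landing(start, velocity, (0.0, 30.0, 0.0))
backspin = predict_landing(start, velocity, (0.0, -30.0, 0.0))
```

With the ball travelling along +x, a positive spin about +y is topspin in this convention, so the Magnus term points downward and `topspin` lands shorter than `flat`, while `backspin` carries further. This is why an inaccurate early spin estimate translates directly into landing-position error.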

Abstract

The rapid and precise localization and prediction of a ball are critical for developing agile robots in ball sports, particularly in sports such as tennis, which are characterized by high-speed ball movements and powerful spins. The Magnus effect induced by spin adds complexity to trajectory prediction during flight and to bounce dynamics upon contact with the ground. In this study, we introduce an approach that combines a multi-camera system with factor graphs for real-time and asynchronous 3D tennis ball localization. Additionally, we estimate hidden states such as velocity and spin for trajectory prediction. Furthermore, to enhance spin inference early in the ball's flight, where limited observations are available, we integrate human pose data using a temporal convolutional network (TCN) to compute spin priors within the factor graph. This refinement provides more accurate spin priors at the beginning of the factor graph, leading to improved early-stage hidden state inference for prediction. Our results show that the trained TCN can predict the spin priors with an RMSE of 5.27 Hz. Integrating the TCN into the factor graph reduces the prediction error of landing positions by over 63.6% compared to a baseline method that uses an adaptive extended Kalman filter.
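
The idea of fusing asynchronous detections can be illustrated with a much simpler stand-in for the factor graph: a batch linear least-squares fit of initial position and velocity, where each single-camera detection at its own timestamp contributes two projection constraints and a gravity-only motion model ties the timestamps together. This is a sketch under stated assumptions, not the paper's GTSAM/ISAM2 pipeline: it ignores drag, spin, bounce, and noise models, and the camera matrices `CAM_A` and `CAM_B`, the ground-truth values, and all function names are hypothetical.

```python
G = (0.0, 0.0, -9.81)  # gravity in a z-up world frame, m/s^2


def position(p0, v0, t):
    """Gravity-only motion model: p0 + v0*t + 0.5*g*t^2."""
    return [p0[i] + v0[i] * t + 0.5 * G[i] * t * t for i in range(3)]


def project(P, X):
    """Pinhole projection of 3D point X with a 3x4 projection matrix P."""
    h = [P[r][0] * X[0] + P[r][1] * X[1] + P[r][2] * X[2] + P[r][3] for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ballistic(detections):
    """Estimate (p0, v0) from (P, t, u, v) detections taken at arbitrary times.

    Each pixel coordinate gives one constraint (pix*P[2] - P[r]) . [X(t); 1] = 0,
    which is linear in p0 and v0 once X(t) is expanded with the motion model.
    """
    rows, rhs = [], []
    for P, t, u, v in detections:
        for pix, r in ((u, 0), (v, 1)):
            a = [pix * P[2][c] - P[r][c] for c in range(4)]
            rows.append(a[:3] + [t * a[i] for i in range(3)])
            rhs.append(-(a[3] + 0.5 * t * t * sum(a[i] * G[i] for i in range(3))))
    n = 6
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(n)]
    theta = solve(AtA, Atb)
    return theta[:3], theta[3:]


# Two hypothetical cameras in normalized image coordinates:
# CAM_A sits at (0, -10, 1) looking along +y, CAM_B at (-10, 0, 1) looking along +x.
CAM_A = [[1, 0, 0, 0], [0, 0, 1, -1], [0, 1, 0, 10]]
CAM_B = [[0, 1, 0, 0], [0, 0, 1, -1], [1, 0, 0, 10]]

TRUE_P0, TRUE_V0 = [0.0, 0.0, 1.0], [3.0, 4.0, 5.0]
detections = []
for cam, times in ((CAM_A, (0.00, 0.10, 0.20)), (CAM_B, (0.05, 0.15, 0.25))):
    for t in times:
        u, v = project(cam, position(TRUE_P0, TRUE_V0, t))
        detections.append((cam, t, u, v))
detections.sort(key=lambda d: d[1])  # interleave the two cameras by timestamp

p0_est, v0_est = fit_ballistic(detections)
```

No two detections share a timestamp, so the ball is never triangulated at a single instant; the motion model alone links the observations. A factor graph generalizes this idea by adding nonlinear aerodynamic and bounce factors, per-measurement noise models, and incremental updates as new detections arrive.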

Paper Structure

This paper contains 22 sections, 16 equations, 7 figures.

Figures (7)

  • Figure 1: Leveraging time-series human pose data for ball spin estimation and fusing asynchronous camera detections through factor graphs substantially improves performance of trajectory prediction.
  • Figure 2: Example of the factor graph for ball localization at time step $t+1$. Factors are drawn as colored squares and variables as circles: $X_i$ is the pose of the $i^{th}$ camera, $L_t$ is the 3D location of the tennis ball at time step $t$, $V_t$ is the 3D velocity of the tennis ball at time step $t$, and $W_j$ is the 3D spin of the tennis ball before the $j^{th}$ bounce. If detection is available in the queue, the factor graph will first expand ...
  • Figure 3: TCN for spin prior regression with human pose sequences. The model takes time-series data of human poses when hitting the ball as input and outputs an estimated spin value for the ball.
  • Figure 4: Camera setup on an indoor tennis court.
  • Figure 5: Scatterplot of labeled vs. estimated spin priors in the validation dataset, showing a strong correlation.
  • ...and 2 more figures
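
Figure 3 describes a TCN that maps a pose sequence to a spin estimate. The core building block of a TCN is the causal dilated 1D convolution, sketched below in plain Python. The layer sizes, kernel width, dilations `(1, 2, 4)`, the 4-feature pose frames, and the random untrained weights are assumptions for illustration only; residual connections and training are omitted, and the paper's actual architecture may differ.

```python
import random


def causal_conv(x, w, b, dilation):
    """Causal dilated 1D convolution followed by ReLU.

    x: list of T frames, each a list of C_in features.
    w: weights indexed as w[k][c_out][c_in]; b: one bias per output channel.
    Output frame t only reads input frames at or before t (zero padding on the left).
    """
    T, K = len(x), len(w)
    out = []
    for t in range(T):
        frame = []
        for o in range(len(b)):
            acc = b[o]
            for k in range(K):
                src = t - (K - 1 - k) * dilation
                if src >= 0:
                    acc += sum(wi * xi for wi, xi in zip(w[k][o], x[src]))
            frame.append(max(0.0, acc))
        out.append(frame)
    return out


def make_layer(rng, kernel, c_in, c_out):
    """Random (untrained) weights and zero biases for one layer."""
    w = [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)]
          for _ in range(c_out)] for _ in range(kernel)]
    return w, [0.0] * c_out


def build_tcn(c_in, hidden=8, kernel=3, dilations=(1, 2, 4), seed=0):
    """Stack of causal dilated layers plus a linear head for a scalar output."""
    rng = random.Random(seed)
    layers, c = [], c_in
    for d in dilations:
        layers.append((*make_layer(rng, kernel, c, hidden), d))
        c = hidden
    head = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return layers, head


def tcn_features(model, poses):
    """Run the pose sequence through all convolutional layers."""
    h = poses
    for w, b, d in model[0]:
        h = causal_conv(h, w, b, d)
    return h


def predict_spin(model, poses):
    """Scalar regression output read from the last time step."""
    last = tcn_features(model, poses)[-1]
    return sum(wi * hi for wi, hi in zip(model[1], last))


def receptive_field(kernel, dilations):
    """Number of past frames (including the current one) seen by the top layer."""
    return 1 + (kernel - 1) * sum(dilations)


rng = random.Random(1)
poses = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(20)]
model = build_tcn(c_in=4)
spin_estimate = predict_spin(model, poses)
```

With kernel width 3 and dilations 1, 2, 4, the last output frame depends on the most recent 15 pose frames. Stacking dilations in this way lets a shallow network cover the whole swing leading up to ball contact.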