Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Qingyu Xiao, Zixuan Wu, Matthew Gombolay

TL;DR

This work tackles fast, spin-influenced ball trajectory prediction in dynamic settings by proposing an end-to-end framework that jointly optimizes a differentiable factor-graph estimator and a dynamics model. It introduces Gram-Schmidt-based roto-translational invariant representations and a self-multiplicative neural network to capture complex aerodynamics and bounce dynamics, enabling data-efficient learning. On a 717-trajectory ping-pong dataset with launcher-based spin labeling, the approach outperforms baselines in apex and full-trajectory RMSE, with the MNN+GS variant achieving the lowest apex RMSE. The proposed framework promises more reliable, real-time predictive capabilities for robotic partners operating in dynamic, spin-rich environments.

Abstract

Robots operating in dynamic environments need fast, accurate models of how objects move to support agile planning. In sports such as ping pong, analytical models often struggle to accurately predict the trajectories of spinning balls due to complex aerodynamics, elastic behaviors, and the challenges of modeling sliding and rolling friction. On the other hand, despite the promise of data-driven methods, machine learning struggles to make accurate, consistent predictions without precise input. In this paper, we propose an end-to-end learning framework that jointly trains a dynamics model and a factor graph estimator. Our approach leverages a Gram-Schmidt (GS) process to extract roto-translational invariant representations, which improves model performance and reduces the validation error further than data augmentation does. Additionally, we propose a network architecture that enhances nonlinearity by using self-multiplicative bypasses in the layer connections. By leveraging these novel methods, our proposed approach predicts the ball's position with an RMSE of 37.2 mm of the paddle radius at the apex after the first bounce, and 71.5 mm after the second bounce.
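To make the idea of a Gram-Schmidt-based invariant representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the two input vectors are the ball's linear velocity `v` and its spin (angular velocity) `w`, and that translation invariance is obtained simply by using velocities instead of absolute positions; both choices, and all function names, are illustrative assumptions. The sketch builds an orthonormal frame from `v` and `w` and expresses both vectors in that frame, so the resulting features do not change when the whole scene is rotated.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return [x * s for x in a]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def gram_schmidt_frame(v, w):
    """Orthonormal frame (e1, e2, e3) from two non-parallel 3D vectors."""
    e1 = normalize(v)                                # first axis along v
    e2 = normalize(sub(w, scale(e1, dot(w, e1))))    # part of w orthogonal to e1
    e3 = cross(e1, e2)                               # completes a right-handed frame
    return [e1, e2, e3]

def invariant_features(v, w):
    """Coordinates of v and w in the frame built from v and w themselves."""
    frame = gram_schmidt_frame(v, w)
    return [dot(e, v) for e in frame] + [dot(e, w) for e in frame]

def rotate_z(a, theta):
    """Rotate a 3D vector about the z-axis (used only to check invariance)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * a[0] - s * a[1], s * a[0] + c * a[1], a[2]]

# Hypothetical velocity (m/s) and spin (rad/s) values, chosen for illustration.
v = [3.0, 1.0, -0.5]
w = [0.0, 20.0, 5.0]
features = invariant_features(v, w)
features_rotated = invariant_features(rotate_z(v, 0.7), rotate_z(w, 0.7))
```

In this sketch, `features` and `features_rotated` agree up to floating-point error, which is the property a data-augmentation approach would otherwise have to learn from rotated copies of the data. A model trained on such features would predict in the local frame, and the same frame vectors could be used to map its output back to world coordinates.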

Paper Structure

This paper contains 23 sections, 8 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Predicted ball trajectories conditioned on the launcher's spin settings. Spin values are represented as integers, where larger positive numbers or smaller negative numbers indicate higher spin. In the legend, 'TS' denotes topspin and 'SS' denotes sidespin.
  • Figure 2: The end-to-end dynamics learning framework and the proposed self-multiplicative neural network (MNN) architecture for dynamics learning.
  • Figure 3: Experimental setup. (A) shows the view from one of the three calibrated cameras deployed around the table. (B) visualizes the raw 3D points computed by triangulation from paired cameras. (C) depicts the spin indications from the launcher settings.
  • Figure 4: RMSE of estimated velocity using different estimators. The EKF estimates better than the sliding-window method when more observations are available, but its estimation errors are significantly larger than those of the factor graph-based estimator (used in our learning framework), in both magnitude and standard deviation.
  • Figure 5: Comparison of the RMSE of different dynamics models. "Aug." means the model uses data augmentation instead of the Gram-Schmidt process (denoted "GS") to extract roto-invariant representations. "A-Tune" is the analytical model with parameters learned from real-world data. "Skip" refers to an MLP with skip connections; "MNN" refers to an MLP with a self-multiplying bypass in addition to skip connections.
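
The "Skip" and "MNN" variants named in Figure 5 differ only in the self-multiplying bypass. The source does not give the layer equations, so the following is one hypothetical reading, not the paper's architecture: a layer that adds, to the usual activation and skip branches, a term in which the input is multiplied elementwise by a linear function of itself. All names (`mnn_layer`, `W`, `U`, and so on) are assumptions made for illustration. The motivation for such a term is that aerodynamic forces involve products of state variables (drag scales with speed times velocity, and the Magnus force with the cross product of spin and velocity), which a multiplicative path can represent directly.

```python
def linear(W, b, x):
    """Affine map W @ x + b for a list-of-rows matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def mnn_layer(x, W, b, U, c):
    """Hypothetical layer: ReLU branch + skip connection + self-multiplicative bypass.

    Assumes the layer keeps the input dimension so the branches can be summed.
    """
    h = [max(0.0, z) for z in linear(W, b, x)]       # standard ReLU branch
    gate = linear(U, c, x)                           # linear function of the input
    mult = [xi * gi for xi, gi in zip(x, gate)]      # input times its own linear map
    return [hi + xi + mi for hi, xi, mi in zip(h, x, mult)]

# Tiny hand-picked weights, chosen for illustration only.
x = [1.0, -2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
U = [[0.0, 1.0], [1.0, 0.0]]
c = [0.0, 0.0]
y = mnn_layer(x, W, b, U, c)
```

With these weights the multiplicative branch contributes the product of the two input components to each output, a second-order term that a plain ReLU layer with skip connections can only approximate.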