Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
Daniel Kienzle, Katja Ludwig, Julian Lorenz, Shin'ichi Satoh, Rainer Lienhart
TL;DR
This work tackles the problem of reconstructing accurate 3D table tennis ball trajectories and spin from monocular video, where real-world noise and lack of 3D ground truth hinder prior approaches. It introduces a robust two-stage pipeline: a front-end for 2D ball and table keypoint detection trained on the new TTHQ dataset, and a back-end 2D-to-3D uplifting model trained solely on synthetic data, augmented to handle missing detections and varying frame rates via time-aware RoPE embeddings. The key contributions include high-resolution detectors based on Segformer++ for both ball and table geometry, a transformer-based back-end that generalizes to real data, and a comprehensive dataset (TTHQ) with 2D annotations and spin labels; together they enable an end-to-end tool for 3D trajectory and spin analysis in real-world broadcast footage. The results show strong 2D detection performance, robust 3D uplift under real-world imperfections, and high spin classification accuracy, making the pipeline practical for sports analytics and extensible to other 3D reconstruction tasks with limited ground truth.
Abstract
Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.
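The TL;DR attributes robustness to missing detections and varying frame rates to time-aware RoPE embeddings. The sketch below illustrates the general idea only and is not the authors' implementation: a rotary position embedding whose rotation angle is driven by each detection's real timestamp instead of its integer frame index. The function names (time_aware_rope, dot), the use of seconds as the time unit, and the frequency base of 10000 are all assumptions made for illustration.

```python
import math


def time_aware_rope(x, t, base=10000.0):
    """Rotate feature pairs of one token vector x by angles proportional to time t.

    x: list of floats with even length (hypothetical query/key features).
    t: timestamp of the detection, assumed here to be in seconds.
    """
    dim = len(x)
    if dim % 2 != 0:
        raise ValueError("feature dimension must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each pair (i, i + half) gets its own rotation frequency.
        angle = t * base ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u, v):
    """Plain dot product, standing in for an attention score."""
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same 0.02 s gap at two different absolute times.
s1 = dot(time_aware_rope(q, 0.10), time_aware_rope(k, 0.12))
s2 = dot(time_aware_rope(q, 0.50), time_aware_rope(k, 0.52))

# A different gap (0.04 s), e.g. after one dropped detection at 50 fps.
s3 = dot(time_aware_rope(q, 0.10), time_aware_rope(k, 0.14))
```

Because the rotations depend on timestamps, the score between two tokens depends only on their elapsed time: s1 and s2 agree up to floating-point error, while s3 differs. Under this scheme a dropped frame or a change in frame rate alters the time gaps the model sees rather than silently shifting integer positions.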
