HITTER: A HumanoId Table TEnnis Robot via Hierarchical Planning and Learning

Zhi Su, Bike Zhang, Nima Rahmanian, Yuman Gao, Qiayuan Liao, Caitlin Regan, Koushil Sreenath, S. Shankar Sastry

TL;DR

This work enables agile, real-time humanoid table tennis by coupling a model-based ball trajectory planner with a reinforcement-learning-based whole-body controller (WBC). The planner provides the striking position, velocity, and timing, while the WBC executes natural, balanced motions trained with human swing references. Key contributions include a state-estimation and trajectory-prediction pipeline, a separation of base and racket commands for efficient learning, and an asymmetric actor-critic RL setup, validated through real-world rallies of up to 106 consecutive shots. The results demonstrate sub-second reaction times, high hit and return rates, and autonomous humanoid-versus-humanoid rallies, marking a substantive step toward interactive, human-like humanoid behavior in dynamic manipulation tasks.

Abstract

Humanoid robots have recently achieved impressive progress in locomotion and whole-body control, yet they remain constrained in tasks that demand rapid interaction with dynamic environments through manipulation. Table tennis exemplifies such a challenge: with ball speeds exceeding 5 m/s, players must perceive, predict, and act within sub-second reaction times, requiring both agility and precision. To address this, we present a hierarchical framework for humanoid table tennis that integrates a model-based planner for ball trajectory prediction and racket target planning with a reinforcement learning-based whole-body controller. The planner determines striking position, velocity and timing, while the controller generates coordinated arm and leg motions that mimic human strikes and maintain stability and agility across consecutive rallies. Moreover, to encourage natural movements, human motion references are incorporated during training. We validate our system on a general-purpose humanoid robot, achieving up to 106 consecutive shots with a human opponent and sustained exchanges against another humanoid. These results demonstrate real-world humanoid table tennis with sub-second reactive control, marking a step toward agile and interactive humanoid behaviors.

Paper Structure

This paper contains 24 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Humanoid table tennis rallies. Our system enables both humanoid-humanoid (left) and humanoid-human (right) matches, achieving rallies of up to 106 consecutive shots against a human opponent. Project website: https://humanoid-table-tennis.github.io.
  • Figure 2: System overview. (a) The racket is mounted on the robot’s right wrist using a 3D-printed connector, and the ball is covered with reflective tape for motion capture. (b) The motion capture system tracks the ball position $\mathbf{p}_{\mathrm{ball}}$, the robot base position $\mathbf{p}_{\mathrm{base}}$, and the base forward vector $\mathbf{e}_{\mathrm{base},x}$. (c) The model-based planner uses $\mathbf{p}_{\mathrm{ball}}$ and the desired landing point $\hat{\mathbf{p}}_l$ to predict the racket’s striking position $\hat{\mathbf{p}}_{\mathrm{racket}}$, velocity $\hat{\mathbf{v}}_{\mathrm{racket}}$, and strike time $t_{\mathrm{strike}}$. Given $\mathbf{p}_{\mathrm{base}}$ and $\hat{\mathbf{p}}_{\mathrm{racket}}$, the base target position $\hat{\mathbf{p}}_{\mathrm{base}}$ is also computed. (d) The learning-based whole-body controller (WBC) policy $\pi_{\mathrm{WBC}}$ is trained in simulation via reinforcement learning with human motion references and then deployed on the real robot. It takes as input the observations provided by other system components together with the robot’s proprioceptive information.
  • Figure 3: Prediction errors of the model-based planner. Striking position error (top) and striking time error (bottom) are evaluated over 20 ball trajectories. The shaded regions indicate the standard deviation, and the red dashed line marks the critical position error of 7.5 cm, corresponding to the racket radius. At 0.5 s before the strike, the position error falls below this threshold.
  • Figure 4: Agility evaluation of the WBC policy. Across 943 successful simulated trials (a 94.3% success rate), the policy reaches targets within an initial distance of 0.75 m in under 0.8 s on average, which is faster than the strike time of 0.86 s. Error bars denote standard deviation across trials.
  • Figure 5: Real-world rapid reaching motion. The whole-body control policy enables agile reaching motions, allowing the robot to swiftly transition from the right side of the table to the left while maintaining balance and successfully striking the ball.
  • ...and 2 more figures
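
The planner step described in Figure 2(c) maps a tracked ball state to a striking position and strike time. The sketch below illustrates that idea only in its simplest form and is not the paper's planner: it assumes a drag-free, spin-free ball under gravity, ignores the table bounce, and places the strike on a fixed vertical plane. The function name `predict_strike`, the plane parameter `x_strike`, and the sample numbers are all hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def predict_strike(p_ball, v_ball, x_strike):
    """Predict where and when a ball crosses the plane x = x_strike.

    Assumptions (not from the paper): pure ballistic flight with no drag,
    no spin, and no bounce on the table.

    p_ball, v_ball: (x, y, z) position in m and velocity in m/s.
    Returns (p_strike, t_strike), or None if the ball is not moving
    toward the plane.
    """
    px, py, pz = p_ball
    vx, vy, vz = v_ball
    dx = x_strike - px
    # The ball must have forward velocity pointing at the strike plane.
    if vx == 0.0 or dx / vx < 0.0:
        return None
    t_strike = dx / vx
    # Constant velocity in x and y, constant acceleration -G in z.
    p_strike = (
        x_strike,
        py + vy * t_strike,
        pz + vz * t_strike - 0.5 * G * t_strike**2,
    )
    return p_strike, t_strike


if __name__ == "__main__":
    # Hypothetical ball 2 m from the strike plane, approaching at 5 m/s.
    result = predict_strike((2.0, 0.1, 0.3), (-5.0, 0.0, 2.0), 0.0)
    if result is not None:
        p_strike, t_strike = result
        print(f"strike at {p_strike} after {t_strike:.2f} s")
```

With a 5 m/s approach speed and 2 m to travel, this toy model leaves 0.4 s before the strike, which gives a feel for the sub-second budget the captions refer to. A real planner would also need a velocity estimate from the tracked positions and a model of the bounce.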