Achieving Human Level Competitive Robot Table Tennis

David B. D'Ambrosio; Saminda Abeyruwan; Laura Graesser; Atil Iscen; Heni Ben Amor; Alex Bewley; Barney J. Reed; Krista Reymann; Leila Takayama; Yuval Tassa; Krzysztof Choromanski; Erwin Coumans; Deepali Jain; Navdeep Jaitly; Natasha Jaques; Satoshi Kataoka; Yuheng Kuang; Nevena Lazic; Reza Mahjourian; Sherry Moore; Kenneth Oslund; Anish Shankar; Vikas Sindhwani; Vincent Vanhoucke; Grace Vesom; Peng Xu; Pannag R. Sanketi

TL;DR

The paper presents the first learned robot agent to reach amateur human-level performance in competitive table tennis. Its hierarchical policy combines a modular library of low-level skills with a high-level controller that selects among them. The policies transfer zero-shot from simulation to the real robot, helped by an iterative process that grounds the training task distribution in real-world ball data, and the agent adapts online to its opponent through per-skill preference values (H-values) and a spin-aware perception stack. In a user study with 29 participants and 29 matches against unseen human opponents of varying skill, the robot won 45% of matches overall, winning every match against beginners and 55% against intermediate players but losing every match against the most advanced players, while participants showed solid engagement. The work highlights the importance of system design, interpretable skill descriptors, and online adaptation in bridging the sim-to-real gap for complex, interactive tasks, and suggests a path toward broader use of hierarchical, data-efficient robot learning in real-world human-robot interaction.

Abstract

Achieving human-level speed and performance on real-world tasks is a north star for the robotics research community. This work takes a step towards that goal and presents the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport that requires human players to undergo years of training to achieve an advanced level of proficiency. In this paper, we contribute (1) a hierarchical and modular policy architecture consisting of (i) low-level controllers with detailed skill descriptors that model the agent's capabilities and help to bridge the sim-to-real gap, and (ii) a high-level controller that chooses among the low-level skills; (2) techniques for enabling zero-shot sim-to-real transfer, including an iterative approach to defining the task distribution that is grounded in the real world and defines an automatic curriculum; and (3) real-time adaptation to unseen opponents. Policy performance was assessed through 29 robot vs. human matches, of which the robot won 45% (13/29). All humans were unseen players, and their skill levels ranged from beginner to tournament level. Whilst the robot lost all matches against the most advanced players, it won 100% of matches against beginners and 55% of matches against intermediate players, demonstrating solidly amateur human-level performance. Videos of the matches can be viewed at https://sites.google.com/view/competitive-robot-table-tennis
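The abstract lists real-time adaptation to unseen opponents as a contribution, and the TL;DR and Figure 2 attribute it to per-skill preference scores (H-values) that act as an online opponent model. The sketch below illustrates the idea only, assuming a gradient-bandit-style update in the spirit of Sutton and Barto's Reinforcement Learning: An Introduction: a skill's H-value rises after a successful return and falls after a miss, and skills are sampled with softmax probabilities. The class name `HValueSelector`, the step size, and the 1/0 reward encoding are hypothetical and are not taken from the paper.

```python
import math
import random


class HValueSelector:
    """Hypothetical online preference model over low-level controllers (LLCs).

    Assumes a gradient-bandit-style update; the paper's exact rule may differ.
    """

    def __init__(self, llc_ids, step_size=0.1):
        self.h = {llc: 0.0 for llc in llc_ids}  # preference per LLC, starts neutral
        self.step_size = step_size
        self.baseline = 0.0  # running mean of past rewards
        self.count = 0

    def probabilities(self, candidates):
        # Softmax over the preferences of the shortlisted LLCs only.
        top = max(self.h[c] for c in candidates)
        exps = {c: math.exp(self.h[c] - top) for c in candidates}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def choose(self, candidates, rng=random):
        # Sample rather than take the argmax, so less-preferred skills still get tried.
        probs = self.probabilities(candidates)
        return rng.choices(list(probs), weights=list(probs.values()))[0]

    def update(self, candidates, chosen, reward):
        # reward is assumed to be 1.0 for a successful return and 0.0 otherwise.
        probs = self.probabilities(candidates)
        advantage = reward - self.baseline  # compare against past performance
        for c in candidates:
            indicator = 1.0 if c == chosen else 0.0
            self.h[c] += self.step_size * advantage * (indicator - probs[c])
        self.count += 1
        self.baseline += (reward - self.baseline) / self.count
```

A typical loop would call `choose` on the shortlisted LLCs before each return and `update` once the outcome of that return is known. Sampling instead of always picking the top skill keeps some exploration, which lets the preferences shift if an opponent starts exploiting a previously favored skill.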

Paper Structure (44 sections, 1 equation, 16 figures, 14 tables, 2 algorithms)

INTRODUCTION
METHOD
Hardware, problem setting, and environment
Hierarchical agent architecture and training overview
LLC training
The High Level Controller (HLC)
Event-driven decisions
Style policy
Spin classifier
LLC skill descriptors
Strategies and LLC shortlist
LLC preferences (H-value) & choosing an LLC
Techniques for enabling zero-shot sim-to-real
Modeling ball and robot dynamics
Spin "correction" and sim-to-sim adapter layers
...and 29 more sections

Figures (16)

Figure 1: Our table tennis robot playing against a professional coach. The green dots show the trajectory of the ball during the rally. The table tennis robot is a 6-DoF ABB 1100 arm mounted on top of two Festo linear gantries, enabling motion in the 2D plane. The x gantry, which moves side to side across the table, is 4m long, and the y gantry, which moves towards and away from the table, is 2m long. A 3D-printed paddle handle and a paddle with short pips rubber are attached to the arm.
Figure 2: Method overview. We train a skill library of low-level controllers (LLCs), including serving and rallying skills, and sim-to-sim adapters from a dataset of ball states. Using the same ball states, we train a high-level controller (HLC) for style selection. The policies are trained purely in simulation (but using real ball states) with Blackbox Gradient Sensing (BGS) (D'Ambrosio et al., 2023; Abeyruwan et al., 2022) and transfer zero-shot to the physical world. At deployment time, we freeze the style selector and skills. During inference, the HLC uses the style selector to choose the side (forehand or backhand), the heuristics module shortlists the most effective skills, H-values (the online opponent model) select the most preferred skill, and that skill executes the actions.
Figure 3: LLC training lineage. LLC x = ID of the LLC in the final system. The forehand (FH) and backhand (BH) LLCs were each developed from two independently trained generalists. One of the generalists was developed along with the dataset cycles, whilst the other was trained only after the dataset was finalized. Both seed forehand generalists were deployed (LLC 0 and LLC 2), whilst for the backhand only one of the seed generalists was deployed (LLC 9).
Figure 4: Once per ball hit, the HLC decides which LLC to return the ball with by first applying a style policy to the current ball state to determine forehand or backhand (this example shows forehand being chosen). If the ball is a serve, the HLC attempts to classify the spin as topspin or underspin and picks the corresponding LLC. Otherwise, it must determine which of the many rallying LLCs will perform best by finding the most similar ball state in the corresponding set of LLC skill tables and retrieving its return statistics. Heuristic strategies are applied to these statistics to produce a shortlist of candidate LLCs, and the final LLC is chosen through a weighted selection. The chosen LLC is then queried at 50Hz with the current ball state to determine the robot actions.
Figure 5: Sample training in simulation and zero-shot transfer to the hardware are shown side by side.
...and 11 more figures
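Figure 4 describes the HLC's once-per-hit decision as a pipeline: a style policy picks forehand or backhand, serves are routed by a spin classifier, and rally balls are matched against per-LLC skill tables by the most similar ball state, after which heuristics shortlist candidates and a weighted draw picks the final LLC. The sketch below mirrors that flow under stated assumptions. The skill-table layout (stored ball states paired with a return rate), the Euclidean nearest-neighbor match, the "top-k by return rate" shortlist heuristic, and all names (`SkillTable`, `HighLevelController`, `choose_llc`, `weights`, `shortlist_size`) are hypothetical stand-ins, not the paper's implementation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

BallState = Tuple[float, ...]  # e.g. ball position and velocity (assumed layout)


@dataclass
class SkillTable:
    """Hypothetical skill descriptor: sampled ball states and an LLC's return rate on each."""
    ball_states: List[BallState]
    return_rates: List[float]

    def lookup(self, ball_state: BallState) -> float:
        # Return statistics of the most similar stored ball state (Euclidean nearest neighbor).
        i = min(range(len(self.ball_states)),
                key=lambda j: math.dist(self.ball_states[j], ball_state))
        return self.return_rates[i]


@dataclass
class HighLevelController:
    style_policy: Callable[[BallState], str]     # -> "forehand" or "backhand"
    spin_classifier: Callable[[BallState], str]  # -> "topspin" or "underspin"
    serve_llcs: Dict[Tuple[str, str], int]       # (style, spin) -> serve-return LLC id
    rally_tables: Dict[str, Dict[int, SkillTable]]  # style -> {LLC id: skill table}
    weights: Dict[int, float] = field(default_factory=dict)  # non-negative, e.g. from H-values
    shortlist_size: int = 3

    def choose_llc(self, ball_state: BallState, is_serve: bool, rng=random) -> int:
        style = self.style_policy(ball_state)
        if is_serve:
            return self.serve_llcs[(style, self.spin_classifier(ball_state))]
        stats = {llc: table.lookup(ball_state)
                 for llc, table in self.rally_tables[style].items()}
        # Stand-in heuristic: shortlist the LLCs with the highest estimated return rate.
        shortlist = sorted(stats, key=stats.get, reverse=True)[: self.shortlist_size]
        # Weighted selection among the shortlist; unknown LLCs default to weight 1.0.
        w = [self.weights.get(llc, 1.0) for llc in shortlist]
        return rng.choices(shortlist, weights=w)[0]
```

Per the caption, the LLC returned here would then be queried at 50Hz with the current ball state to produce robot actions; that control loop is omitted. Keeping selection separate from the per-LLC statistics means the weights (for instance, softmax-normalized H-values) can change online without retraining or re-indexing the skill tables.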
