
Strategy and Skill Learning for Physics-based Table Tennis Animation

Jiashun Wang, Jessica Hodgins, Jungdam Won

TL;DR

The paper addresses the challenge of producing physics-based table tennis agents that show diverse, natural motions and adaptive strategies. It introduces a hierarchical framework that separates skill execution from strategic decision-making. A skill-level controller trained in three stages (imitation, ball control, mixer) targets mode collapse, while a strategy-level controller based on CVAE behavior cloning explicitly selects the skill and the ball's target during competition or cooperation. In agent-agent and human-agent VR experiments, the method is reported to improve motion quality, skill diversity, task performance, and interactive realism, with higher win rates and longer rallies than the baselines. This work advances bidirectional human-agent interaction in physically simulated sports and provides a scalable platform for future research in learned skills and strategic play.

Abstract

Recent advancements in physics-based character animation leverage deep learning to generate agile and natural motion, enabling characters to execute movements such as backflips, boxing, and tennis. However, reproducing the selection and use of diverse motor skills in dynamic environments to solve complex tasks, as humans do, still remains a challenge. We present a strategy and skill learning approach for physics-based table tennis animation. Our method addresses the issue of mode collapse, where the characters do not fully utilize the motor skills they need to perform to execute complex tasks. More specifically, we demonstrate a hierarchical control system for diversified skill learning and a strategy learning framework for effective decision-making. We showcase the efficacy of our method through comparative analysis with state-of-the-art methods, demonstrating its capabilities in executing various skills for table tennis. Our strategy learning framework is validated through both agent-agent interaction and human-agent interaction in Virtual Reality, handling both competitive and cooperative tasks.

Paper Structure (20 sections, 11 equations, 9 figures, 5 tables, 1 algorithm)

This paper contains 20 sections, 11 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: An overview of our method. The strategy action includes the skill command and the ball's target landing location. The skill action includes the target joint angles for the PD controllers, blended from the outputs of the imitation policies.
  • Figure 2: The architecture of our method. We train the skill-level controller in stages: the imitation policies, then the ball control policies, and finally the mixer policy. We train the strategy-level controller after the skill-level controller is ready and its weights are frozen. $\otimes$ and $\oplus$ together denote the weighted sum in Equation \ref{eq:mix}.
  • Figure 3: Comparison with other methods under four skill commands. ASE and CASE may use the wrong skills, as shown in the red box. ET may terminate early to return to a preparation pose, as shown in the yellow boxes.
  • Figure 4: Transition results using only the forehand and backhand drive controllers. Both controllers are trained with randomly initialized configurations from the motion capture data. As shown in the red boxes, the agent attempts another forehand drive before the next ball is launched, which prevents it from switching back to a backhand drive in time.
  • Figure 5: Skill command distribution of our method and RL.
  • ...and 4 more figures
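
The captions for Figures 1 and 2 describe the skill action as target joint angles for PD controllers, obtained as a weighted sum of the imitation policies' outputs. The sketch below illustrates that idea in plain Python. It is not the authors' implementation: the function names, the softmax normalization of the mixer weights, the PD gains, and the example numbers are all assumptions made for illustration, since the mixing equation itself is not reproduced on this page.

```python
import math


def softmax(logits):
    """Turn raw mixer scores into non-negative weights that sum to 1 (assumed normalization)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_targets(weights, policy_targets):
    """Weighted sum of per-policy target joint angles (one list of angles per policy)."""
    if len(weights) != len(policy_targets):
        raise ValueError("need exactly one weight per imitation policy")
    num_joints = len(policy_targets[0])
    return [
        sum(w * targets[j] for w, targets in zip(weights, policy_targets))
        for j in range(num_joints)
    ]


def pd_torques(target, angle, velocity, kp=300.0, kd=30.0):
    """Standard PD law: pull each joint toward its target and damp its velocity.

    The gains kp and kd are placeholder values, not taken from the paper.
    """
    return [kp * (t - q) - kd * dq for t, q, dq in zip(target, angle, velocity)]


# Hypothetical example: two imitation policies (say, forehand and backhand), three joints.
weights = softmax([2.0, 0.0])
targets = blend_targets(weights, [[0.5, -0.2, 1.0], [-0.4, 0.3, 0.8]])
torques = pd_torques(targets, angle=[0.0, 0.0, 0.9], velocity=[0.1, 0.0, -0.2])
```

Blending in joint-angle space, rather than switching discretely between policies, is one way a single controller could move smoothly between skills; how the paper's mixer policy actually produces its weights should be checked against the referenced equation.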