Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning
Chenhao Liu, Leyun Jiang, Yibo Wang, Kairan Yao, Jinchen Fu, Xiaoyu Ren
TL;DR
This work tackles dynamic, contact-rich interaction by presenting a unified whole-body controller for humanoid badminton, trained via a three-stage reinforcement-learning curriculum without motion priors or demonstrations. A model-based approach generates shuttle trajectories for training, while deployment uses an EKF-based trajectory predictor (a prediction-free variant is also explored) to target precise interceptions. The system transfers zero-shot from simulation to real hardware, reaching outgoing shuttle speeds of up to 19.1 m/s in the real world and sustaining a 21-hit rally in simulation; real-world metrics show sub-100 mm hitting accuracy and sub-0.2 rad orientation error. The results indicate a viable path toward agile, dynamic humanoid interaction in racket sports and related domains, with clear avenues for extending the approach to broader motor skills and higher-level decision making.
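The TL;DR mentions a model-based generator of shuttle trajectories for training. The sketch below is a hypothetical illustration, not the authors' generator. It assumes a point-mass shuttle under gravity plus quadratic drag, an illustrative terminal velocity `V_TERM` of 6.7 m/s, a made-up court frame with the robot near the origin, and invented launch ranges. The hypothetical helper `sample_training_target` randomizes a launch, integrates the flight, and returns the time and position at which the shuttle first descends through an assumed hitting height. A striking policy could be conditioned on an interception target of this kind.

```python
import math
import random

G = 9.81                  # gravity, m/s^2
V_TERM = 6.7              # ASSUMED shuttle terminal velocity, m/s (illustrative only)
K_DRAG = G / V_TERM ** 2  # quadratic-drag coefficient chosen so drag balances gravity at V_TERM


def accel(v):
    """Gravity plus quadratic drag opposing the velocity: a = g - k|v|v (assumed model)."""
    speed = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (-K_DRAG * speed * v[0],
            -K_DRAG * speed * v[1],
            -G - K_DRAG * speed * v[2])


def simulate(p0, v0, dt=0.002, t_max=5.0):
    """Integrate with semi-implicit Euler until the shuttle reaches the floor or t_max."""
    p, v = list(p0), list(v0)
    t = 0.0
    traj = [(t, tuple(p))]
    while t < t_max and p[2] > 0.0:
        a = accel(v)
        v = [v[i] + a[i] * dt for i in range(3)]  # update velocity first...
        p = [p[i] + v[i] * dt for i in range(3)]  # ...then position with the new velocity
        t += dt
        traj.append((t, tuple(p)))
    return traj


def interception_target(traj, hit_height):
    """Return (time, position) of the first descending crossing of hit_height, or None."""
    for (t0, p0), (t1, p1) in zip(traj, traj[1:]):
        if p0[2] >= hit_height > p1[2]:
            s = (p0[2] - hit_height) / (p0[2] - p1[2])  # linear interpolation factor
            return (t0 + s * (t1 - t0),
                    tuple(p0[i] + s * (p1[i] - p0[i]) for i in range(3)))
    return None


def sample_training_target(rng, hit_height=1.5):
    """Hypothetical sampler: random launch from the opposite court toward -x."""
    speed = rng.uniform(15.0, 30.0)                # assumed launch-speed range, m/s
    elev = math.radians(rng.uniform(20.0, 50.0))   # assumed elevation range
    azim = math.radians(rng.uniform(-10.0, 10.0))  # assumed azimuth range
    v0 = (-speed * math.cos(elev) * math.cos(azim),
          speed * math.cos(elev) * math.sin(azim),
          speed * math.sin(elev))
    launch = (6.0, 0.0, 1.8)                       # assumed launch point in a made-up court frame
    return interception_target(simulate(launch, v0), hit_height)
```

For example, `sample_training_target(random.Random(0))` yields one (time, position) target; calling it repeatedly with a seeded generator produces a reproducible stream of training targets. The semi-implicit Euler step is used because its fixed point is exactly the terminal velocity, so a falling shuttle settles at `V_TERM` rather than drifting.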
Abstract
Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion, manipulation, and more challenging loco-manipulation tasks. Yet the real world is dynamic, and quasi-static interactions are insufficient to cope with diverse environmental conditions. As a step toward more dynamic interaction, we present a reinforcement-learning-based training pipeline that produces a unified whole-body controller for humanoid badminton, enabling coordinated lower-body footwork and upper-body striking without motion priors or expert demonstrations. Training follows a three-stage curriculum: footwork acquisition, then precision-guided racket-swing generation, and finally task-focused refinement, yielding motions in which both legs and arms serve the hitting objective. For deployment, we incorporate an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking. We also introduce a prediction-free variant that dispenses with the EKF and explicit trajectory prediction. To validate the framework, we conduct five sets of experiments in simulation and the real world. In simulation, two robots sustain a rally of 21 consecutive hits, and the prediction-free variant achieves hitting performance comparable to that of the target-known policy. In real-world tests, both the prediction and controller modules exhibit high accuracy, and on-court hitting achieves an outgoing shuttle speed of up to 19.1 m/s with a mean return landing distance of 4 m. These results show that the proposed training scheme can deliver highly dynamic yet precise target striking in badminton and can be adapted to other dynamics-critical domains.
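The abstract names an EKF that estimates and predicts shuttlecock trajectories. The following is a minimal sketch of such a filter under stated assumptions, not the paper's implementation. The state is position and velocity in a fixed court frame. The process model is a gravity-plus-quadratic-drag point mass with an illustrative drag coefficient. Only noisy 3D positions are measured, and the noise scales `q` and `r` are placeholders, not tuned values. The names `ShuttleEKF`, `f`, `jacobian`, and `forecast` are hypothetical.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity in the court frame, m/s^2
K_DRAG = 9.81 / 6.7 ** 2          # ASSUMED quadratic-drag coefficient (illustrative)
H = np.hstack([np.eye(3), np.zeros((3, 3))])  # measurement model: observe position only


def f(x, dt):
    """Explicit-Euler process model for state x = [p, v] with a = g - k|v|v."""
    p, v = x[:3], x[3:]
    a = G - K_DRAG * np.linalg.norm(v) * v
    return np.concatenate([p + v * dt, v + a * dt])


def jacobian(x, dt):
    """Analytic Jacobian of f; d(|v|v)/dv = |v| I + v v^T / |v|."""
    v = x[3:]
    speed = np.linalg.norm(v)
    outer = np.outer(v, v) / speed if speed > 1e-9 else 0.0
    dadv = -K_DRAG * (speed * np.eye(3) + outer)
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)  # dp'/dv
    F[3:, 3:] += dt * dadv      # dv'/dv = I + dt * da/dv
    return F


class ShuttleEKF:
    def __init__(self, x0, P0, q=1e-5, r=1e-4):
        self.x = np.asarray(x0, dtype=float)
        self.P = np.asarray(P0, dtype=float)
        self.Q = q * np.eye(6)  # placeholder process noise per step
        self.R = r * np.eye(3)  # placeholder position-measurement noise

    def predict(self, dt):
        F = jacobian(self.x, dt)  # linearize at the estimate before propagating it
        self.x = f(self.x, dt)
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        y = np.asarray(z, dtype=float) - H @ self.x  # innovation
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ H) @ self.P

    def forecast(self, horizon, dt):
        """Roll the current estimate forward; returns an (n, 3) array of positions."""
        x = self.x.copy()
        out = []
        for _ in range(int(round(horizon / dt))):
            x = f(x, dt)
            out.append(x[:3].copy())
        return np.array(out)
```

A typical loop calls `predict(dt)` and then `update(z)` for each new position measurement, followed by `forecast(horizon, dt)` to roll the filtered state forward and pick an interception point. The velocity-dependent drag term is what calls for an extended filter here: a linear constant-velocity Kalman filter would ignore the strong deceleration a shuttle experiences.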
