Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning

Chenhao Liu, Leyun Jiang, Yibo Wang, Kairan Yao, Jinchen Fu, Xiaoyu Ren

TL;DR

This work tackles dynamic, contact-rich interaction by presenting a unified whole-body controller for humanoid badminton, trained via a three-stage reinforcement-learning curriculum without motion priors or demonstrations. A model-based approach generates shuttle trajectories for training, while deployment uses an EKF-based trajectory predictor (a prediction-free variant is also explored) to target precise interceptions. The system demonstrates zero-shot sim-to-real transfer, attaining outgoing shuttle speeds of up to 19.1 m/s on real hardware and sustaining a 21-hit rally in simulation, with real-world metrics showing sub-100 mm hitting accuracy and sub-0.2 rad orientation error. The results indicate a viable path toward agile humanoid interaction in racket sports and related dynamic domains, with clear avenues for extending to broader motor skills and higher-level decision making.

Abstract

Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion, manipulation, and more challenging loco-manipulation tasks. Yet the real world is dynamic, and quasi-static interactions are insufficient to cope with diverse environmental conditions. As a step toward more dynamic interaction scenarios, we present a reinforcement-learning-based training pipeline that produces a unified whole-body controller for humanoid badminton, enabling coordinated lower-body footwork and upper-body striking without motion priors or expert demonstrations. Training follows a three-stage curriculum: first footwork acquisition, then precision-guided racket swing generation, and finally task-focused refinement, yielding motions in which both legs and arms serve the hitting objective. For deployment, we incorporate an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking. We also introduce a prediction-free variant that dispenses with the EKF and explicit trajectory prediction. To validate the framework, we conduct five sets of experiments in both simulation and the real world. In simulation, two robots sustain a rally of 21 consecutive hits. Moreover, the prediction-free variant achieves successful hits with performance comparable to that of the target-known policy. In real-world tests, both the prediction and controller modules exhibit high accuracy, and on-court hitting achieves an outgoing shuttle speed of up to 19.1 m/s with a mean return landing distance of 4 m. These experimental results show that our proposed training scheme can deliver striking that is both highly dynamic and precise, and that it can be adapted to other dynamics-critical domains.
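To make the idea of predicting a hitting target concrete, the following is a minimal Python sketch of forward trajectory prediction only. It is not the authors' EKF: it covers just the kind of forward rollout a path predictor might perform once a position and velocity estimate is available. The quadratic-drag model, the terminal speed `V_TERMINAL`, the step size, and the function names `shuttle_accel` and `predict_interception` are all illustrative assumptions, not values or APIs taken from the paper.

```python
import math

G = 9.81          # gravitational acceleration, m/s^2
V_TERMINAL = 6.8  # ASSUMED shuttle terminal speed in m/s (illustrative only)


def shuttle_accel(vel):
    """Acceleration under gravity plus quadratic drag (assumed model)."""
    vx, vy, vz = vel
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    k = G / (V_TERMINAL ** 2)  # drag per unit mass, so drag equals g at V_TERMINAL
    return (-k * speed * vx, -k * speed * vy, -G - k * speed * vz)


def predict_interception(pos, vel, z_hit, dt=0.002, t_max=3.0):
    """Roll the model forward until the shuttle descends through height z_hit.

    Returns (t_star, p_star), or None if no descending crossing occurs
    within t_max seconds.
    """
    t = 0.0
    p = list(pos)
    v = list(vel)
    while t < t_max:
        a = shuttle_accel(v)
        # Semi-implicit Euler step: update velocity first, then position.
        v_new = [v[i] + a[i] * dt for i in range(3)]
        p_new = [p[i] + v_new[i] * dt for i in range(3)]
        if p[2] >= z_hit > p_new[2]:
            # Linearly interpolate inside the step to land exactly on z_hit.
            frac = (p[2] - z_hit) / (p[2] - p_new[2])
            p_star = tuple(p[i] + frac * (p_new[i] - p[i]) for i in range(3))
            return t + frac * dt, p_star
        p, v = p_new, v_new
        t += dt
    return None


# Hypothetical usage: a shuttle launched upward and forward from 1 m height.
result = predict_interception((0.0, 0.0, 1.0), (8.0, 0.0, 8.0), z_hit=1.5)
```

In a full pipeline, a filter would re-estimate position and velocity at every new measurement and the rollout would be repeated, so the predicted hitting time and position are refined as the shuttle approaches.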

Paper Structure

This paper contains 33 sections, 7 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Real-world humanoid badminton. A fully autonomous humanoid returns machine-fed shuttles in a motion-capture arena; overlaid arcs show an incoming (blue) and returned (orange) trajectory. Project Page: https://humanoid-badminton.github.io/Humanoid-Whole-Body-Badminton-via-Multi-Stage-Reinforcement-Learning/.
  • Figure 2: System overview. (a) Training: PPO learns a single policy $\pi_{\mathrm{WBC}}$ using Privileged Critic Obs together with Actor Obs (no history) under a three-stage curriculum. All observations and rewards in (a) come from the simulation environment. (b) Environment: The humanoid is 1.28 m tall, weighs 30 kg, and has 21 DoF. A 3D-printed mount attaches the racket orthogonally to the forearm. The robot is initialized above the origin and faces the $+x$ direction. Shuttlecock position and base pose are obtained by marker-based motion capture. (c) Deployment: At runtime, motion capture directly provides the base state, while an EKF with a path predictor estimates the shuttle trajectory to produce the hitting target $\{p^*_{\mathrm{ee}}, q^*_{\mathrm{ee}}, t^*\}$. This information, together with proprioception and dual-history features, forms the complete Actor Obs. $\pi_{\mathrm{WBC}}$ consumes the Actor Obs and outputs whole-body actions executed by a low-level PD controller.
  • Figure 3: Simulation results. Figure (a) illustrates the Two-Robot Rally scenario, where two identical humanoid robots sustain a rally of 21 consecutive returns. Figure (b) demonstrates the Prediction-Free policy: the robot infers the optimal impact position and orientation solely from the first five recorded shuttlecock positions after serving. Figure (c) presents the Target-Known policy, where a predetermined hitting position is provided. The red sphere indicates the designated hitting location, while the green sphere confirms successful impact execution by the robot.
  • Figure 4: Comparison between the target-known and prediction-free policies. The top panel shows the position error for both strategies. The middle panel compares the orientation error, where the orientation corresponds to the normal direction of the racket face. The bottom panel compares swing velocity.
  • Figure 5: EKF Prediction Accuracy. The predicted striking position error (top) and striking time error (bottom) were evaluated over 20 shuttlecock trajectories. The shaded regions represent the standard deviation. At 0.6 s before interception, the mean position error was less than 100 mm, already smaller than the radius of the racket.
  • ...and 9 more figures
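
The position and orientation errors reported in the captions above can be illustrated with a short Python sketch. The exact metric definitions are not given in this excerpt, so this assumes that position error is the Euclidean distance between the achieved and target hitting points, and that orientation error is the angle between the achieved and target racket-face normals. The function names `position_error_mm` and `normal_angle_error_rad` are hypothetical.

```python
import math


def position_error_mm(p_hit, p_target):
    """Euclidean distance between two 3D points given in metres, returned in mm."""
    return 1000.0 * math.dist(p_hit, p_target)


def normal_angle_error_rad(n_hit, n_target):
    """Angle in radians between two racket-face normals (need not be unit length)."""
    dot = sum(a * b for a, b in zip(n_hit, n_target))
    norm = math.hypot(*n_hit) * math.hypot(*n_target)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_angle)


# Hypothetical example: a hit 50 mm off target with a slightly tilted racket face.
pos_err = position_error_mm((0.03, 0.04, 1.5), (0.0, 0.0, 1.5))
ori_err = normal_angle_error_rad((1.0, 0.1, 0.0), (1.0, 0.0, 0.0))
```

Under these assumed definitions, a hit would meet the reported thresholds if `pos_err` is below 100 and `ori_err` is below 0.2.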