Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control

Haochen Wang, Zhiwei Shi, Chengxi Zhu, Yafei Qiao, Cheng Zhang, Fan Yang, Pengjie Ren, Lan Lu, Dong Xuan

TL;DR

Hamlet presents a novel hybrid control framework for agile badminton robots that combines model-based chassis locomotion with learning-based arm manipulation. A physics-informed IL+RL training pipeline uses a privileged model to guide imitation learning and subsequent reinforcement learning, with a critic warmed up during IL to prevent performance drops. Real-world experiments show high success against a serving machine (94.5%) and humans (90.7%), plus zero-shot transfer across different chassis without arm retraining. This approach addresses sim-to-real gaps, enables safe exploration, and generalizes to other agile mobile manipulation tasks such as high-speed catching.

Abstract

Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), can produce excellent control policies for challenging agile robot tasks, such as sports robots. However, no existing work has harmonized learning-based policies with model-based methods to reduce training complexity and ensure safety and stability in agile badminton robot control. In this paper, we introduce Hamlet, a novel hybrid control system for agile badminton robots. Specifically, we propose a model-based strategy for chassis locomotion, which provides a base for the arm policy. We introduce a physics-informed "IL+RL" training framework for the learning-based arm policy. In this training framework, a model-based strategy with privileged information guides arm policy training during both the IL and RL phases. In addition, we train the critic model during the IL phase to alleviate the performance drop when transitioning from IL to RL. We present results on our self-engineered badminton robot, which achieves a 94.5% success rate against the serving machine and a 90.7% success rate against human players. Our system can be easily generalized to other agile mobile manipulation tasks such as agile catching and table tennis. Our project website: https://dreamstarring.github.io/HAMLET/.
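
To make the idea of training the critic during the IL phase concrete, the following is a minimal sketch, not the authors' implementation. It assumes the IL objective is a behavior-cloning error against the actions of the model-based strategy (the teacher), and that the critic is regressed onto discounted Monte Carlo returns of the same rollouts. The function names, the `critic_weight` coefficient, and the choice of mean squared error are all hypothetical; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def il_loss_with_critic(
    student_actions: Sequence[Sequence[float]],
    teacher_actions: Sequence[Sequence[float]],
    values: Sequence[float],
    rewards: Sequence[float],
    gamma: float = 0.99,
    critic_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> Tuple[float, float, float]:
    """Return (total, imitation, critic) losses for one episode.

    imitation: per-step MSE between student and teacher actions, averaged.
    critic:    MSE between predicted values and discounted returns, so the
               critic is already fitted when RL fine-tuning starts.
    """
    per_step = [mse(s, t) for s, t in zip(student_actions, teacher_actions)]
    imitation = sum(per_step) / len(per_step)
    critic = mse(values, discounted_returns(rewards, gamma))
    return imitation + critic_weight * critic, imitation, critic
```

In this sketch the critic term does not change the imitation target; it only ensures that value estimates are available before the RL phase begins, which is the motivation the abstract gives for training the critic early.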

Paper Structure

This paper contains 24 sections, 8 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Agile badminton robot system. The badminton robot on the left, which consists of an omnidirectional chassis and a 5-DOF arm, is playing badminton against a human on the right. The robot can move flexibly in the court and rapidly swing the racket to return the ball.
  • Figure 2: Overview of the components of Hamlet. The green boxes represent the processing steps, including the detection and prediction of the ball trajectory. The blue boxes represent the model-based strategy for controlling the chassis, as discussed in the chassis control policy section. The dark orange box represents the rigid transformation of ball trajectory coordinates, as discussed in the integrating method section. The yellow box represents the learning-based policy for controlling the robotic arm, as discussed in the robotic arm control policy section.
  • Figure 3: Hardware of badminton robot system. Our badminton robot system consists of two parts: (a) vision module and (b) robot body. The vision module handles scene perception, including ball recognition, tracking, and robot positioning. The computer on the robot body executes the control algorithm and sends control commands to each motor.
  • Figure 4: Serving machine experiments setup. We set 6 positions for the serving machine on the right side of the court. We configure 20 combinations of machine positions, power, and angle to ensure diverse trajectories. The circles on the left side of the court show the ball landing points. The subplots on the left and right show different views of the real court.
  • Figure 5: Comparison of the control gap between simulation and the real world for the chassis and the arm. Under the same control commands, we compare the end positions reached by the mobile platform and by the robot arm in simulation and in the real world. Box plot (a) visualizes the end distance error. Chart (b) shows the standard deviation of the end distance error for the chassis and the arm. The results show that the chassis is more sensitive to changes in the environment and therefore less suited to sim2real transfer.
  • ...and 2 more figures
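
The rigid transformation of ball trajectory coordinates mentioned in the Figure 2 caption can be illustrated with a short sketch. This is an assumption-laden example rather than the paper's code: it assumes the predicted trajectory is given in a fixed court (world) frame, that the chassis pose is a planar position plus yaw angle, and that the chassis frame origin lies on the court floor so ball height is unchanged. The function name `world_to_chassis` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def world_to_chassis(
    points: Sequence[Point3],
    chassis_x: float,
    chassis_y: float,
    chassis_yaw: float,
) -> List[Point3]:
    """Express world-frame ball positions in the chassis frame.

    Applies p_chassis = R(yaw)^T * (p_world - t), where t is the chassis
    position on the court plane and R(yaw) is a rotation about the vertical
    axis. Height z is left unchanged (assumed ground-level chassis origin).
    """
    c, s = math.cos(chassis_yaw), math.sin(chassis_yaw)
    transformed = []
    for x, y, z in points:
        dx, dy = x - chassis_x, y - chassis_y
        # Rows of R^T are (c, s) and (-s, c).
        transformed.append((c * dx + s * dy, -s * dx + c * dy, z))
    return transformed
```

For example, with the chassis at (1, 2) and rotated by 90 degrees, a ball at world position (1, 3, 1.5) lies one metre straight ahead of the chassis at the same height. Expressing the trajectory relative to the chassis is one plausible way an arm policy could remain independent of where the chassis stands on the court.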