HybridMimic: Hybrid RL-Centroidal Control for Humanoid Motion Mimicking

Ludwig Chee-Ying Tay, I-Chia Chang, Yan Gu

TL;DR

HybridMimic is a framework in which a learned policy dynamically modulates a centroidal-model-based controller by predicting continuous contact states and desired centroidal velocities. In hardware experiments on the Booster T1 humanoid, it reduces the average base position tracking error by 13% compared to a state-of-the-art RL baseline.

Abstract

Motion mimicking, i.e., encouraging the control policy to mimic human motion, facilitates the learning of complex tasks via reinforcement learning (RL) for humanoid robots. Although standard RL frameworks demonstrate impressive locomotion agility, they often bypass explicit reasoning about robot dynamics during deployment, a design choice that can lead to physically infeasible commands when the robot encounters out-of-distribution environments. By integrating model-based principles, hybrid approaches can improve performance; however, existing methods typically rely on predefined contact timing, which limits their versatility. This paper introduces HybridMimic, a framework in which a learned policy dynamically modulates a centroidal-model-based controller by predicting continuous contact states and desired centroidal velocities. This architecture exploits the physical grounding of centroidal dynamics to generate feedforward torques that remain feasible even under domain shift. Using physics-informed rewards, the policy is trained to make efficient use of the centroidal controller's optimization by outputting precise control targets and reference torques. In hardware experiments on the Booster T1 humanoid, HybridMimic reduces the average base position tracking error by 13% compared to a state-of-the-art RL baseline, demonstrating the robustness of dynamics-aware deployment.

Paper Structure

This paper contains 22 sections, 17 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Snapshots of Booster T1 executing the kicking task using the proposed HybridMimic controller. (a) The robot starts standing. (b) The robot walks forward. (c) The robot kicks with its left foot. (d) The robot recovers from the kick, stabilizes, and steps backwards. More motions can be found in the supplementary video: https://youtu.be/1d5vkqNtCOY
  • Figure 2: Diagram illustrating the HybridMimic controller formulation in both training and deployment. The dashed items represent training-specific elements that are unused during deployment.
  • Figure 3: The left two images illustrate the robot displacement after two sidesteps for the (a) BeyondMimic and (b) HybridMimic controllers. (c) The projected base position $\hat{p}$ is the base position projected onto the line from the starting position to the ending position in the reference motion clip. While the real HybridMimic trajectory oscillates around the trained motion, the real BeyondMimic trajectory consistently undershoots it and induces trunk wobbling, as shown in the experiment video.
  • Figure 4: The vertical ground reaction force of the left foot during walking in simulation for (a) HybridMimic and (b) HybridMimic+FCS. The highlighted regions indicate the periods during which the contact state indicates that the left foot is in contact with the ground. For HybridMimic, the highlighting occurs when $w_{\text{left foot}} > 0$; for HybridMimic+FCS, it occurs when the foot height in the reference motion is below 0.007 m.
  • Figure 5: Feedforward, actual, and reference torques during walking for the left and right knee pitch joints. The highlighted regions correspond to contact on the corresponding leg, defined as the periods when $w_i > 0$.
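
The captions for Figures 3 to 5 describe two simple computations: projecting the base position onto the start-to-end line of the reference clip, and marking contact either from a predicted contact weight or from a reference foot-height threshold. The sketch below is one possible reading of those captions, not the authors' code. The function names are hypothetical, and it assumes that $\hat{p}$ is reported as a scalar distance along the line measured from the starting position, which the caption does not state explicitly.

```python
from typing import List, Sequence


def projected_base_position(p: Sequence[float],
                            start: Sequence[float],
                            end: Sequence[float]) -> float:
    """Scalar coordinate of p along the line from start to end.

    Assumption: p-hat is the signed distance from `start` along the
    unit direction (end - start); the caption only says "projected".
    """
    direction = [e - s for s, e in zip(start, end)]
    length = sum(c * c for c in direction) ** 0.5
    if length == 0.0:
        raise ValueError("start and end positions coincide")
    # Dot product of (p - start) with the direction, normalized by its length.
    dot = sum((pi - si) * di for pi, si, di in zip(p, start, direction))
    return dot / length


def contact_from_weight(weights: Sequence[float]) -> List[bool]:
    """Contact mask as in the Figure 4(a) and Figure 5 captions: w > 0."""
    return [w > 0.0 for w in weights]


def contact_from_reference_height(heights_m: Sequence[float],
                                  threshold_m: float = 0.007) -> List[bool]:
    """Contact mask as in the Figure 4(b) caption: foot height below 0.007 m."""
    return [h < threshold_m for h in heights_m]
```

For example, with a reference clip that runs from (0, 0) to (2, 0), a measured base position of (1, 1) projects to 1.0 along the line; the lateral offset is discarded by the projection.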