Embrace Collisions: Humanoid Shadowing for Deployable Contact-Agnostics Motions

Ziwen Zhuang, Hang Zhao

TL;DR

This work reframes humanoid control to embrace full-body collisions, enabling extreme contact-rich motions beyond standing and walking. It introduces a general motion-command framework trained in GPU-accelerated simulation, using a transformer-based encoder, advantage mixing with multiple critics, and a termination policy suited for arbitrary base rotations. The approach is validated in simulation and deployed onboard a real Unitree G1, achieving successful get-up, ground interactions, and standing-dance movements with robust performance. The results highlight practical relevance for deployable, contact-agnostic humanoid motions and point to data and modeling gaps as avenues for future work.

Abstract

Previous humanoid robot research treats the robot as a bipedal mobile manipulation platform, where only the feet and hands contact the environment. However, humans use all body parts to interact with the world, e.g., we sit in chairs, get up from the ground, or roll on the floor. Making contact with the environment using body parts other than the feet and hands poses significant challenges for both model-predictive control and reinforcement learning-based methods. An unpredictable contact sequence makes it almost impossible for model-predictive control to plan ahead in real time. The success of zero-shot sim-to-real reinforcement learning for humanoids depends heavily on the acceleration of GPU-based rigid-body physics simulators and the simplification of collision detection. Because extreme torso movement has rarely been studied in humanoid research, all other components, such as termination conditions, motion commands, and reward designs, are non-trivial to design. To address these challenges, we propose a general humanoid motion framework that takes discrete motion commands and controls the robot's motor actions in real time. Using a GPU-accelerated rigid-body simulator, we train a humanoid whole-body control policy that follows high-level motion commands in the real world in real time, even with stochastic contacts, extremely large base rotations, and motion commands that are not fully feasible. More details at https://project-instinct.github.io

Paper Structure

This paper contains 27 sections, 8 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Training Framework: We build an extreme-action dataset from the AMASS dataset and internet videos using 4D-Human (Goel et al., 2023). We retarget the human motion to joint-level targets for the Unitree G1 robot. We then feed the motion command as a sequence to a Transformer-based encoder. The encoder output is concatenated with a stack of proprioception history (with no linear velocity) and passed through a sequence of MLP layers to output the joint-level action. In the simulator, a PD controller computes the torque for each joint motor.
  • Figure 2: Histogram of the number of motions in terms of their maximum roll/pitch and the minimum base height.
  • Figure 3: Inconsistency example of a humanoid getting up from lying on the ground. The transform frames in the figure show the target base orientation given as the motion command.
  • Figure 4: To handle a variable-length motion command input, the command encoder adopts a Transformer-based architecture. We select the embedding whose source frame has the closest "time-left to target" value. We then concatenate this embedding with the stack of proprioception history and feed it to an MLP to obtain the action output.
  • Figure 5: We run the control policy (including the Transformer-based encoder) on the NVIDIA Jetson NX inside the Unitree G1. An additional laptop serves as a separate device that sends high-level motion commands, which are visualized in the bottom right of the figure. No motion capture system is used.
  • ...and 3 more figures
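
The data flow described in the captions of Figures 1 and 4 can be illustrated with a small sketch. It is not the authors' implementation: the function names, the list-based data layout, and the gains are hypothetical. It also assumes that the "closest time-left to target" is the smallest non-negative time left among the command frames, and that the PD controller uses a zero target velocity; the captions state neither.

```python
from typing import List, Sequence


def select_command_embedding(
    embeddings: Sequence[Sequence[float]], time_left: Sequence[float]
) -> Sequence[float]:
    """Return the embedding of the command frame whose target is nearest in time.

    Assumption: a negative time_left marks a target that has already passed,
    so only frames with time_left >= 0 are candidates.
    """
    pending = [(t, i) for i, t in enumerate(time_left) if t >= 0.0]
    if not pending:
        raise ValueError("no pending command frame")
    _, index = min(pending)  # ties resolve to the earliest frame
    return embeddings[index]


def build_policy_input(
    embedding: Sequence[float], proprio_history: Sequence[Sequence[float]]
) -> List[float]:
    """Concatenate the chosen embedding with the flattened proprioception history."""
    flat_history = [value for frame in proprio_history for value in frame]
    return list(embedding) + flat_history


def pd_torque(
    q_target: Sequence[float],
    q: Sequence[float],
    dq: Sequence[float],
    kp: float,
    kd: float,
) -> List[float]:
    """Per-joint PD law: kp * (q_target - q) - kd * dq (zero target velocity assumed)."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(q_target, q, dq)]


# Hypothetical usage: three command frames, the first one already expired.
embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
chosen = select_command_embedding(embeddings, time_left=[-0.1, 0.4, 0.2])
mlp_input = build_policy_input(chosen, proprio_history=[[0.5, 0.6], [0.7, 0.8]])
torque = pd_torque(q_target=[1.0], q=[0.5], dq=[2.0], kp=10.0, kd=1.0)
```

In the pipeline the captions describe, an MLP would map `mlp_input` to the joint-level action, and that action would play the role of `q_target` in the PD law; the MLP is omitted here to keep the sketch short.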