Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments

Gangyang Li, Qing Shi, Youhao Hu, Jincheng Hu, Zhongyuan Wang, Xinlong Wang, Shaqi Luo

TL;DR

Thor targets high-intensity force interaction for humanoids. It decouples whole-body control into upper-body, waist, and lower-body policies, guided by a force-adaptive torso-tilt (FAT2) reward anchored in zero-moment-point (ZMP) force equilibria. Training uses a three-agent PPO-based RL framework with privileged critic information, a two-stage curriculum, and domain randomization to bridge the sim-to-real gap. Validated on the Unitree G1, it substantially outperforms baselines in pulling tasks and door-opening scenarios. Together, FAT2 and the decoupled architecture address both force amplification and the high dimensionality of the control problem, enabling real-time inference on onboard resources. This work advances humanoid robustness in contact-rich tasks and provides a practical path toward human-level whole-body reactions in unstructured environments.
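
To make the idea of a ZMP-based force equilibrium concrete, the following is a minimal planar statics sketch. It is not the paper's derivation, which is not reproduced on this page. It assumes a purely horizontal pulling force at the hand, a single lumped center of mass that leans about the ankle like an inverted pendulum, and no friction limit. The function names and all numeric parameters (mass, lengths, heights) are hypothetical values chosen for illustration only.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def max_pull_force(mass_kg, com_offset_m, foot_edge_m, hand_height_m):
    """Largest horizontal hand force (N) that keeps the ZMP inside the foot.

    Static moment balance about the ankle (planar model, assumption):
        x_zmp = com_offset - F * hand_height / (mass * g)
    The robot starts to tip when x_zmp reaches -foot_edge, the edge of the
    support area on the side the external force drags the ZMP towards.
    """
    return mass_kg * G * (com_offset_m + foot_edge_m) / hand_height_m


def com_offset_from_tilt(tilt_rad, com_length_m):
    """Horizontal CoM shift when the lumped body leans away from the load."""
    return com_length_m * math.sin(tilt_rad)


# Hypothetical parameters, not taken from the paper.
MASS_KG = 35.0        # assumed robot mass
COM_LENGTH_M = 0.6    # assumed ankle-to-CoM distance
FOOT_EDGE_M = 0.08    # assumed ankle-to-foot-edge distance
HAND_HEIGHT_M = 0.8   # assumed height of the pulling hand

curve = []
for tilt_deg in (0, 5, 10, 15, 20):
    offset = com_offset_from_tilt(math.radians(tilt_deg), COM_LENGTH_M)
    force = max_pull_force(MASS_KG, offset, FOOT_EDGE_M, HAND_HEIGHT_M)
    curve.append((tilt_deg, force))

for tilt_deg, force_n in curve:
    print(f"tilt {tilt_deg:2d} deg -> max static pull {force_n:6.1f} N")
```

In this simplified model the admissible pulling force grows with the lean angle because leaning moves the center of mass away from the foot edge that limits stability. That is the qualitative effect a tilt-based reward would be expected to exploit.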

Abstract

Humanoids hold great potential for service, industrial, and rescue applications, in which robots must sustain whole-body stability while performing intense, contact-rich interactions with the environment. However, enabling humanoids to generate human-like, adaptive responses under such conditions remains a major challenge. To address this, we propose Thor, a humanoid framework for human-level whole-body reactions in contact-rich environments. Based on the robot's force analysis, we design a force-adaptive torso-tilt (FAT2) reward function to encourage humanoids to exhibit human-like responses during force-interaction tasks. To mitigate the high-dimensional challenges of humanoid control, Thor introduces a reinforcement learning architecture that decouples the upper body, waist, and lower body. Each component shares global observations of the whole body and jointly updates its parameters. Finally, we deploy Thor on the Unitree G1, and it substantially outperforms baselines in force-interaction tasks. Specifically, the robot achieves a peak pulling force of 167.7 N (approximately 48% of the G1's body weight) when moving backward and 145.5 N when moving forward, representing improvements of 68.9% and 74.7%, respectively, compared with the best-performing baseline. Moreover, Thor is capable of pulling a loaded rack (130 N) and opening a fire door with one hand (60 N). These results highlight Thor's effectiveness in enhancing humanoid force-interaction capabilities.
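
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that each "improvement" is a relative increase over the baseline (peak = baseline × (1 + gain)) and that "body weight" means mass × g with g = 9.81 m/s². Neither convention is stated explicitly in the abstract, and the variable names are illustrative.

```python
G = 9.81  # gravitational acceleration, m/s^2 (assumed convention)

# Values quoted in the abstract.
peak_backward_n = 167.7
peak_forward_n = 145.5
gain_backward = 0.689         # 68.9% improvement
gain_forward = 0.747          # 74.7% improvement
body_weight_fraction = 0.48   # "approximately 48%" of body weight


def implied_baseline(peak_n, relative_gain):
    """Baseline force implied by a relative improvement over it."""
    return peak_n / (1.0 + relative_gain)


baseline_backward_n = implied_baseline(peak_backward_n, gain_backward)
baseline_forward_n = implied_baseline(peak_forward_n, gain_forward)

# Body weight and mass implied by the 48% figure.
implied_weight_n = peak_backward_n / body_weight_fraction
implied_mass_kg = implied_weight_n / G

print(f"implied baseline, backward: {baseline_backward_n:.1f} N")
print(f"implied baseline, forward:  {baseline_forward_n:.1f} N")
print(f"implied body weight: {implied_weight_n:.1f} N ({implied_mass_kg:.1f} kg)")
```

Under these assumptions the best baseline would sit at roughly 99.3 N backward and 83.3 N forward, and the 48% figure corresponds to a body weight of about 349 N, or about 35.6 kg.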

Paper Structure

This paper contains 13 sections, 19 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Humanoids performing tasks involving forceful interactions with the environment: (a) opening a fire door with one hand, requiring approximately 60 N of pulling force; (b) pulling a rack loaded with a 70 kg weight, requiring approximately 130 N of force; (c) pushing a wheelchair carrying a 60 kg robot to make a turn; (d) wiping a whiteboard with one hand. https://baai-aether.github.io/baai-thor/
  • Figure 2: Pipeline of Thor. The whole-body control strategy for humanoids is decoupled into a network architecture comprising the upper body, waist, and lower body, each with its own Actor-Critic network. The Critic network receives privileged information, including the magnitude and direction of the forces acting on the end effectors (EEs). Additionally, FAT2 is introduced to encourage the robot to respond in a human-like manner during force interactions with the environment. During training, the upper body is encouraged to track motions from a human motion dataset. During deployment, the actor network serves as the policy network, receiving motion commands from a remote controller and desired upper-body motions derived from virtual reality (VR) through inverse kinematics. The desired positions of the whole-body joints are passed through a PD controller to generate the output joint torques.
  • Figure 3: Humanoid force interaction analysis with ZMP constraint.
  • Figure 4: Sequential plots of the robot’s posture and the corresponding interactive force in the simulation environment: (a) backward motion, (b) forward motion.
  • Figure 5: The variation of the pulling force generated by the robot with respect to the torso tilt angle.
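
The deployment path described in the Figure 2 caption (three actors sharing one global observation, joint position targets, then a PD controller producing torques) can be sketched as below. This is a structural illustration only. The linear `LinearActor` stands in for the trained networks. The joint split, observation size, and PD gains are hypothetical, and the PD law assumes a zero desired joint velocity, which the caption does not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearActor:
    """Stand-in for one trained actor: shared observation -> joint targets."""
    weights: list  # rows of length obs_dim, one row per controlled joint

    def act(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]


def make_actor(n_joints, obs_dim, rng):
    """Build an actor with small random weights (placeholder for training)."""
    rows = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
            for _ in range(n_joints)]
    return LinearActor(rows)


def pd_torques(q_des, q, dq, kp, kd):
    """Per-joint PD law: tau = kp * (q_des - q) - kd * dq."""
    return [kp * (qd - qi) - kd * dqi for qd, qi, dqi in zip(q_des, q, dq)]


# Hypothetical joint split and observation size, not taken from the paper.
JOINTS = {"upper": 14, "waist": 3, "lower": 12}
OBS_DIM = 8
N_JOINTS = sum(JOINTS.values())

rng = random.Random(0)
actors = {name: make_actor(n, OBS_DIM, rng) for name, n in JOINTS.items()}


def whole_body_step(obs, q, dq, kp=40.0, kd=1.0):
    """Query each actor with the same observation, then apply PD control."""
    q_des = []
    for name in ("upper", "waist", "lower"):
        q_des.extend(actors[name].act(obs))
    return pd_torques(q_des, q, dq, kp, kd)


# With an all-zero observation every actor outputs zero targets, so the
# torques reduce to -kp * q - kd * dq.
zero_obs = [0.0] * OBS_DIM
torques = whole_body_step(zero_obs, q=[0.1] * N_JOINTS, dq=[0.0] * N_JOINTS)
print(len(torques), torques[0])
```

Each actor only emits targets for its own joint group, which keeps the per-policy action space small, while the shared observation lets every group react to the state of the whole body. The privileged critic inputs are a training-time component and are therefore absent from this deployment sketch.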