Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

Yingnan Zhao, Xinmiao Wang, Dewei Wang, Xinzhe Liu, Dan Lu, Qilong Han, Peng Liu, Chenjia Bai

TL;DR

AHC tackles versatile humanoid locomotion by learning a single controller that can stand up, walk, and recover from falls across diverse terrains. It uses a two-stage framework: first, behavior-specific policies are distilled into a basic multi-behavior policy, guided by an Adversarial Motion Prior (AMP); then the distilled policy is fine-tuned with reinforcement learning on varied terrains, using gradient projection and a terrain curriculum. The resulting terrain-adaptive controller transfers from simulation to a real Unitree G1, where it recovers from falls and maintains stable locomotion under disturbances. The work offers a practical path toward generalizable humanoid locomotion without training a separate policy for every skill and terrain.
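To make the gradient projection step concrete, here is a minimal sketch in the style of PCGrad-type gradient surgery. It assumes that each behavior contributes one flattened gradient vector and that a conflict means a negative inner product. Whether AHC uses exactly this variant is an assumption, and the function name `project_conflicting` and the toy gradients are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(grads, rng=None):
    """Combine per-behavior gradients with PCGrad-style projection.

    For each gradient g_i, visit the other gradients in random order.
    Whenever the working copy of g_i conflicts with g_j (negative inner
    product), remove its component along g_j. The projected gradients
    are then summed into one update direction.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g_i in enumerate(grads):
        g = list(g_i)  # working copy; the original gradients stay untouched
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = grads[j]
            inner = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if inner < 0 and norm_sq > 0:
                # Project g onto the plane orthogonal to g_j.
                g = [a - inner / norm_sq * b for a, b in zip(g, g_j)]
        projected.append(g)
    return [sum(column) for column in zip(*projected)]


# Hypothetical toy gradients for two behaviors (recovery and walking).
g_recover = [1.0, 0.0]
g_walk = [-1.0, 1.0]
combined = project_conflicting([g_recover, g_walk])
print(combined)
```

In this toy case the two gradients conflict, so each is projected before summation and neither behavior's update directly opposes the other. Gradients that do not conflict pass through unchanged and are simply added.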

Abstract

Humanoid robots hold promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly require training independent policies for each skill, yielding behavior-specific controllers that exhibit limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC), which adopts a two-stage framework to learn an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform multi-behavior distillation to obtain a basic multi-behavior controller, which facilitates adaptive behavior switching based on the environment. Then, we perform reinforced fine-tuning by collecting online feedback while the controller performs adaptive behaviors on more diverse terrains, enhancing its terrain adaptability. We conduct experiments both in simulation and in the real world on Unitree G1 robots. The results show that our method exhibits strong adaptability across various situations and terrains. Project website: https://ahc-humanoid.github.io.

Paper Structure

This paper contains 27 sections, 9 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison between multi-task RL and our proposed framework. Directly learning multiple skills via multi-task RL is challenging. Therefore, we adopt a two-stage framework consisting of behavior distillation and reinforced fine-tuning, enabling the acquisition of diverse humanoid robot skills and generalization to complex terrains.
  • Figure 2: Overview of the proposed two-stage framework, Adaptive Humanoid Control. In the first stage, we train two separate primary policies on flat terrain and distill them into a basic multi-behavior policy. In the second stage, we perform reinforced fine-tuning on the distilled policy, employing gradient surgery to alleviate gradient conflicts and behavior-specific critics to provide more accurate value estimation.
  • Figure 3: Comparison of recovery motions under AHC and HoST. We compare AHC (with AMP) against HoST (without AMP) in both lying and prone scenarios. AHC produces smoother recovery behaviors, which highlights the effectiveness of AMP in guiding the learning of naturalistic recovery motions.
  • Figure 4: Joint acceleration analysis of the left leg during recovery. Acceleration profiles of the left-leg hip and knee joints show that AHC yields stable joint actuation, with notably fewer abrupt fluctuations than HoST.
  • Figure 5: Value loss curves during the second-stage fine-tuning. Policies equipped with behavior-specific critics (AHC-BC-w/o-PC and AHC) show more stable value learning than their shared-critic counterpart (AHC-SC).
  • ...and 2 more figures