HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller

Qiyuan Zhang, Chenfan Weng, Guanwu Li, Fulai He, Yusheng Cai

TL;DR

HiLo tackles the challenge of human-like locomotion by learning a motion-tracking controller that mimics a reference gait while employing a residual RL policy to correct deviations. The framework combines simple domain randomization via random force injection and action delay with a distributional value function $Z^{\pi}(s)$ to stabilize training under perturbations. Key contributions include a lightweight yet effective domain randomization strategy, the use of a distributional return to accelerate learning, and zero-shot transfer with motion-pattern adjustments achieved through a residual mechanism without fine-tuning, validated in both high-fidelity simulation and real-world GR1 hardware. The results demonstrate natural, robust locomotion and practical adaptability for human-centric tasks, highlighting HiLo’s potential to reduce reward-design effort and improve sim-to-real performance in humanoid control.

Abstract

Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framework designed to learn RL policies that perform human-like locomotion. The primary challenges of human-like locomotion are complex reward engineering and domain randomization. HiLo overcomes these issues by developing an RL-based motion tracking controller and simple domain randomization through random force injection and action delay. Within the framework of HiLo, the whole-body control problem can be decomposed into two components: One part is solved using an open-loop control method, while the residual part is addressed with RL policies. A distributional value function is also implemented to stabilize the training process by improving the estimation of cumulative rewards under perturbed dynamics. Our experiments demonstrate that the motion tracking controller trained using HiLo can perform natural and agile human-like locomotion while exhibiting resilience to external disturbances in real-world systems. Furthermore, we show that the motion patterns of humanoid robots can be adapted through the residual mechanism without fine-tuning, allowing quick adjustments to task requirements.

Paper Structure

This paper contains 15 sections, 23 equations, 9 figures.

Figures (9)

  • Figure 1: The overview of the proposed motion tracking pipeline. The RL policy takes the current state $s_t$ and the desired goal $g_t$ as inputs, and then outputs the RL action $a^{RL}_t$. This RL action is then combined with the open-loop control $a^{OC}_t$ to produce the final action $a_t$. The target joint position $a_t$ is subsequently converted into the torques applied to the joints.
  • Figure 2: Without early termination, the humanoid robot tends to reach the maximum position of its actuation motors when tracking the reference.
  • Figure 3: We propose a simple domain randomization method that incorporates extended random force injection (ERFI) and action delay into the training environment.
  • Figure 4: Neural networks are used to parameterize the policy and the distributional value function. MLP refers to the multi-layer perceptron; the non-crossing quantile module has the same architecture as in previous work (zhou2020non).
  • Figure 5: Performance of the motion tracking controller in the high-fidelity simulator Webots.
  • ...and 4 more figures
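
The pipeline described in the captions of Figures 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the RL action and the open-loop action are combined by simple addition, that target joint positions are converted to torques by a PD law, and that random force injection is modelled as a uniform torque perturbation. The function and class names, the gains `kp` and `kd`, and the perturbation limit are all hypothetical.

```python
from collections import deque

import numpy as np


def combine_actions(a_rl, a_oc):
    """Final target joint positions a_t from the open-loop action a^OC_t
    and the RL action a^RL_t (additive combination is an assumption)."""
    return np.asarray(a_oc, dtype=float) + np.asarray(a_rl, dtype=float)


def pd_torque(q_target, q, qd, kp, kd):
    """Convert target joint positions to torques with a PD law.
    The PD form and the gains are assumptions, not taken from the paper."""
    return kp * (q_target - q) - kd * qd


def inject_random_force(tau, limit, rng):
    """Add a uniform random perturbation in [-limit, limit] to each torque
    (a simplified stand-in for random force injection)."""
    return tau + rng.uniform(-limit, limit, size=np.shape(tau))


class DelayedAction:
    """FIFO buffer that returns the action issued `delay` steps earlier."""

    def __init__(self, delay, initial_action):
        init = np.asarray(initial_action, dtype=float)
        self.buf = deque([init] * (delay + 1), maxlen=delay + 1)

    def step(self, action):
        # Appending drops the oldest entry; the new oldest entry is the
        # action that was issued `delay` steps ago.
        self.buf.append(np.asarray(action, dtype=float))
        return self.buf[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delay = DelayedAction(delay=1, initial_action=np.zeros(2))
    q, qd = np.zeros(2), np.zeros(2)
    for t in range(3):
        a_oc = np.array([0.1 * t, -0.1 * t])  # hypothetical reference targets
        a_rl = np.array([0.01, 0.01])         # hypothetical policy output
        a_t = delay.step(combine_actions(a_rl, a_oc))
        tau = inject_random_force(pd_torque(a_t, q, qd, kp=50.0, kd=1.0), 2.0, rng)
        print(t, a_t, tau)
```

With this structure the policy only needs to learn a correction around the reference motion, and the delay buffer and torque perturbation are applied in the training loop rather than inside the policy.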