Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits
Yinghui Li, Jinze Wu, Xin Liu, Weizhong Guo, Yufei Xue
TL;DR
This work tackles the challenge of enabling natural and robust legged locomotion in complex terrains by introducing a two-stage, bioinspired learning framework. Stage I uses gait-related rewards to learn flat-terrain velocity tracking and to generate self-collected motion data, while Stage II uses adversarial motion priors built from that data to guide learning on challenging terrains via experience-guided rewards. A teacher-student deployment pipeline distills the Stage II policy into a hardware-ready student, enabling real-world transfer to the Unitree Go1; experiments show natural diagonal gaits and robustness across varied terrains, including stairs. The approach reduces manual reward engineering, demonstrates the effectiveness of domain randomization, and offers a scalable path to robust locomotion for diverse legged robots, with potential extensions to other robotic platforms.
Abstract
Multi-legged robots offer enhanced stability in complex terrains, yet autonomously learning natural and robust motions in such environments remains challenging. Drawing inspiration from animals' progressive learning patterns, which move from simple to complex tasks, we introduce a universal two-stage learning framework with a two-step reward setting based on self-acquired experience, which efficiently enables legged robots to incrementally learn natural and robust movements. In the first stage, robots learn through gait-related rewards to track velocity on flat terrain, acquiring natural, robust movements and generating effective motion experience data. In the second stage, mirroring how animals learn from existing experience, robots learn to navigate challenging terrains with natural and robust movements using adversarial imitation learning. To demonstrate our method's efficacy, we trained both quadruped robots and a hexapod robot, and the resulting policies were successfully transferred to a physical quadruped robot, the Unitree Go1, which exhibited natural gait patterns and remarkable robustness on various terrains.
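
The two-step reward setting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, weights, the exponential velocity-tracking term, and the scalar `gait_term` are all hypothetical placeholders. The style reward uses the least-squares form commonly associated with adversarial motion priors, which is assumed here rather than confirmed by the text.

```python
import math


def velocity_tracking_reward(v_cmd, v_actual, sigma=0.25):
    """Task reward: 1.0 for perfect tracking, decaying with squared error.

    The exponential kernel and sigma are assumptions for illustration.
    """
    err = sum((c - a) ** 2 for c, a in zip(v_cmd, v_actual))
    return math.exp(-err / sigma)


def stage1_reward(v_cmd, v_actual, gait_term, w_track=1.0, w_gait=0.5):
    """Stage I (flat terrain): velocity tracking plus a gait-related term.

    gait_term is a hypothetical scalar in [0, 1] standing in for whatever
    hand-designed gait rewards are used; the weights are placeholders.
    """
    return w_track * velocity_tracking_reward(v_cmd, v_actual) + w_gait * gait_term


def style_reward(d_score):
    """Stage II style reward from a discriminator score.

    Least-squares form: 1.0 when the discriminator outputs 1 (the transition
    looks like the Stage I experience data), clipped to 0.0 once the score
    is 2 or more away from 1.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def stage2_reward(v_cmd, v_actual, d_score, w_task=0.5, w_style=0.5):
    """Stage II (challenging terrain): task reward plus experience-guided style reward."""
    return w_task * velocity_tracking_reward(v_cmd, v_actual) + w_style * style_reward(d_score)
```

In this sketch, the gait-related term appears only in Stage I; in Stage II its role is taken over by the discriminator-based style reward, which is trained on motion data collected from the Stage I policy.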
