Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits

Yinghui Li, Jinze Wu, Xin Liu, Weizhong Guo, Yufei Xue

TL;DR

This work tackles the challenge of enabling natural and robust legged locomotion in complex terrains by introducing a two-stage, bioinspired learning framework. Stage I uses gait-related rewards to learn flat-terrain velocity tracking and to generate self-collected motion data, while Stage II leverages adversarial motion priors to guide learning on challenging terrains via experience-guided rewards. A teacher-student deployment pipeline distills the Stage II policy into a hardware-ready student, enabling real-world transfer to the Unitree Go1; experiments show natural diagonal gaits and robustness across varied terrains, including stairs. The approach reduces manual reward engineering, demonstrates the effectiveness of domain randomization, and offers a scalable path to robust locomotion for diverse legged robots, with potential extensions to other robotic platforms.
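The TL;DR says the Stage II teacher is distilled into a hardware-ready student, but not which distillation objective is used. One common choice is to have the student regress the teacher's privileged-information latent from proprioceptive data alone. The pure-Python sketch below illustrates that idea under toy assumptions: `TEACHER_W`, `sample_step`, the two privileged quantities, and the linear student are all hypothetical stand-ins for what would in practice be neural networks and a physics simulator.

```python
import random
from typing import List, Tuple

# Hypothetical frozen teacher encoder: a linear map from privileged
# information (e.g. terrain height, friction) to a one-dimensional latent.
TEACHER_W = [0.5, -0.2]


def teacher_latent(priv: List[float]) -> float:
    return sum(w * p for w, p in zip(TEACHER_W, priv))


def sample_step(rng: random.Random) -> Tuple[List[float], List[float]]:
    """Toy simulator step: privileged info plus proprioceptive features
    that reflect it only indirectly (mixed and slightly noisy)."""
    priv = [rng.uniform(-1.0, 1.0), rng.uniform(0.4, 1.0)]
    proprio = [
        priv[0] + 0.3 * priv[1] + rng.gauss(0.0, 0.01),
        priv[1] - 0.1 * priv[0] + rng.gauss(0.0, 0.01),
        1.0,  # constant bias feature
    ]
    return priv, proprio


def predict(w: List[float], proprio: List[float]) -> float:
    # Student estimator: sees proprioception only, never privileged info.
    return sum(wi * xi for wi, xi in zip(w, proprio))


def train_student(steps: int = 20000, lr: float = 0.05, seed: int = 0) -> List[float]:
    """Distillation: regress the teacher latent from proprioception (SGD on MSE)."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        priv, proprio = sample_step(rng)
        err = predict(w, proprio) - teacher_latent(priv)  # teacher supervises
        # Gradient of 0.5 * err**2 with respect to w is err * proprio.
        w = [wi - lr * err * xi for wi, xi in zip(w, proprio)]
    return w


def eval_mse(w: List[float], n: int = 2000, seed: int = 1) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        priv, proprio = sample_step(rng)
        total += (predict(w, proprio) - teacher_latent(priv)) ** 2
    return total / n
```

After training, only `predict(w, proprio)` is needed at deployment, so the privileged quantities never have to be measured on hardware. That property is what makes a teacher-student split useful for sim-to-real transfer.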

Abstract

Multi-legged robots offer enhanced stability in complex terrains, yet autonomously learning natural and robust motions in such environments remains challenging. Drawing inspiration from animals' progressive learning patterns, from simple to complex tasks, we introduce a universal two-stage learning framework with a two-step reward setting based on self-acquired experience, which efficiently enables legged robots to incrementally learn natural and robust movements. In the first stage, robots learn through gait-related rewards to track velocity on flat terrain, acquiring natural, robust movements and generating effective motion experience data. In the second stage, mirroring how animals learn from existing experience, robots learn to navigate challenging terrains with natural and robust movements using adversarial imitation learning. To demonstrate our method's efficacy, we trained both quadruped robots and a hexapod robot, and the policy was successfully transferred to a physical Go1 quadruped robot, which exhibited natural gait patterns and remarkable robustness in various terrains.
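In the second stage, a discriminator-based style reward is combined with task and regularization rewards (see the Figure 2 caption). The source does not give the exact formulas. The sketch below assumes the least-squares style reward from the original Adversarial Motion Priors (AMP) formulation and an exponential velocity-tracking kernel of the kind commonly used in legged-locomotion RL. `sigma`, `w_task`, and `w_style` are illustrative values, not the authors' settings.

```python
import math
from typing import Sequence


def velocity_tracking_reward(v_cmd: Sequence[float], v_actual: Sequence[float],
                             sigma: float = 0.25) -> float:
    """Task reward exp(-||v_cmd - v||^2 / sigma); sigma is an assumed value."""
    err = sum((c - a) ** 2 for c, a in zip(v_cmd, v_actual))
    return math.exp(-err / sigma)


def style_reward(d_score: float) -> float:
    """Least-squares AMP style reward: max(0, 1 - 0.25 * (D - 1)^2).

    The discriminator D is trained toward +1 on experience (reference)
    transitions and -1 on policy transitions, so scores near +1 earn the
    full reward and scores at or below -1 earn nothing.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def combined_reward(v_cmd: Sequence[float], v_actual: Sequence[float],
                    d_score: float, r_reg: float,
                    w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Hypothetical Stage II reward: weighted task + style terms plus a
    precomputed regularization term r_reg (typically a penalty <= 0)."""
    return (w_task * velocity_tracking_reward(v_cmd, v_actual)
            + w_style * style_reward(d_score)
            + r_reg)
```

For example, `combined_reward([0.5, 0.0, 0.0], [0.5, 0.0, 0.0], 1.0, -0.1)` gives 0.5 + 0.5 - 0.1 = 0.9 under these assumed weights. Because the style term is bounded in [0, 1] while the task kernel and penalties have their own scales, the relative weights have to be tuned so neither term dominates.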

Paper Structure (19 sections, 3 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Our approach develops a hardware-robust policy, equipping legged robots with neural-network control to achieve stable, naturally robust gaits across diverse terrains. The top part shows hexapod and quadruped robots (HEX, Unitree Go1, Go2, and B2) whose trained controllers produce natural, robust diagonal gaits, even in challenging settings such as staircases. The bottom part validates the transferability of the training results by applying the trained policies to the real Go1 robot, demonstrating the method's practical applicability.
  • Figure 2: Our method comprises two main stages: reward-induced learning for simple tasks and experience-reward-induced learning for rough-terrain tasks, followed by deployment on real robots using a teacher-student strategy. In the first stage, the robot is trained to track velocity commands with a diagonal gait on flat terrain. Gait-related reward functions constrain the robot's gait, foot trajectory, and body state, enabling a natural and robust diagonal gait. After training, the network generates task-specific motion data, storing experiences such as the robot's body state (linear and angular velocity) and joint states (position and velocity). In the second stage, the robot must track velocity commands with a diagonal gait in complex environments. Additional privileged information, such as terrain data, body linear velocity, and dynamics parameters, is fed into the network as observations. The robot's previously acquired motion experience serves as a reference for training a discriminator network, which measures the similarity between the current behavior and past experience and generates style reward signals. These are combined with task rewards and regularization rewards to update the actor and critic networks. During deployment, a teacher-student method encodes the privileged information from proprioceptive sensing, enabling successful implementation on real robots.
  • Figure 3: Terrain difficulty over the course of training, with the same random seed but different reward settings, indicates how quickly the robot learns effective motions. Basic rewards combined with well-scaled Experience-Guided rewards improve motion sampling. In contrast, manually set gait rewards hinder effective learning, so terrain difficulty increases only minimally. This demonstrates the effectiveness of Experience-Guided rewards in improving learning on complex terrains.
  • Figure 4: Comparison of velocity-tracking performance and gait on 20 cm stairs for policies trained with and without Experience-Guided rewards (ER).
  • Figure 5: Naturally robust trot gait on the physical Go1 robot. Blue icons indicate the support phase and red icons the swing phase, showing that the robot maintains a consistent diagonal gait on stairs of varying heights.
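According to the Figure 2 caption, Stage I rollouts are stored as motion experience (body linear and angular velocity, joint positions and velocities) and later serve as the discriminator's reference data. The sketch below shows a minimal store of this kind. It assumes the discriminator consumes consecutive state pairs (s, s'), as in AMP. The class names, the 12-joint layout used in the check, and uniform sampling with replacement are all hypothetical choices.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


@dataclass(frozen=True)
class MotionState:
    """One recorded Stage I state; fields follow the Figure 2 caption."""
    base_lin_vel: Tuple[float, float, float]
    base_ang_vel: Tuple[float, float, float]
    joint_pos: Tuple[float, ...]
    joint_vel: Tuple[float, ...]

    def as_vector(self) -> Vector:
        # Flatten into the (assumed) feature layout fed to the discriminator.
        return [*self.base_lin_vel, *self.base_ang_vel, *self.joint_pos, *self.joint_vel]


class ExperienceBuffer:
    """Hypothetical store of Stage I rollouts, served as (s, s') pairs."""

    def __init__(self) -> None:
        self._episodes: List[List[MotionState]] = []

    def add_episode(self, states: Sequence[MotionState]) -> None:
        # A single state yields no transition, so such episodes are skipped.
        if len(states) >= 2:
            self._episodes.append(list(states))

    def transitions(self) -> List[Tuple[Vector, Vector]]:
        # Pair consecutive states within each episode only.
        return [
            (s.as_vector(), s_next.as_vector())
            for episode in self._episodes
            for s, s_next in zip(episode, episode[1:])
        ]

    def sample(self, batch_size: int, rng: random.Random) -> List[Tuple[Vector, Vector]]:
        pairs = self.transitions()
        if not pairs:
            raise ValueError("experience buffer is empty")
        # Uniform sampling with replacement (an assumption, not from the paper).
        return [rng.choice(pairs) for _ in range(batch_size)]
```

Pairs are built inside each episode, so no reference transition spans an environment reset. Such a transition would show the discriminator a physically impossible jump.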