Discovery of skill switching criteria for learning agile quadruped locomotion

Wanming Yu, Fernando Acero, Vassil Atanassov, Chuanyu Yang, Ioannis Havoutis, Dimitrios Kanoulas, Zhibin Li

TL;DR

The paper tackles learning and coordinating multiple agile quadruped locomotion skills with automatic transitions and fall recovery for goal tracking. It introduces a hierarchical framework that pre-trains five single-skill policies and combines them via a gating network into a multiplicative Gaussian composite policy, while an outer CMA-ES loop discovers gait-switch criteria based on relative goal distance. Key contributions include enabling high-speed galloping within a learned multi-skill policy, automatic discovery of gait-switch thresholds, and robust real-world fall recovery demonstrated on a Unitree A1, outpacing baselines with manual skill switching. The approach reduces reliance on reference trajectories or expert demonstrations and offers a scalable path toward more dynamic, autonomous legged locomotion with natural transitions and resilience in real-world settings.

Abstract

This paper develops a hierarchical learning and optimization framework that can learn and achieve well-coordinated multi-skill locomotion. The learned multi-skill policy can switch between skills automatically and naturally in tracking arbitrarily positioned goals and recover from failures promptly. The proposed framework is composed of a deep reinforcement learning process and an optimization process. First, the contact pattern is incorporated into the reward terms for learning different types of gaits as separate policies without the need for any other references. Then, a higher-level policy is learned to generate weights for individual policies to compose multi-skill locomotion in a goal-tracking task setting. Skills are automatically and naturally switched according to the distance to the goal. The proper distances for skill switching are incorporated in reward calculation for learning the high-level policy and updated by an outer optimization loop as learning progresses. We first demonstrated successful multi-skill locomotion in comprehensive tasks on a simulated Unitree A1 quadruped robot. We also deployed the learned policy in the real world, showcasing trotting, bounding, galloping, and their natural transitions as the goal position changes. Moreover, the learned policy can react to unexpected failures at any time, perform prompt recovery, and resume locomotion successfully. Compared to discrete switching between single skills, which failed to transition to galloping in the real world, our proposed approach achieves all the learned agile skills, with smoother and more continuous skill transitions.
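The TL;DR describes the composite as a multiplicative Gaussian policy whose per-skill weights come from a gating network. As a rough illustration only, the sketch below uses the standard weighted product of diagonal Gaussians, assuming each skill policy outputs a mean and standard deviation per action dimension. The function name `compose_gaussians` and all numbers are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def compose_gaussians(
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    stds: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Weighted product of per-skill diagonal Gaussians (illustrative sketch).

    weights: one non-negative gating weight per skill (assumed input).
    means, stds: per-skill action mean and standard deviation, one value
    per action dimension.
    """
    dim = len(means[0])
    comp_mean, comp_std = [], []
    for j in range(dim):
        # Weighted precision: sum_i w_i / sigma_ij^2
        precision = sum(w / (s[j] ** 2) for w, s in zip(weights, stds))
        if precision <= 0.0:
            raise ValueError("at least one skill needs a positive weight")
        # Precision-weighted mean: sum_i (w_i / sigma_ij^2) * mu_ij
        weighted = sum(
            w * m[j] / (s[j] ** 2) for w, m, s in zip(weights, means, stds)
        )
        comp_mean.append(weighted / precision)
        comp_std.append(precision ** -0.5)
    return comp_mean, comp_std


# Hypothetical example: two skills, one action dimension, equal weights.
mean, std = compose_gaussians(
    weights=[0.5, 0.5],
    means=[[0.2], [0.8]],
    stds=[[0.1], [0.1]],
)
```

With equal weights and equal variances the composite mean sits midway between the two skill means. Shifting weight toward one skill moves the composite action smoothly toward that skill, which is one way to picture the continuous transitions the abstract contrasts with discrete switching.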

Paper Structure

This paper contains 18 sections, 9 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Coordination and gait transitions in quadrupedal animal and robot on demand of increasing speed. (a) Cheetah's changing gaits at increasing speed. (b) A1 quadruped robot's fall recovery, trotting, bounding and galloping skills using our multi-skill policy.
  • Figure 2: Proposed multi-skill learning and optimization framework. (a) Optimizing gait switch criteria in the outer loop of deep reinforcement learning. (b) Neural network architecture of a multi-skill policy. Bold arrows are the input or output outside the policy, while normal arrows are the internal input or output.
  • Figure 3: Foot contact patterns.
  • Figure 4: Normalized relative goal command in robot heading frame is given to encourage fall recovery, trotting, bounding, galloping and their transitions.
  • Figure 5: Results of CMA-ES optimization for gait switch criteria in learning multi-skill locomotion. (a) Best cost during CMA-ES optimization. (b) Optimized relative distance for switching from trotting to bounding. (c) Optimized relative distance for switching from bounding to galloping.
  • ...and 5 more figures
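
The Figure 5 caption refers to two optimized relative distances, one for switching from trotting to bounding and one for switching from bounding to galloping. A minimal sketch of how such thresholds could map a goal distance to a target gait follows. It assumes that larger relative distances call for faster gaits and that distances are normalized. The function name `select_gait` and the threshold values are hypothetical placeholders, not the optimized values from the paper.

```python
def select_gait(distance: float, trot_to_bound: float, bound_to_gallop: float) -> str:
    """Pick a target gait from the relative goal distance (illustrative only).

    trot_to_bound and bound_to_gallop are the two switch distances; in the
    paper they are found by an outer CMA-ES loop, here they are plain inputs.
    """
    if not 0.0 <= trot_to_bound <= bound_to_gallop:
        raise ValueError("thresholds must satisfy 0 <= trot_to_bound <= bound_to_gallop")
    if distance < trot_to_bound:
        return "trot"
    if distance < bound_to_gallop:
        return "bound"
    return "gallop"


# Hypothetical thresholds on a normalized distance in [0, 1].
gaits = [select_gait(d, trot_to_bound=0.3, bound_to_gallop=0.6) for d in (0.1, 0.4, 0.9)]
```

In the framework described in the abstract, such distances enter the reward used to train the high-level policy instead of acting as a hard switch at run time, so this function is best read as a picture of the criterion, not of the controller.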