
MTAC: Hierarchical Reinforcement Learning-based Multi-gait Terrain-adaptive Quadruped Controller

Nishaant Shah, Kshitij Tiwari, Aniket Bera

TL;DR

MTAC addresses robust quadruped locomotion on rough urban terrain using hierarchical reinforcement learning. A high-level policy selects among low-level gait experts, each trained with PPO on a terrain-specific curriculum (bumpy terrain, stair pits and pyramids, steps), to produce emergent, terrain-specific locomotion. The high-level decision is formulated as a Markov decision process (MDP) given by the tuple $(A, S, p, r)$, with action space $A \subseteq \mathbb{R}^3$ and state space $S \subseteq \mathbb{R}^{15}$, so the selection signal is continuous. Results show higher task completion rates and faster navigation than a generalized PPO baseline, with strong performance at high velocity and high terrain difficulty, which makes the approach relevant to urban search and rescue.
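
The selection step described above can be illustrated with a minimal sketch. Only the dimensions (a 15-dimensional state and a 3-dimensional action) come from the text. Everything else is an assumption made for illustration: the linear stand-in for the learned high-level policy, the argmax rule that maps the continuous action to one expert, the expert labels, and the names `high_level_policy` and `select_expert`.

```python
import numpy as np

STATE_DIM = 15   # size of the high-level state, from the MDP description above
ACTION_DIM = 3   # size of the high-level action, read here as one score per expert (assumption)
EXPERT_NAMES = ("bumpy", "stairs", "steps")  # hypothetical labels following the listed curricula


def high_level_policy(state, weights):
    """Stand-in for the learned high-level policy: a linear map from state to scores."""
    return weights @ state


def select_expert(scores):
    """Map the continuous 3-D action to a discrete expert index (assumed argmax rule)."""
    return int(np.argmax(scores))


rng = np.random.default_rng(0)
weights = rng.normal(size=(ACTION_DIM, STATE_DIM))  # placeholder weights, not trained
state = rng.normal(size=STATE_DIM)                  # placeholder observation

scores = high_level_policy(state, weights)
expert_index = select_expert(scores)
print(EXPERT_NAMES[expert_index])
```

In a full controller the chosen expert would then compute the joint-level action. That stage is omitted here because the text does not specify the experts' inputs or outputs.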

Abstract

Urban search and rescue missions require rapid first response to minimize loss of life and damage. Such efforts are often assisted by humanitarian robots, which must handle dynamic operational conditions such as uneven and rough terrain, especially during mass casualty incidents like earthquakes. Quadruped robots, owing to their versatile design, have the potential to assist in such scenarios. However, controlling quadruped robots in dynamic, rough-terrain environments is challenging because of their many degrees of freedom. Current locomotion controllers for quadrupeds are limited in their ability to produce multiple adaptive gaits and to solve tasks in a time- and resource-efficient manner, and they require tedious training and manual tuning procedures. To address these challenges, we propose MTAC, a multi-gait terrain-adaptive controller that uses a hierarchical reinforcement learning (HRL) approach while remaining time- and memory-efficient. We show that the proposed method scales well to a diverse range of environments, with compute times similar to those of state-of-the-art methods. Our method achieved a task completion rate greater than 75% on most tasks, outperforming previous work on the majority of test cases.

Paper Structure (18 sections, 7 figures, 3 tables, 3 algorithms)

Figures (7)

  • Figure 1: The proposed controller solves rough terrain navigation using hierarchical learning. A high-level policy is trained to act over a family of pre-trained low-level experts, each trained to execute a unique gait. Together, these policies execute adaptive locomotion on unstructured terrain.
  • Figure 2: MTAC's pre-trained stair expert successfully navigates a high-difficulty stair pyramid, while a model trained through the previous work's generalized curriculum falls at a lower difficulty level.
  • Figure 3: The MTAC methodology. At the highest level, a high-level action is input to the controller. Based on this command, the high-level policy selects a low-level action to be executed by one of the 3 expert policies. The training process for these experts is also pictured. The robot then executes the low-level action and interacts with the environment. The resulting contacts start the next iteration of the feedback loop.
  • Figure 4: Expert policies are developed by training in specialized environments that consist of focused terrain types. Four terrain types are shown: bumpy terrain (top left), stair pits (top right), stair pyramids (bottom left), and steps (bottom right). All of these terrain types are parameterized so that difficulty can be varied gradually across the map.
  • Figure 5: Performance of the MTAC stepping expert during training. The body velocities converge closely to the commanded velocities, and the DOF positions converge to the desired positions.
  • ...and 2 more figures
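
The Figure 4 caption notes that each terrain type is parameterized so that difficulty varies gradually across the map. One way this could look is sketched below. It assumes a linear difficulty ramp over map rows and a single terrain parameter (step height). The function names and the height range are hypothetical and are not taken from the paper.

```python
import numpy as np


def difficulty_levels(num_rows, min_difficulty=0.0, max_difficulty=1.0):
    """Assign each map row a difficulty that rises linearly from easiest to hardest."""
    return np.linspace(min_difficulty, max_difficulty, num_rows)


def step_height(difficulty, min_height=0.05, max_height=0.25):
    """Map a difficulty in [0, 1] to a step height in metres (hypothetical range)."""
    return min_height + difficulty * (max_height - min_height)


levels = difficulty_levels(num_rows=5)
heights = step_height(levels)
for row, (level, height) in enumerate(zip(levels, heights)):
    print(f"row {row}: difficulty {level:.2f}, step height {height:.3f} m")
```

A curriculum could then start the robot on low-difficulty rows and move it to harder rows as training progresses. The promotion rule is left out because the captions do not describe it.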