MTAC: Hierarchical Reinforcement Learning-based Multi-gait Terrain-adaptive Quadruped Controller
Nishaant Shah, Kshitij Tiwari, Aniket Bera
TL;DR
MTAC addresses robust quadruped locomotion on rough urban terrain using hierarchical reinforcement learning. A high-level policy selects among low-level gait experts, each trained with PPO on a terrain-specific curriculum (bumpy terrain, stair pits/pyramids, steps), to produce emergent, terrain-specific locomotion. The high-level decision problem is formulated as a Markov decision process (MDP) $(S, A, p, r)$ with state space $S \subseteq \mathbb{R}^{15}$, action space $A \subseteq \mathbb{R}^3$, transition dynamics $p$, and reward $r$; the continuous action space allows continuous selection signals. Results show higher task completion rates and faster navigation than a generalized PPO baseline, with strong performance at high velocities and high terrain difficulty, illustrating practical relevance for urban search and rescue.
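The TL;DR specifies the high-level decision only at the level of the MDP: a 15-dimensional state and a 3-dimensional continuous action. The following is a minimal, non-authoritative sketch of one hierarchical control step under stated assumptions: the action is read as one selection signal per gait expert (matching the three terrain curricula listed), the expert with the largest signal is dispatched (hard argmax), a fixed linear map stands in for the trained high-level policy, and the PPO-trained experts are replaced by stubs. All names (`make_stub_expert`, `LinearHighLevelPolicy`, `hierarchical_step`) are hypothetical and are not taken from the MTAC implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

STATE_DIM = 15   # high-level state S ⊆ R^15 (from the TL;DR)
NUM_EXPERTS = 3  # high-level action A ⊆ R^3; assumed: one signal per gait expert

Expert = Callable[[Sequence[float]], List[float]]


def make_stub_expert(scale: float) -> Expert:
    """Hypothetical stand-in for a PPO-trained gait expert.

    A real expert would map observations to joint commands; this stub only
    scales its input so the dispatch logic can be exercised without a simulator.
    """
    def expert(state: Sequence[float]) -> List[float]:
        return [scale * x for x in state]
    return expert


class LinearHighLevelPolicy:
    """Hypothetical high-level policy: a linear map from R^15 to R^3."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(STATE_DIM)]
                        for _ in range(NUM_EXPERTS)]

    def __call__(self, state: Sequence[float]) -> List[float]:
        if len(state) != STATE_DIM:
            raise ValueError(f"expected a {STATE_DIM}-dim state, got {len(state)}")
        # One continuous selection signal per expert.
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]


def softmax(signals: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(signals)
    exps = [math.exp(z - m) for z in signals]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_step(policy: Callable[[Sequence[float]], List[float]],
                      experts: Sequence[Expert],
                      state: Sequence[float]) -> Tuple[int, List[float], List[float]]:
    """One control step: the high-level signal decides which expert acts.

    Assumed rule: hard argmax dispatch. Softmax does not change the argmax;
    the weights are returned only so they can be logged or used for a blend.
    """
    weights = softmax(policy(state))
    chosen = max(range(len(experts)), key=lambda i: weights[i])
    return chosen, weights, experts[chosen](state)


# Usage. Expert order assumed to follow the TL;DR: bumpy, stair pits/pyramids, steps.
experts = [make_stub_expert(s) for s in (1.0, 2.0, 3.0)]
policy = LinearHighLevelPolicy(seed=42)
idx, weights, action = hierarchical_step(policy, experts, [0.1] * STATE_DIM)
print(idx, [round(w, 3) for w in weights])
```

Hard dispatch is only the simplest reading of a continuous selection signal; a soft alternative would blend the expert outputs using `weights`. Which rule MTAC actually uses is not stated in this summary.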
Abstract
Urban search and rescue missions require a rapid first response to minimize loss of life and damage. Such efforts are often assisted by humanitarian robots, which must handle dynamic operating conditions such as uneven and rough terrain, especially during mass-casualty incidents like earthquakes. Quadruped robots, owing to their versatile design, have the potential to assist in such scenarios. However, controlling quadruped robots in dynamic, rough-terrain environments is challenging because of their many degrees of freedom. Current quadruped locomotion controllers are limited in their ability to produce multiple adaptive gaits and to solve tasks in a time- and resource-efficient manner, and they require tedious training and manual tuning procedures. To address these challenges, we propose MTAC, a multi-gait terrain-adaptive controller that uses a hierarchical reinforcement learning (HRL) approach while remaining time- and memory-efficient. We show that our proposed method scales well to a diverse range of environments with compute times similar to those of state-of-the-art methods. Our method achieved a task completion rate above 75% on most tasks, outperforming previous work on the majority of test cases.
