Learning Terrain-Specialized Policies for Adaptive Locomotion in Challenging Environments

Matheus P. Angarola, Francisco Affonso, Marcelo Becker

TL;DR

The paper tackles blind legged locomotion across unstructured terrains with a hierarchical RL framework that combines terrain-specialized policies and curriculum learning. A privileged terrain observer lets a policy selector route control to the most appropriate expert, while a terrain generator and a velocity-command curriculum extend agility across diverse commands. In high-fidelity simulation, the approach outperforms a generalist policy by up to 16% in success rate and tracks velocity more accurately as the commanded velocity increases, especially on low-friction and discontinuous terrains. The work advances practical adaptive locomotion by decomposing the problem into terrain-specific subtasks and providing a systematic training curriculum; future work focuses on removing the privileged cues and enabling sim-to-real transfer.

Abstract

Legged robots must exhibit robust and agile locomotion across diverse, unstructured terrains, a challenge exacerbated in blind locomotion settings, where terrain information is unavailable. This work introduces a hierarchical reinforcement learning framework that leverages terrain-specialized policies and curriculum learning to enhance agility and tracking performance in complex environments. We validate our method in simulation, where it outperforms a generalist policy by up to 16% in success rate and achieves lower tracking errors as the velocity target increases, particularly on low-friction and discontinuous terrains, demonstrating superior adaptability and robustness across mixed-terrain scenarios.

Paper Structure

This paper contains 13 sections, 5 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Hierarchical locomotion control architecture based on terrain-specialized policies. The system selects an appropriate expert policy based on the perceived terrain to execute the desired locomotion behavior. The blue arrow $(\rightarrow)$ represents the robot’s velocity direction.
  • Figure 2: System overview. (A) Training Policy: An RL policy receives a short history of proprioceptive observations, the previous action, and the commanded velocities. The reward blends command-tracking terms with penalties for high kinematic jerk and unstable contacts. A discrete curriculum grid over $(v_x^{\mathrm{cmd}},\ \omega_z^{\mathrm{cmd}})$ is sampled; when a cell's error $\lVert v^{\mathrm{cmd}}-v^{\mathrm{base}}\rVert < \epsilon$, its neighbors unlock, increasing difficulty. Trained policies are appended to a specialized policy library, each stored with its associated privileged observation $z$. (B) Policy Selector: A deterministic mapper parses the privileged vector $z$ to extract the relevant cues and directly outputs the index $i$ of the policy to run in the library; the selected $\pi_i$ is applied until a new selection is triggered. (C) Terrain Generation: A parametric generator procedurally varies geometric and frictional properties to synthesize a set of challenging terrains.
  • Figure 3: Comparison of the (a) generalist and (b) specialist policies.
  • Figure 4: Training curves of average return against simulation steps: generalist vs. five specialized policies.
  • Figure 5: Unlocked velocity commands refer to those the robot successfully tracked with error below the threshold $\epsilon$.
  • ...and 3 more figures
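
The curriculum and selector described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class `CommandCurriculum`, the function `select_policy`, the 4-connected neighborhood, the starting cell `(0, 0)`, and the nearest-match rule over stored $z$ vectors are all assumptions made for illustration; the caption only states that neighbors unlock when the tracking error falls below $\epsilon$ and that a deterministic mapper outputs the index $i$.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (index along v_x^cmd, index along omega_z^cmd)


@dataclass
class CommandCurriculum:
    """Hypothetical discrete grid over (v_x^cmd, omega_z^cmd) commands."""
    n_vx: int
    n_wz: int
    epsilon: float
    unlocked: Set[Cell] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Assumption: training starts from a single easiest cell.
        if not self.unlocked:
            self.unlocked.add((0, 0))

    def neighbors(self, cell: Cell) -> List[Cell]:
        # Assumption: 4-connected neighborhood, clipped to the grid bounds.
        i, j = cell
        candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        return [(a, b) for a, b in candidates
                if 0 <= a < self.n_vx and 0 <= b < self.n_wz]

    def update(self, cell: Cell, tracking_error: float) -> List[Cell]:
        """Unlock the neighbors of `cell` if ||v_cmd - v_base|| < epsilon.

        Returns the list of newly unlocked cells (empty if none).
        """
        if cell not in self.unlocked or tracking_error >= self.epsilon:
            return []
        new_cells = [c for c in self.neighbors(cell) if c not in self.unlocked]
        self.unlocked.update(new_cells)
        return new_cells


def select_policy(z: Sequence[float],
                  library_z: Sequence[Sequence[float]]) -> int:
    """Return the index i of the library entry whose stored z is closest to z.

    Assumption: squared Euclidean distance stands in for the paper's
    deterministic mapping from privileged cues to a policy index.
    """
    def sq_dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(z, library_z[i]))
    return min(range(len(library_z)), key=sq_dist)
```

In this sketch, calling `update` with an error at or above `epsilon` leaves the grid unchanged, so difficulty only grows once a command cell has been tracked well. `select_policy` is deterministic by construction: the same $z$ always yields the same index, with ties resolved in favor of the lowest index.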