VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion

Stanley Wu, Mohamad H. Danesh, Simon Li, Hanna Yurchyk, Amin Abyaneh, Anas El Houssaini, David Meger, Hsiu-Chin Lin

TL;DR

VOCALoco addresses safety and interpretability gaps in end-to-end DRL legged locomotion by introducing a modular, perception-driven framework that predicts viability and Cost of Transport (CoT) for a set of pre-trained policies. A high-level decision module filters unsafe options and selects the most energy-efficient viable policy based on local heightfield observations, with all predictors trained in simulation. The approach yields improved robustness and safety during stair ascent/descent and is validated through zero-shot real-world deployment on ANYmal-D, showing practical viability for scalable, interpretable locomotion in unstructured terrains. This work advances toward flexible, perception-guided skill switching in quadruped robots with potential for broad deployment across complex terrains and tasks.

Abstract

Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.
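
For readers unfamiliar with the term, cost of transport is commonly defined as the energy spent per unit weight per unit distance travelled. The form below is that standard textbook definition, given only as a reference point; the symbols $E$, $m$, $g$, $d$, $P$, and $v$ are introduced here for illustration, and the exact formulation used by the authors is not stated in this summary.

```latex
\mathrm{CoT} = \frac{E}{m\,g\,d} = \frac{P}{m\,g\,v}
% E: energy consumed over the horizon, m: robot mass,
% g: gravitational acceleration, d: distance travelled,
% P: average power, v: average forward speed (P = E/t, v = d/t)
```

The quantity is dimensionless, so a lower value means less energy is spent to move the same weight over the same distance.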

Paper Structure

This paper contains 27 sections, 1 equation, 9 figures.

Figures (9)

  • Figure 1: Overview of VOCALoco. Given a heightmap of the local terrain, two high-level modules predict: (i) the viability and (ii) the Cost-of-Transport (CoT) of executing each skill. With both predictions at hand, we first filter out unsafe skills. Then, among the safe skills, we select the skill with the lowest predicted energy expenditure as the final policy to execute on the robot. The three example images show the ANYmal-D robot switching between different policies depending on the terrain type.
  • Figure 2: In VOCALoco, we start by training low-level locomotion policies: a walking policy, an ascending policy, and a descending policy. Then, we perform rollouts with these policies, collecting data that will train the high-level policies: the viability and the CoT modules.
  • Figure 3: Simulation terrains. (1) An example of a low-level locomotion policy training environment. The red dots denote the boundaries of the terrain cell that each robot cannot cross. (2) Policy rollouts on different terrain types to gather data for the high-level modules. (3) Descending staircase environment. (4) Ascending staircase environment. (5, 6) Evaluation environments: terrains with transitions from flat ground to stairs and vice versa. The green dot represents the spawn position of the robot, and the star represents the target. (7) Rough terrain. (8) Stairs up. (9) Stairs down. (10) Discrete terrain.
  • Figure 4: Visualization of (middle) simulated environments with various difficulty levels and the collected (top) viability and (bottom) CoT data from rolling out the descent policy across different difficulty levels. The shaded region represents the standard deviation.
  • Figure 5: The collected and the predicted viability across terrains with different step heights for (Top) the ascending staircase environment (Figure 3) and (Bottom) the descending staircase environment (Figure 3).
  • ...and 4 more figures
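
The filter-then-select rule described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SkillPrediction` container, the `select_skill` function, the `viability_threshold` value of 0.5, and the example numbers are all hypothetical, and the predictions are assumed to come from already-trained viability and CoT modules.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SkillPrediction:
    """Hypothetical container for the two high-level predictions of one skill."""
    name: str          # e.g. "walk", "ascend", "descend"
    viability: float   # predicted probability that executing the skill is safe
    cot: float         # predicted Cost of Transport (lower is more efficient)


def select_skill(
    predictions: Sequence[SkillPrediction],
    viability_threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Optional[SkillPrediction]:
    """Drop skills predicted unsafe, then return the lowest-CoT survivor."""
    viable = [p for p in predictions if p.viability >= viability_threshold]
    if not viable:
        # No skill passes the safety filter; the caller decides the fallback.
        return None
    return min(viable, key=lambda p: p.cot)


if __name__ == "__main__":
    # Made-up predictions for a terrain patch in front of an ascending staircase.
    preds = [
        SkillPrediction("walk", viability=0.2, cot=0.4),
        SkillPrediction("ascend", viability=0.9, cot=0.8),
        SkillPrediction("descend", viability=0.7, cot=1.1),
    ]
    chosen = select_skill(preds)
    print(chosen.name if chosen else "no viable skill")
```

In this made-up example the walking skill has the lowest predicted CoT but is removed by the safety filter, so the cheaper of the two remaining skills is chosen. Applying the safety filter before the energy comparison is what keeps an efficient but unsafe skill from being selected.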