PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion

Aditya Shirwatkar, Naman Saxena, Kishore Chandra, Shishir Kolathaya

TL;DR

PIP-Loco is a proprioceptive infinite-horizon planning framework for quadruped locomotion that combines RL with a Dreamer-based internal model to enable long-horizon planning and constraint satisfaction. Training uses an asymmetric actor-critic setup in which an expert actor and a velocity-aware internal model are learned jointly; at deployment, the Dreamer module solves an infinite-horizon MPC-like problem with hard constraints. Reported findings include improved robustness to training noise, better interpretability through dreamed future states, and successful multi-terrain hardware deployment on robots such as the Unitree Go1. The work shows the practical potential of integrating planning into RL for legged robots, notes a limitation in planner-policy divergence, and suggests safety filters and perception-driven planning as future directions.

Abstract

A core strength of Model Predictive Control (MPC) for quadrupedal locomotion has been its ability to enforce constraints and provide interpretability of the sequence of commands over the horizon. However, despite being able to plan, MPC struggles to scale with task complexity, often failing to achieve robust behavior on rapidly changing surfaces. On the other hand, model-free Reinforcement Learning (RL) methods have outperformed MPC on multiple terrains, showing emergent motions, but they inherently lack any ability to handle constraints or perform planning. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, allowing for agile and safe locomotion behaviors through the horizon. Inspired by MPC, we incorporate an internal model that includes a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, facilitating exploration for improved locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate the robustness of our training framework through ablation studies on internal model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach across multi-terrain scenarios in both simulation and hardware.

Paper Structure (20 sections, 3 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Conceptual overview of PIP-Loco: The framework plans actions and velocity commands by dreaming about future states and enforcing constraints. The top shows the potential future trajectories starting from the current state, while the bottom depicts the quadruped adapting to stairs and rocky terrain.
  • Figure 2: PIP-Loco Framework: (a) The internal model (comprising a velocity estimator and a Dreamer module) learns in a co-dependent way with the Asymmetric Actor-Critic. The Dreamer module facilitates temporal reasoning by dreaming about future observations and latent states, enhancing exploration for improved locomotion behaviors. (b) The Dreamer module solves an infinite-horizon MPC problem to generate actions $a_t$ for the robot, ensuring robust constraint handling and adaptive locomotion across terrains.
  • Figure 3: Training performance comparison: Mean return per episode over training iterations for PIP-Loco (NLM) at different horizons (H=1, H=5) compared to the baseline HIMLoco. PIP-Loco (NLM) shows improved performance across all horizons.
  • Figure 4: Top Sequence: Robot navigating a 45 cm step-down obstacle, with phases of walking, stepping down, adapting, and recovering. The graph below tracks pitch and roll angles over time, with phases highlighted in green (Normal Walk), red (Step), orange (Adaptation), and green (Recovery).
  • Figure 5: Simulation results for a 10-second run with extreme command velocities: The top-left panel compares roll and pitch with and without planning. The top-right panel shows the optimized velocity commands from Algorithm 1. The bottom panel shows joint angle variations; with planning, the peaks near the constraint bounds are lower.
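
The deployment step described in Figure 2(b), where a learned model is rolled forward and constraints are enforced over the plan, can be illustrated with a generic sketch. This is not the paper's algorithm: the page does not give the optimizer, the constraint set, or the model interfaces, so the example below assumes a simple random-shooting planner in which the infinite horizon is approximated by an H-step rollout plus a terminal value estimate, and hard constraints are handled by discarding any candidate sequence that violates them. All names (`plan_first_action`, `toy_dynamics`, `toy_reward`, `toy_value`, `toy_constraint`) are hypothetical, and the 1-D toy system stands in for the learned Dreamer model and critic.

```python
import numpy as np

def plan_first_action(state, dynamics, reward, value, constraint_ok,
                      horizon=5, n_samples=256, gamma=0.99,
                      action_dim=1, rng=None):
    """Random-shooting planner (an assumed stand-in, not the paper's method).

    Scores each sampled action sequence by the discounted H-step reward plus
    gamma**H * value(final state), which approximates the infinite-horizon
    return. Sequences that violate a constraint are rejected outright.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Candidate action sequences, each action component in [-1, 1).
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    best_score, best_action = -np.inf, None
    for seq in candidates:
        s, score, feasible = state, 0.0, True
        for k, a in enumerate(seq):
            score += gamma**k * reward(s, a)   # reward at the current state
            s = dynamics(s, a)                 # imagined next state
            if not constraint_ok(s, a):        # hard constraint check
                feasible = False
                break
        if not feasible:
            continue
        score += gamma**horizon * value(s)     # tail of the infinite horizon
        if score > best_score:
            best_score, best_action = score, seq[0].copy()
    # Receding horizon: only the first action of the best sequence is used.
    return best_action, best_score

# Hypothetical 1-D toy problem: move position x toward 1 while |x| <= 1.5.
def toy_dynamics(s, a):
    return s + 0.1 * a

def toy_reward(s, a):
    return -float((s[0] - 1.0) ** 2)

def toy_value(s):
    # Crude terminal value: the current cost repeated forever with gamma=0.99.
    return -float((s[0] - 1.0) ** 2) / (1.0 - 0.99)

def toy_constraint(s, a):
    return abs(s[0]) <= 1.5

action, score = plan_first_action(np.zeros(1), toy_dynamics, toy_reward,
                                  toy_value, toy_constraint)
print("first planned action:", action, "score:", score)
```

In this sketch the planner returns `None` when no sampled sequence is feasible, so a caller would need a fallback such as the policy's own action. The adaptation of velocity commands mentioned in the abstract is not modeled here.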