PIP-Loco: A Proprioceptive Infinite Horizon Planning Framework for Quadrupedal Robot Locomotion
Aditya Shirwatkar, Naman Saxena, Kishore Chandra, Shishir Kolathaya
TL;DR
PIP-Loco introduces a proprioceptive infinite-horizon planning framework for quadruped locomotion that combines RL with a Dreamer-based internal model to enable long-horizon planning and constraint satisfaction. Training uses an asymmetric actor-critic setup in which an expert actor and a velocity-aware internal model co-evolve. At deployment, the Dreamer module solves an infinite-horizon, MPC-like problem, adapting actions and velocity commands to respect constraints. Key findings include improved robustness to training noise, interpretability through "dreamed" future states, and successful multi-terrain hardware deployment on robots such as the Unitree Go1. The work highlights the practical potential of integrating planning into RL for legged robots. It also notes limitations related to divergence between the planner and the policy, and suggests future directions such as safety filters and perception-driven planning.
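The phrase "asymmetric actor-critic" usually means the actor and the critic see different inputs. The actor sees only what the real robot can measure, and the critic, used only during training, may also see privileged simulator state. The minimal sketch below assumes that the actor consumes the internal model's *estimated* base velocity while a supervised loss pulls that estimate toward the ground-truth velocity. This is one plausible source of the actor/estimator co-dependence described above. All field names, the linear estimator, and the numbers are hypothetical; the exact observation sets used by PIP-Loco are not given in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepData:
    proprio: List[float]          # joint angles/velocities, IMU: measurable on hardware
    true_base_vel: List[float]    # privileged: simulator ground-truth base velocity
    terrain_heights: List[float]  # privileged: simulator terrain samples (hypothetical)


def estimate_velocity(proprio: List[float], weights: List[List[float]]) -> List[float]:
    """Hypothetical linear velocity estimator (a learned network in practice)."""
    return [sum(w * x for w, x in zip(row, proprio)) for row in weights]


def actor_input(step: StepData, weights: List[List[float]]) -> List[float]:
    """Actor: onboard signals plus the *estimated* velocity, never ground truth."""
    return step.proprio + estimate_velocity(step.proprio, weights)


def critic_input(step: StepData) -> List[float]:
    """Critic (training only): may additionally use privileged simulator state."""
    return step.proprio + step.true_base_vel + step.terrain_heights


def estimator_loss(step: StepData, weights: List[List[float]]) -> float:
    """Supervised mean squared error between estimated and true base velocity."""
    est = estimate_velocity(step.proprio, weights)
    return sum((e - t) ** 2 for e, t in zip(est, step.true_base_vel)) / len(est)


step = StepData(proprio=[0.1, -0.2, 0.3, 0.0], true_base_vel=[0.5, 0.0, 0.0],
                terrain_heights=[0.02, 0.05])
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(len(actor_input(step, W)), len(critic_input(step)))  # 7 9
print(round(estimator_loss(step, W), 4))                   # 0.0967
```

Because the actor is trained on the estimator's output rather than on ground truth, errors in the estimate directly shape the policy. Conversely, the states the policy visits determine what data the estimator is trained on.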
Abstract
A core strength of Model Predictive Control (MPC) for quadrupedal locomotion has been its ability to enforce constraints and to make the sequence of commands over the horizon interpretable. However, despite its ability to plan, MPC struggles to scale with task complexity and often fails to achieve robust behavior on rapidly changing surfaces. Model-free Reinforcement Learning (RL) methods, on the other hand, have outperformed MPC on multiple terrains and exhibit emergent motions, but they inherently lack the ability to handle constraints or to plan. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, enabling agile and safe locomotion over the planning horizon. Inspired by MPC, we incorporate an internal model that includes a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, facilitating the exploration of improved locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate our training framework through ablation studies on the internal model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach in multi-terrain scenarios, both in simulation and on hardware.
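The abstract does not spell out how an "infinite-horizon" MPC problem is made tractable. One common approach, assumed here and not necessarily the authors' formulation, rolls a learned model forward for a finite horizon H and summarizes everything beyond it with a terminal value estimate. The planner then minimizes sum over t < H of gamma^t * c(s_t, a_t) plus gamma^H * V(s_H). The toy sketch below makes several simplifying assumptions:

- The state is a 1-D base velocity, and `dream_step` is a hand-written stand-in for the Dreamer model.
- The speed limit |v| <= v_max is a soft quadratic penalty rather than a hard constraint.
- The terminal cost is a closed-form stand-in for a learned critic.
- The planner adapts the velocity command by grid search.

Every function name and number is hypothetical.

```python
from typing import Optional, Sequence


def dream_step(v: float, u: float, dt: float = 0.1) -> float:
    """Hypothetical stand-in for the learned (Dreamer-style) model:
    predicts the next base velocity from an acceleration-like action."""
    return v + dt * u


def policy(v: float, v_cmd: float, gain: float = 5.0) -> float:
    """Hypothetical velocity-tracking policy: pushes v toward the command."""
    return gain * (v_cmd - v)


def stage_cost(v: float, u: float, v_user: float, v_max: float, penalty: float) -> float:
    """Tracking error w.r.t. the user's command, a small effort term, and a
    quadratic penalty on violating |v| <= v_max (zero when satisfied)."""
    violation = max(0.0, abs(v) - v_max)
    return (v - v_user) ** 2 + 0.01 * u ** 2 + penalty * violation ** 2


def terminal_cost(v: float, v_user: float, v_max: float, penalty: float, gamma: float) -> float:
    """Stand-in for a learned critic: the discounted cost of holding v forever
    (u = 0), a geometric series that summarizes the tail beyond the horizon."""
    return stage_cost(v, 0.0, v_user, v_max, penalty) / (1.0 - gamma)


def plan_command(v0: float, v_user: float, v_max: float, candidates: Sequence[float],
                 horizon: int = 20, gamma: float = 0.99, penalty: float = 100.0) -> Optional[float]:
    """Return the candidate command whose imagined rollout has the lowest
    discounted finite-horizon cost plus discounted terminal cost."""
    best_cmd, best_cost = None, float("inf")
    for cmd in candidates:
        v, total = v0, 0.0
        for t in range(horizon):
            u = policy(v, cmd)
            total += gamma ** t * stage_cost(v, u, v_user, v_max, penalty)
            v = dream_step(v, u)  # imagined transition; the real robot does not move
        total += gamma ** horizon * terminal_cost(v, v_user, v_max, penalty, gamma)
        if total < best_cost:
            best_cmd, best_cost = cmd, total
    return best_cmd


grid = [i / 10 for i in range(21)]  # candidate commands 0.0 .. 2.0
print(plan_command(0.0, 1.5, 1.0, grid))  # request exceeds the limit -> 1.0
print(plan_command(0.0, 0.5, 1.0, grid))  # feasible request is kept  -> 0.5
```

The terminal term carries most of the weight here (gamma^H / (1 - gamma) is roughly 82), which is why the planner clips an infeasible request to the speed limit instead of trading a long-run violation for a short-term tracking gain.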
