RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control
Seung Hyeon Bang, Carlos Arribalzaga Jové, Luis Sentis
TL;DR
This work introduces an RL-augmented MPC framework for bipedal footstep planning that couples an MPC planner based on the angular momentum linear inverted pendulum (ALIP) model with a residual RL policy that compensates for the full-order dynamics the simplified model omits. The framework is hierarchical: a high-level (HL) footstep planner feeds a low-level (LL) whole-body controller (WBC). It achieves agile velocity tracking, reliable turning, and robustness to disturbances, and it allows multiple replans during swing. The residual policy, trained with proximal policy optimization (PPO), refines footsteps around the MPC solution, which supports sample-efficient learning and improves adaptability to challenging terrains such as slopes and sloppy surfaces. In simulation on the DRACO 3 humanoid, the method shows significant performance gains over a baseline ALIP-MPC, highlighting the practical value of integrating model-based planning with model-free learning for dynamic locomotion.
Abstract
This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the simplified model and the more complex full-order robot system. Specifically, our approach combines an ALIP-based MPC foot placement controller, which provides sub-optimal footstep plans, with a learned policy that refines those footsteps, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. Compared to the baseline ALIP-based MPC approach, the results demonstrate significant improvements in dynamic locomotion performance, including better tracking across a wide range of walking speeds, reliable turning, and traversal of challenging terrains, while preserving the robustness and stability of the walking gaits.
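The refinement step described in the abstract can be illustrated with a minimal sketch: a nominal footstep from the MPC planner is adjusted by a bounded residual from the learned policy. This is not the authors' implementation. The function name `refine_footstep`, the 2D (x, y) footstep representation, and the symmetric clipping bound `max_residual` are all assumptions made for illustration.

```python
import numpy as np


def refine_footstep(mpc_footstep, residual, max_residual=0.1):
    """Return the MPC footstep adjusted by a bounded learned residual.

    All names and the clipping bound are hypothetical; the abstract only
    states that a learned policy refines the MPC footstep.
    """
    mpc_footstep = np.asarray(mpc_footstep, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # Keep the correction small so the result stays near the MPC solution.
    bounded = np.clip(residual, -max_residual, max_residual)
    return mpc_footstep + bounded


# Hypothetical values: nominal (x, y) footstep in meters and a raw policy output.
nominal = [0.30, 0.10]
raw_action = [0.25, -0.02]
refined = refine_footstep(nominal, raw_action)
print(refined)  # x correction is clipped to 0.1, y correction passes through
```

Bounding the residual is one plausible design choice: it keeps the learned correction local to the model-based plan, so the policy only has to learn what the simplified model misses rather than the full footstep mapping.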
