RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control

Seung Hyeon Bang, Carlos Arribalzaga Jové, Luis Sentis

TL;DR

This work introduces an RL-augmented MPC framework for bipedal footstep planning. It couples an MPC planner built on the angular-momentum linear inverted pendulum (ALIP) model with a residual RL policy that compensates for the full-order dynamics the simplified model leaves out. A hierarchy of a high-level (HL) footstep planner and a low-level (LL) whole-body controller (WBC) achieves agile velocity tracking, reliable turning, and robustness to disturbances, and allows the footstep to be replanned multiple times during swing. Residuals trained with proximal policy optimization (PPO) refine the footstep around the MPC solution, providing sample-efficient learning and improved adaptability to challenging terrains such as slopes and sloppy surfaces. The method demonstrates significant performance gains over a baseline ALIP-MPC on the DRACO 3 humanoid in simulation, highlighting the practical impact of integrating model-based planning with model-free learning for dynamic locomotion.

Abstract

This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance, including better tracking of a wide range of walking speeds, enabling reliable turning and traversing challenging terrains while preserving the robustness and stability of the walking gaits compared to the baseline ALIP-based MPC approach.
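The idea of refining the MPC footstep with a learned residual can be illustrated with a minimal sketch. It assumes the residual is simply added to the MPC solution and clipped to a fixed bound; the function name `combine_footstep`, the 2D (x, y) footstep layout, and the 0.1 m bound are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def combine_footstep(mpc_footstep, residual, max_residual=0.1):
    """Return the MPC footstep plus a bounded learned residual.

    mpc_footstep: (x, y) footstep target from the ALIP-based MPC, in metres.
    residual:     (dx, dy) correction proposed by the policy network.
    max_residual: per-axis bound on the correction (assumed value).
    """
    mpc_footstep = np.asarray(mpc_footstep, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # Clipping keeps the refined footstep close to the MPC solution,
    # so the policy only searches in a neighbourhood of it.
    bounded = np.clip(residual, -max_residual, max_residual)
    return mpc_footstep + bounded

# Usage: the x-residual exceeds the bound and is clipped; the y-residual passes through.
refined = combine_footstep([0.30, 0.10], [0.25, -0.02])
```

Bounding the residual is one plausible way to keep exploration near the model-based solution, which is consistent with the sample-efficiency argument made above, but the actual bounding scheme may differ.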

Paper Structure (27 sections, 15 equations, 7 figures)


Figures (7)

  • Figure 1: Overview of the proposed control framework: The footstep planner consists of a simplified model-based model predictive controller (MPC) and a neural network (NN). Together, these components generate the footstep policy by integrating the solutions from each module. A whole-body feedback controller (WBC) then tracks the footstep policy.
  • Figure 2: ALIP-model limitation: Comparison of the angular momentum about the robot's contact point, $L_y$, the angular momentum about its CoM, $L_y^c$ (left), and the predicted evolution $L_y^{\textrm{pred}}$ obtained by forward simulation of the ALIP model. The initial state of $L_y$ is taken at the time of the step transition, and the predicted value at the end of the step, $L_y^{\textrm{pred, end}}$, is shown at every time instance during foot swing (right), while the robot walks forward using ALIP-based MPC [Gibson et al., 2022] and WBC [Bang et al., 2023]. Notice that the blue and red lines are noticeably different and that $L_y^{\textrm{pred, end}}$ fluctuates considerably.
  • Figure 3: Reinforcement learning framework: The agent learns a policy neural network (NN) in simulation, where the robot is controlled through the modules in the controller. The ALIP-based model predictive controller (MPC) follows a gait command and operates on a time-based finite state machine (FSM), which manages the footstep timing. The ALIP-MPC optimizes the desired footstep location, while the trajectory managers generate the swing foot trajectories from the footstep locations. The trajectory managers also govern the desired task trajectories to harness the full-order model. All desired trajectories are sent to the whole-body controller (WBC) [Bang et al., 2023] to generate joint commands. In this learning framework, the policy NN that generates residual footstep locations is the agent, while everything else in the closed-loop system is considered the environment.
  • Figure 4: Velocity tracking during forward walking: Comparison of the baseline MPC and our proposed MPC + RL method. (a) Lower-frequency policy. (b) Higher-frequency policy.
  • Figure 5: Force perturbation tests: In each evaluation episode, incremental perturbation forces (pushes) were applied until the robot lost balance. The blue curve represents the minimum force at which the robot failed to maintain balance. The orange curve represents the maximum force with which the robot consistently maintained balance when the evaluation episode started with that force. Unlike the blue curve, the orange curve is independent of previous perturbations. (Top row): Low-frequency policy. (Bottom row): High-frequency policy.
  • ...and 2 more figures
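
The forward simulation of the ALIP model mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration using the standard sagittal-plane ALIP dynamics, $\dot{x} = L_y / (mH)$ and $\dot{L}_y = m g x$, with constant CoM height $H$ and $x$ the CoM position relative to the contact point. The function name `alip_predict` and the mass, height, and timing values are hypothetical and not taken from the paper.

```python
import math

def alip_predict(x0, L0, t, mass, height, g=9.81):
    """Closed-form ALIP prediction of (x, L_y) after t seconds.

    x0:     CoM position relative to the contact point at the start [m].
    L0:     angular momentum about the contact point at the start [kg m^2/s].
    mass:   total robot mass [kg] (assumed value in the example below).
    height: constant CoM height above the contact point [m] (assumed).
    """
    ell = math.sqrt(g / height)  # natural frequency of the pendulum
    c, s = math.cosh(ell * t), math.sinh(ell * t)
    x = c * x0 + s * L0 / (mass * height * ell)
    L = mass * height * ell * s * x0 + c * L0
    return x, L

# Usage: predict L_y at the end of a 0.3 s step from a state measured
# 0.1 s into the step (all numbers are illustrative).
x_now, L_now = 0.02, 6.0
x_end, L_pred_end = alip_predict(x_now, L_now, t=0.3 - 0.1, mass=40.0, height=0.7)
```

If the robot followed the ALIP model exactly, repeating this prediction at every instant of the swing phase would return the same end-of-step value; the fluctuation of $L_y^{\textrm{pred, end}}$ described in the caption is what the learned residual is meant to compensate for.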