Benchmarking Model Predictive Control and Reinforcement Learning Based Control for Legged Robot Locomotion in MuJoCo Simulation

Shivayogi Akki, Tan Chen

TL;DR

This study benchmarks MPC and RL controllers for legged locomotion on the Unitree Go1 in MuJoCo, focusing on straight walking at $0.5~\mathrm{m/s}$. RL achieves superior disturbance rejection and a lower cost of transport (CoT), aided by high-frequency actions and knee-driven propulsion, while MPC recovers more stably from large perturbations through balanced joint utilization. However, RL generalizes poorly to slippery and uneven terrains, indicating a sim-to-real and robustness gap. The results highlight a fundamental trade-off and motivate hybrid or domain-randomized approaches that combine robustness with efficiency for practical legged robotics.

Abstract

Model Predictive Control (MPC) and Reinforcement Learning (RL) are two prominent strategies for controlling legged robots, each with unique strengths. RL learns control policies through system interaction, adapting to various scenarios, whereas MPC relies on a predefined mathematical model to solve optimization problems in real time. Despite their widespread use, direct comparative analysis under standardized conditions is lacking. This work addresses this gap by benchmarking MPC and RL controllers on a Unitree Go1 quadruped robot within the MuJoCo simulation environment, focusing on a standardized task: straight walking at a constant velocity. Performance is evaluated based on disturbance rejection, energy efficiency, and terrain adaptability. The results show that RL excels in handling disturbances and maintaining energy efficiency but struggles to generalize to new terrains because its learned policies are tailored to specific environments. In contrast, MPC recovers better from larger perturbations by leveraging its optimization-based approach, which distributes control effort evenly across the robot's joints. The results provide a clear understanding of the advantages and limitations of both RL and MPC, offering insights into selecting an appropriate control strategy for legged robotic applications.
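
The energy-efficiency comparison is commonly expressed through the dimensionless cost of transport (CoT). As a rough illustration only, the sketch below assumes the widely used form $\mathrm{CoT} = E / (m g d)$, with $E$ approximated by integrating the absolute mechanical joint power $\sum_i |\tau_i \omega_i|$ over time. This summary does not state the paper's exact definition, so the formula, the function name `cost_of_transport`, and all of its parameters are assumptions made for this example.

```python
def cost_of_transport(torques, joint_velocities, dt, mass, distance, g=9.81):
    """Dimensionless CoT = E / (m * g * d).

    Assumption: E is the time integral of the summed absolute mechanical
    joint power |tau * omega|; the paper may define energy differently.

    torques, joint_velocities: sequences of per-timestep sequences,
        one value per joint (N*m and rad/s).
    dt: timestep in seconds; mass in kg; distance travelled in metres.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    energy = 0.0
    for tau_t, omega_t in zip(torques, joint_velocities):
        # Absolute power per joint, summed over joints at this timestep.
        power = sum(abs(tau * omega) for tau, omega in zip(tau_t, omega_t))
        # Rectangle-rule integration over the timestep.
        energy += power * dt
    return energy / (mass * g * distance)
```

A lower value means less energy spent per unit weight per unit distance, so two controllers are comparable on this metric only when evaluated over the same task, here straight walking at the same commanded velocity.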

Paper Structure

This paper contains 18 sections, 11 equations, 4 figures.

Figures (4)

  • Figure 1: A. Reinforcement learning proximal policy optimization algorithm. B. Model predictive control framework using predictive sampling.
  • Figure 2: A. Position and velocity plots for the Go1 robot under both the MPC and RL controllers during the standardized task, followed by the control inputs. B., C., and D. The robot's position and velocity response to perturbations along the positive $x$-axis, negative $x$-axis, and negative $y$-axis, respectively, under both controllers. The graphs show the differences in disturbance rejection, illustrating how each controller stabilizes the robot after external forces are applied. The XFRC, marked by the red circle, indicates the perturbation force and the timestep at which it is applied.
  • Figure 3: A. Go1 Robot Front Right Limb: Illustration of the limb configuration showing hip abduction, hip flexion, and knee flexion joints. B. Perturbation in Positive $x$-axis: Free body diagram showing the forces and torques on the robot's joints when a perturbation force is applied in the positive $x$ direction of the CoM, leading to instability at the rear foot. C. Perturbation in Negative $x$-axis: Free body diagram illustrating the forces and torques on the robot's joints when a perturbation force is applied in the negative $x$ direction of the CoM, stabilizing the robot as it pushes down on the pivotal point.
  • Figure 4: The first row shows the Go1 robot's response when controlled by RL on slippery terrain, indicating reduced traction and slower acceleration. The second row presents the robot's response on uneven terrain, demonstrating challenges in maintaining stable movement. Both sets of plots compare position, velocity, and joint torques to illustrate the RL controller's adaptability to varying terrain conditions. The legend shows standardized flat friction terrain (STD) and slippery or uneven friction terrain (RL).