Whole-Body Model-Predictive Control of Legged Robots with MuJoCo

John Z. Zhang, Taylor A. Howell, Zeji Yi, Chaoyi Pan, Guanya Shi, Guannan Qu, Tom Erez, Yuval Tassa, Zachary Manchester

Abstract

We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximations of the derivatives. Building on the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC in a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and helps accelerate research in the community. Our code and experiment videos will be available online at: https://johnzhang3.github.io/mujoco_ilqr
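The core loop named in the abstract (iLQR on a black-box dynamics function whose derivatives come from finite differences) can be sketched on a toy problem. This is a minimal illustration, not the authors' code, and it does not call MuJoCo: a hypothetical damped-pendulum `step` function stands in for a MuJoCo simulation step, central differences stand in for MuJoCo's finite-difference derivatives, and the quadratic cost, its weights, the 40-step horizon, and the line-search schedule are all illustrative assumptions.

```python
import numpy as np

DT = 0.05  # integration timestep (s); illustrative assumption

def step(x, u):
    """Hypothetical stand-in for one simulator step: damped pendulum, x = [angle, rate]."""
    theta, omega = x
    omega_next = omega + DT * (-9.81 * np.sin(theta) - 0.1 * omega + u[0])
    return np.array([theta + DT * omega_next, omega_next])  # semi-implicit Euler

def fd_jacobians(f, x, u, eps=1e-6):
    """Central finite-difference approximations of A = df/dx and B = df/du."""
    A = np.zeros((x.size, x.size))
    B = np.zeros((x.size, u.size))
    for i in range(x.size):
        d = np.zeros(x.size); d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(u.size):
        d = np.zeros(u.size); d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

GOAL = np.array([np.pi, 0.0])  # upright and at rest
Q, R, QF = np.diag([1.0, 0.1]), np.array([[0.01]]), np.diag([100.0, 10.0])

def rollout(x0, U):
    X = [x0]
    for u in U:
        X.append(step(X[-1], u))
    return np.array(X)

def total_cost(X, U):
    E = X - GOAL
    running = sum(e @ Q @ e + u @ R @ u for e, u in zip(E[:-1], U))
    return running + E[-1] @ QF @ E[-1]

def ilqr(x0, U, iters=50, reg=1e-6):
    X, T, nu = rollout(x0, U), len(U), U.shape[1]
    J = total_cost(X, U)
    K = np.zeros((T, nu, X.shape[1]))  # gains from the most recent backward pass
    for _ in range(iters):
        # Backward pass: propagate a quadratic model of the cost-to-go.
        Vx, Vxx = 2 * QF @ (X[-1] - GOAL), 2 * QF
        k = np.zeros_like(U)
        for t in reversed(range(T)):
            A, B = fd_jacobians(step, X[t], U[t])
            Qx = 2 * Q @ (X[t] - GOAL) + A.T @ Vx
            Qu = 2 * R @ U[t] + B.T @ Vx
            Qxx = 2 * Q + A.T @ Vxx @ A
            Quu = 2 * R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            Quu_reg = Quu + reg * np.eye(nu)  # small regularization for the solves
            k[t] = -np.linalg.solve(Quu_reg, Qu)
            K[t] = -np.linalg.solve(Quu_reg, Qux)
            Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
            Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
            Vxx = 0.5 * (Vxx + Vxx.T)
        # Forward pass: apply feedforward step k and feedback gains K,
        # backtracking on the step size until the true cost decreases.
        for alpha in (1.0, 0.5, 0.25, 0.125, 0.0625):
            Xn, Un = [x0], np.zeros_like(U)
            for t in range(T):
                Un[t] = U[t] + alpha * k[t] + K[t] @ (Xn[t] - X[t])
                Xn.append(step(Xn[t], Un[t]))
            Xn = np.array(Xn)
            Jn = total_cost(Xn, Un)
            if Jn < J:
                X, U, J = Xn, Un, Jn
                break
        else:
            break  # no step size improved the cost: stop iterating
    return X, U, K, J

x0 = np.array([0.0, 0.0])  # hanging down, at rest
U0 = np.zeros((40, 1))     # 2 s horizon, zero initial control guess
J0 = total_cost(rollout(x0, U0), U0)
X, U, K, J = ilqr(x0, U0)
print(f"cost {J0:.1f} -> {J:.1f}")
```

In an MPC loop, one would typically re-solve from the latest state estimate at every planning cycle, warm-starting from the previous control sequence, and hand the returned gains `K` to a feedback controller.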

Paper Structure

This paper contains 19 sections, 8 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: A Unitree Go1 quadruped robot transitions from quadruped to handstand mode (top row) and walks on its hind legs (bottom row) using the MuJoCo iLQR policy.
  • Figure 2: System diagram for deploying the MuJoCo iLQR policy to the Unitree quadruped and humanoid robots. The iLQR algorithm provides control, state, and time-varying LQR (TV-LQR) feedback gain trajectories at $50$ Hz. The TV-LQR feedback policy can then be run at $300$ Hz, and its output is passed to a joint-level PD controller. The robot's state is estimated by fusing onboard joint encoder and motion capture data. Live state estimates are updated in the planner at between $300$ and $500$ Hz and visualized in the MuJoCo MPC GUI. The cost categories are designed offline, but the relative weights, goal locations, and iLQR hyperparameters can be adjusted interactively by the user in real time through the GUI.
  • Figure 3: The MuJoCo MPC GUI for deploying legged robots on hardware. The GUI lets the user interactively control the real-world robot by changing the target position, shown as the green sphere. The user can also update the planner agent parameters and observe the simulated states and the real-world robot behavior in real time.
  • Figure 4: Top left: thigh joint control trajectories for two impratio contact settings on a quadruped robot standing in place. The default setting results in nonphysical foot slipping and jerky controls (red line) that are potentially dangerous on hardware. Increasing impratio prevents this issue (blue line) but increases compute times (top right); the one-standard-deviation interval for each bar is shown in black. Bottom: timing breakdowns of the different iLQR components for each setting.
  • Figure 5: Cost comparison between the iLQR policy with (blue) and without (red) TV-LQR feedback gains applied to the nominal control sequence, on a trotting task with the H1 humanoid robot on hardware. The TV-LQR policy improves task performance by $\sim 30\%$.
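Figure 2 describes a two-rate structure: iLQR publishes nominal states, controls, and TV-LQR gains at $50$ Hz, and the feedback policy is run at $300$ Hz ahead of a joint-level PD loop. The sketch below uses the standard TV-LQR form $u = \bar{u} + K(x - \bar{x})$. Several details here are assumptions, not taken from the paper: the knot spacing `knot_dt`, the zero-order hold between knots, treating the controls as joint position targets, the PD gains, and the hypothetical names `TVLQRPolicy` and `pd_torque`.

```python
import numpy as np

class TVLQRPolicy:
    """One iLQR solution: nominal states X (T+1, nx), nominal controls U (T, nu),
    and time-varying feedback gains K (T, nu, nx), with knots spaced knot_dt apart."""

    def __init__(self, X, U, K, knot_dt, t0=0.0):
        self.X, self.U, self.K = X, U, K
        self.knot_dt, self.t0 = knot_dt, t0

    def __call__(self, x, t):
        # Zero-order hold: use the latest knot at or before time t, clamped to the horizon.
        i = int((t - self.t0) / self.knot_dt)
        i = min(max(i, 0), len(self.U) - 1)
        # TV-LQR feedback: nominal control plus a correction for the state deviation.
        return self.U[i] + self.K[i] @ (x - self.X[i])

def pd_torque(q_des, q, qd, kp=40.0, kd=1.0):
    """Joint-level PD law (hypothetical gains); drives joints toward q_des with zero target velocity."""
    return kp * (q_des - q) - kd * qd

# Hypothetical toy plan: 2 joints, state x = [q, qd], 10 knots 20 ms apart.
nq, T = 2, 10
X = np.zeros((T + 1, 2 * nq))
U = np.full((T, nq), 0.3)
K = np.tile(-np.hstack([np.eye(nq), 0.1 * np.eye(nq)]), (T, 1, 1))
policy = TVLQRPolicy(X, U, K, knot_dt=0.02)

CONTROL_DT = 1.0 / 300.0             # feedback rate from the caption
x = np.array([0.1, -0.1, 0.0, 0.0])  # latest state estimate (held fixed here)
for tick in range(6):                # one 50 Hz planning period = 6 ticks at 300 Hz
    q_des = policy(x, tick * CONTROL_DT)
    tau = pd_torque(q_des, x[:nq], x[nq:])
```

In a real deployment, `x` would come from the state estimator on every tick, and a new `TVLQRPolicy` would replace the old one whenever a $50$ Hz iLQR solve finishes; the fast loop only needs a matrix-vector product per tick, which is what makes the higher feedback rate cheap.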