Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing

Haoru Xue; Chaoyi Pan; Zeji Yi; Guannan Qu; Guanya Shi

TL;DR

This work tackles real-time, full-order, torque-level control for legged locomotion by reframing MPPI as a single-step diffusion process and introducing diffusion-inspired annealing (DIAL-MPC). It uses a dual-loop covariance schedule, with trajectory-level and action-level annealing, to balance exploration and convergence within a receding-horizon MPC. Experiments on a quadruped show improvements over MPPI, CMA-ES, NMPC, and RL baselines, including a 13.4-fold reduction in tracking error relative to standard MPPI and robust performance under payloads and model mismatch, all without training. The approach offers training-free online optimization for complex locomotion tasks, but it relies on fast simulation; future work aims to improve sample efficiency with learned models and nominal policies.
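To make the dual-loop idea concrete, here is a minimal sketch of a covariance schedule that anneals over both the optimization iteration and the position in the control horizon. The exponential form, the function name `dual_loop_sigmas`, and the parameters `beta_traj`, `beta_act`, and `sigma_max` are illustrative assumptions, not the schedule or names used in the paper.

```python
import numpy as np

def dual_loop_sigmas(n_iters, horizon, beta_traj=0.5, beta_act=0.5, sigma_max=1.0):
    """Return an (n_iters, horizon) array of sampling standard deviations.

    Hypothetical exponential schedule (an assumption, not taken from the paper):
      - trajectory level: noise shrinks as the annealing iteration i increases;
      - action level: within one iteration, actions near the start of the
        horizon get less noise than actions near the end, since a
        receding-horizon loop has already refined them more often.
    """
    i = np.arange(n_iters)[:, None]   # annealing iteration index
    h = np.arange(horizon)[None, :]   # position in the control horizon
    traj_term = i / (beta_traj * n_iters)
    act_term = (horizon - 1 - h) / (beta_act * horizon)
    return sigma_max * np.exp(-traj_term - act_term)

sigmas = dual_loop_sigmas(n_iters=4, horizon=5)
print(np.round(sigmas, 3))
```

Each row is one annealing iteration and each column one horizon step, so a sampler could draw perturbations for iteration `i` with per-action scale `sigmas[i]`.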

Abstract

Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging. Therefore, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential in nonconvex and even discontinuous problems, but it often yields suboptimal, high-variance solutions, which limits its use in high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. This annealing process is supported by a theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and by the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by a factor of $13.4$ and outperforms reinforcement learning (RL) policies by $50\%$ in challenging climbing tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.
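The coverage-versus-convergence trade-off described above can be illustrated on a toy problem. The sketch below runs an MPPI-style softmax-weighted update with either a small fixed sampling scale or a wide-to-narrow annealed scale. The cost function, the helper names (`cost`, `mppi_step`, `annealed_mppi`), and all numeric settings are assumptions made for illustration; the sketch uses no robot dynamics and is not the DIAL-MPC implementation.

```python
import numpy as np

def cost(U):
    """Toy nonconvex cost over control sequences of shape (..., H).
    Global minimum at U = 2 (cost 0); a poorer local minimum at U = -2 (cost 1)."""
    good = np.sum((U - 2.0) ** 2, axis=-1)
    bad = np.sum((U + 2.0) ** 2, axis=-1) + 1.0
    return np.minimum(good, bad)

def mppi_step(U, sigma, rng, n_samples=4096, temperature=0.1):
    """One MPPI-style update: sample around U, then softmax-weight by cost."""
    samples = U + sigma * rng.standard_normal((n_samples, U.shape[0]))
    J = cost(samples)
    w = np.exp(-(J - J.min()) / temperature)  # subtract the min for stability
    w /= w.sum()
    return w @ samples                        # weighted mean of the samples

def annealed_mppi(U0, sigmas, seed=0):
    """Apply one MPPI-style step per entry of `sigmas`, in order."""
    rng = np.random.default_rng(seed)
    U = U0.copy()
    for sigma in sigmas:
        U = mppi_step(U, sigma, rng)
    return U

U0 = np.full(2, -2.0)                                      # start in the poor local minimum
U_fixed = annealed_mppi(U0, np.full(8, 0.05))              # small fixed noise
U_anneal = annealed_mppi(U0, np.geomspace(3.0, 0.05, 8))   # wide-to-narrow noise
print(cost(U_fixed), cost(U_anneal))
```

In this toy setting the small fixed scale never samples outside the basin it starts in, while the annealed scale first covers both basins and then contracts around the better one.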

Paper Structure (17 sections, 1 theorem, 11 equations, 9 figures, 6 tables, 1 algorithm)

This paper contains 17 sections, 1 theorem, 11 equations, 9 figures, 6 tables, 1 algorithm.

INTRODUCTION
RELATED WORK
Agility in Legged Locomotion
Sampling-Based Optimization
Parallel Robot Simulation
METHOD
Sampling-Based MPC as Single-stage Diffusion
Diffusion-Inspired Annealing
Diffusion-Inspired Annealing for Sampling-Based MPC
EXPERIMENT
Convergence and Coverage
Test-Time Generalizability
Robustness to Model Mismatch
CONCLUSION
APPENDIX
...and 2 more sections

Key Result

Proposition 1

The MPPI update can be viewed as a one-step ascent along the score function $\nabla \log p_1(U)$ with learning rate $\Sigma$:

$$U^{+} = U + \Sigma \nabla \log p_1(U)$$
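A quick numerical check of this statement in one dimension may help. The sketch assumes that the update takes the form $U^{+} = U + \Sigma \nabla \log p_1(U)$ and that $p_1$ is the target $p_0(U) \propto \exp(-J(U)/\lambda)$ smoothed by the Gaussian sampling kernel; both are assumptions about the omitted details. The toy cost `J`, the grid, and the values of `lam` and `sigma` are illustrative.

```python
import numpy as np

lam, sigma = 0.5, 0.8                              # temperature and sampling std (illustrative)
J = lambda w: 0.1 * (w**2 - 4.0) ** 2 + 0.3 * w    # toy nonconvex cost
grid = np.linspace(-10.0, 10.0, 20001)             # quadrature grid over W
dw = grid[1] - grid[0]
p0 = np.exp(-J(grid) / lam)                        # unnormalized target p_0

def gauss(x):
    return np.exp(-0.5 * (x / sigma) ** 2)         # unnormalized N(0, sigma^2)

def log_p1(u):
    """log of (p_0 convolved with N(0, sigma^2)) at u, up to a constant."""
    return np.log(np.sum(p0 * gauss(u - grid)) * dw)

def mppi_update(u):
    """Infinite-sample MPPI update: mean of W ~ N(u, sigma^2) weighted by p_0(W)."""
    w = p0 * gauss(grid - u)
    return np.sum(w * grid) / np.sum(w)

def score_update(u, eps=1e-4):
    """u + sigma^2 * d/du log p_1(u), using a central finite difference."""
    score = (log_p1(u + eps) - log_p1(u - eps)) / (2 * eps)
    return u + sigma**2 * score

for u in (-3.0, 0.0, 1.5):
    print(u, mppi_update(u), score_update(u))
```

The two columns should agree up to finite-difference error, because $\nabla \log p_1(U) = \Sigma^{-1}\left(\mathbb{E}[W \mid U] - U\right)$ when $W$ is weighted by $p_0(W)\,\mathcal{N}(W; U, \Sigma)$.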

Figures (9)

Figure 1: Diffusion-inspired annealing for legged MPC (DIAL-MPC). To achieve both global coverage and local convergence, DIAL-MPC involves a bi-level diffusion-inspired annealing process. Trajectory-wise annealing is performed with different sampling variances. Action-wise annealing is performed on control inputs at different horizon steps. Over time, $u_H$ is gradually refined by the two diffusion-inspired annealing processes, leading to robust and efficient full-order online control.
Figure 2: Cost function $J(U)$ and target distribution $p_0(U)$ for a task where the robot needs to jump over a wall. The cost function can be highly non-convex and non-smooth due to the contact constraint. The resulting distribution $p_0(U)$ is also non-convex and sparse, which makes it hard to sample from.
Figure 3: Forward density function in diffusion process.
Figure 4: Coverage and convergence trade-off in sampling-based methods. Given the target distribution $p_0(U)$ and the same number of samples, MPPI either over-explores or over-exploits the solution, while our method balances the exploration and exploitation to converge to optima following a diffusion-inspired annealing process.
Figure 5: Top: the coverage of DIAL-MPC in the crate-climbing and humanoid jogging tasks. The crate-climbing task requires the robot to climb onto a crate more than twice its own height. The full-size humanoid jogging task requires DIAL-MPC to handle a higher-dimensional, 19-dimensional action space. Middle: DIAL-MPC controlling a humanoid pushing a 30 kg crate. Bottom: DIAL-MPC generates a lower-effort motion strategy when the crate's weight is reduced to 15 kg.
...and 4 more figures

Theorems & Definitions (2)

Proposition 1: Adopted from Pan et al. (2024), "Model-Based Diffusion for Trajectory Optimization"
proof
