Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing
Haoru Xue, Chaoyi Pan, Zeji Yi, Guannan Qu, Guanya Shi
TL;DR
This work tackles real-time, full-order, torque-level control for legged locomotion by relating MPPI to a diffusion process and introducing a diffusion-inspired annealing scheme, DIAL-MPC. The method uses a dual-loop covariance schedule, with trajectory-level and action-level annealing, to balance exploration and convergence within a receding-horizon MPC. Experiments on a quadruped show substantial improvements over MPPI, CMA-ES, NMPC, and RL baselines, including a 13.4-fold reduction in tracking error relative to standard MPPI and robust performance under payloads and model mismatch, all without training. The approach offers training-free online optimization for complex locomotion tasks, but it relies on fast simulation; future work aims to improve sample efficiency with learned models and nominal policies.
Abstract
Real-time optimal control of legged robots with full-order dynamics models is challenging because the problem is high-dimensional and nonconvex. As a result, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential in nonconvex and even discontinuous problems, but it often yields suboptimal solutions with high variance, which limits its use in high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. The annealing process is motivated by a theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and by the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces tracking error by a factor of $13.4$ relative to standard MPPI and outperforms reinforcement learning (RL) policies by $50\%$ in challenging climbing tasks, without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with a payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.
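
To make the idea of annealed sampling concrete, the following is a minimal sketch of a single planning step on a toy problem. It is not the authors' implementation. The 1-D double-integrator model, the geometric noise schedules, and every name and default value (`rollout_cost`, `annealed_mppi`, `traj_decay`, `action_decay`, `temperature`) are assumptions chosen for illustration. The outer loop shrinks the sampling noise across refinement iterations (trajectory-level annealing), and a per-action scale gives later actions in the horizon more noise than earlier ones, which is one plausible reading of action-level annealing.

```python
import numpy as np


def rollout_cost(controls, x0, target, dt=0.1):
    """Cost of batched control sequences on a 1-D double integrator.

    Toy stand-in for a full-order simulator (an assumption of this sketch).
    controls has shape (n_samples, horizon) or (horizon,).
    Returns an array of shape (n_samples,).
    """
    controls = np.atleast_2d(controls)
    n = controls.shape[0]
    pos = np.full(n, x0[0], dtype=float)
    vel = np.full(n, x0[1], dtype=float)
    cost = np.zeros(n)
    for h in range(controls.shape[1]):
        u = controls[:, h]
        vel = vel + dt * u
        pos = pos + dt * vel
        cost += (pos - target) ** 2 + 1e-3 * u ** 2
    return cost


def annealed_mppi(x0, target, horizon=20, n_samples=256, n_iters=10,
                  sigma0=2.0, traj_decay=0.7, action_decay=0.9,
                  temperature=0.1, seed=0):
    """One planning step of MPPI with an annealed sampling covariance."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)

    # Action-level schedule (hypothetical form): the last action in the
    # horizon keeps full noise, earlier actions get geometrically less.
    action_scale = action_decay ** (horizon - 1 - np.arange(horizon))

    for i in range(n_iters):
        # Trajectory-level schedule (hypothetical form): noise shrinks
        # geometrically with each refinement iteration.
        sigma = sigma0 * traj_decay ** i * action_scale

        noise = rng.standard_normal((n_samples, horizon)) * sigma
        noise[0] = 0.0  # keep the current mean as one of the candidates
        candidates = mean + noise

        # Standard MPPI update: softmax-weighted average of the samples.
        costs = rollout_cost(candidates, x0, target)
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        mean = weights @ candidates

    return mean
```

A call such as `plan = annealed_mppi(x0=(0.0, 0.0), target=1.0)` returns a control sequence of length `horizon`. In a receding-horizon loop, only the first action would be applied, and the remainder would be shifted forward to warm-start the next planning step.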
