Diffusion Model Predictive Control
Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J. Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lázaro-Gredilla, Kevin Murphy
TL;DR
D-MPC learns a multi-step dynamics model and a multi-step action proposal from offline data, both as diffusion models, and combines them for online model predictive control (MPC). By pairing these trajectory-level diffusion models with a simple sampling-based planner and a Transformer-based value estimator, it mitigates compounding errors and supports run-time adaptation to novel rewards and dynamics. On D4RL, D-MPC significantly outperforms MBOP and is competitive with state-of-the-art (SOTA) methods. Ablations support the contribution of multi-step diffusion, task adaptation, and diffusion-based proposals. The approach also shows potential for distillation into a fast policy that enables high-frequency control. Its limitations are planning runtime and the dependence on the offline data distribution that is inherent to offline RL.
Abstract
We propose Diffusion Model Predictive Control (D-MPC), a novel MPC approach that learns a multi-step action proposal and a multi-step dynamics model, both using diffusion models, and combines them for use in online MPC. On the popular D4RL benchmark, we show performance that is significantly better than existing model-based offline planning methods using MPC (e.g. MBOP) and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC's ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines.
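The way a multi-step action proposal and a multi-step dynamics model can be combined in a sampling-based MPC loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: `propose_actions`, `predict_states`, `objective`, `plan`, and `run_mpc` are hypothetical names, the two learned diffusion models are replaced by toy stand-ins (uniform random action sequences and additive dynamics), and the goal-reaching objective is an assumption chosen only to make the example runnable.

```python
import numpy as np

def propose_actions(state, num_samples, horizon, rng):
    # Toy stand-in for the learned multi-step action proposal.
    # In D-MPC this would be a diffusion model trained on offline data.
    return rng.uniform(-1.0, 1.0, size=(num_samples, horizon, state.shape[0]))

def predict_states(state, actions):
    # Toy stand-in for the learned multi-step dynamics model.
    # It returns the whole future state sequence at once, here using
    # the assumed additive dynamics s_{t+1} = s_t + a_t.
    return state + np.cumsum(actions, axis=-2)

def objective(states, actions, goal):
    # Hypothetical objective: negative squared distance to a goal,
    # summed over the planning horizon. It can be swapped at run time.
    return -np.sum((states - goal) ** 2, axis=(-2, -1))

def plan(state, goal, rng, num_samples=64, horizon=8):
    # Sample candidate action sequences, predict their outcomes,
    # score them, and keep the highest-scoring sequence.
    candidates = propose_actions(state, num_samples, horizon, rng)
    states = predict_states(state, candidates)
    scores = objective(states, candidates, goal)
    best = int(np.argmax(scores))
    return candidates[best], scores

def run_mpc(state, goal, steps=20, seed=0):
    # Receding-horizon loop: replan at every step and execute
    # only the first action of the selected sequence.
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        action_seq, _ = plan(state, goal, rng)
        state = state + action_seq[0]  # toy environment step
    return state
```

Because the objective is only consulted at planning time, replacing `objective` changes the behaviour without retraining the two models, which is the property the abstract refers to as optimizing novel reward functions at run time.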
