Model-Based Diffusion for Trajectory Optimization

Chaoyi Pan; Zeji Yi; Guanya Shi; Guannan Qu

TL;DR

This work tackles trajectory optimization for complex, non-smooth dynamics by introducing Model-Based Diffusion (MBD), which uses a dynamics model to estimate the score and iteratively refine trajectories via diffusion. Unlike model-free diffusion, MBD leverages the known dynamics to steer sampling toward dynamically feasible, cost-effective trajectories and can incorporate demonstrations of varying quality through adaptive weighting. Empirical results show MBD surpasses state-of-the-art RL and sampling-based TO methods on challenging contact-rich tasks and benefits from data augmentation with imperfect demonstrations, enabling more robust and practical planning. The approach offers a principled, data-efficient alternative to purely data-driven diffusion methods and opens up avenues for theory, online planning, and enhanced sampling techniques.
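The central ingredient described above is that a known dynamics model lets MBD score candidate trajectories directly instead of learning from demonstrations. The sketch below illustrates only that ingredient. Its assumptions: a hypothetical 1-D double integrator stands in for the dynamics model, `rollout_cost` is an illustrative cost $J$, and candidate control sequences are weighted in proportion to $\exp(-J/\lambda)$, the target density stated in Proposition 2. It is not the authors' implementation.

```python
import numpy as np


def rollout_cost(controls, dt=0.1, goal=1.0, effort=1e-3):
    """Hypothetical cost J(U): simulate a 1-D double integrator
    (state = position, velocity) from rest under the control sequence U,
    penalizing distance to `goal` and control effort at every step."""
    pos, vel = 0.0, 0.0
    cost = 0.0
    for u in controls:
        # Explicit Euler step of the assumed dynamics model.
        pos, vel = pos + dt * vel, vel + dt * u
        cost += (pos - goal) ** 2 + effort * u ** 2
    return cost


def boltzmann_weights(costs, lam):
    """Normalized weights proportional to exp(-J / lam)."""
    costs = np.asarray(costs, dtype=float)
    # Subtracting the minimum cost avoids overflow and leaves the
    # normalized weights unchanged.
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()


rng = np.random.default_rng(0)
candidates = rng.normal(0.0, 2.0, size=(256, 20))  # 256 control sequences, horizon 20
costs = np.array([rollout_cost(u) for u in candidates])
weights = boltzmann_weights(costs, lam=0.5)
u_weighted = weights @ candidates  # cost-weighted average control sequence
print(f"mean cost {costs.mean():.2f}, weighted cost {weights @ costs:.2f}")
```

Because the weights decrease monotonically with cost, the weighted average cost can never exceed the plain average. MBD goes further than this single reweighting step by using such weights to estimate a score and refine the samples across many noise levels.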

Abstract

Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to explicitly compute the score function by leveraging the model information in TO problems, which is why we refer to our approach as model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks. Additionally, MBD's ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models.

Paper Structure

This paper contains 21 sections, 3 theorems, 35 equations, 6 figures, 6 tables, and 2 algorithms.

Introduction
Related Work
Problem Statement and Background
Model-Based Diffusion
Model-based Diffusion as Multi-stage Optimization
Model-based Diffusion for Trajectory Optimization
Model-based Diffusion with Demonstration
Experimental Results
MBD for Planning in Contact-rich Tasks
Data-augmented MBD for Trajectory Optimization
Conclusion and Future Work
Appendix / Supplemental Material
Convergence of Distribution with Small Temperature
Black-box Optimization with MBD
MBD for DNN Training without Gradient Information
...and 6 more sections

Key Result

Proposition 2

Given the target distribution $\boldsymbol{Y} \sim p(\cdot)$ with $p(Y) \propto \exp\left(-\frac{J(Y)}{\lambda}\right)$, $Y \in \mathbb{R}^d$, where $J$ is a cost function with $\min_Y J(Y) = 0$ and $Y^* = \arg\min_Y J(Y)$, and assuming that the volume function $V_J(t)$ is bounded … where $\mathrm{Poly}_l(t) = \sum_{k=0}^{M} c^l_k t^{\alpha_k}$ and $\mathrm{Poly}_u(t) = \sum_{k=0}^{M} c^u_k t^{\alpha_k}$ …
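Proposition 2 concerns how $p(Y) \propto \exp(-J(Y)/\lambda)$ behaves as the temperature $\lambda$ shrinks (see the appendix section "Convergence of Distribution with Small Temperature"). The statement is truncated here, so the sketch below does not reproduce its bound; it only checks the qualitative behavior numerically. It assumes a hypothetical 1-D cost with $\min_Y J(Y) = 0$ at $Y^* = 1$ and approximates $p$ on a uniform grid.

```python
import numpy as np


def boltzmann_on_grid(J, grid, lam):
    """Discretized p(Y) proportional to exp(-J(Y) / lam) on a uniform grid."""
    vals = J(grid)
    w = np.exp(-(vals - vals.min()) / lam)  # shift by the min for stability
    return w / w.sum(), vals


def J(y):
    """Hypothetical non-convex cost: J >= 0, J = 0 only at y = 1, and J(-1) = 1.2."""
    return (y ** 2 - 1.0) ** 2 + 0.3 * (y - 1.0) ** 2


grid = np.linspace(-2.0, 2.0, 4001)
y_star, radius = 1.0, 0.1
lams = [1.0, 0.1, 0.01, 0.001]
exp_costs, masses = [], []
for lam in lams:
    p, vals = boltzmann_on_grid(J, grid, lam)
    exp_costs.append(float(p @ vals))                               # E_p[J(Y)]
    masses.append(float(p[np.abs(grid - y_star) < radius].sum()))  # P(|Y - Y*| < radius)
    print(f"lambda={lam:g}: E[J]={exp_costs[-1]:.4f}, mass near Y*={masses[-1]:.3f}")
```

As $\lambda$ decreases, the expected cost falls toward $\min_Y J(Y) = 0$ and the probability mass concentrates around $Y^*$, while the competing basin near $Y = -1$ (where $J$ is roughly 1.2) is suppressed exponentially in $1/\lambda$.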

Figures (6)

Figure 1: MBD refines the trajectory by leveraging the dynamics model directly without relying on demonstration data.
Figure 2: Reverse SDE vs. Monte Carlo score ascent (MCSA) on a synthetic, highly non-convex objective function. (a) Synthesized objective function with multiple local minima. (b) The intermediate-stage density $p_{i}(\cdot)$, where the peaked $p_{0}(\cdot)$ is iteratively corrupted into a Gaussian $p_{N}(\cdot)$. (c) Reverse SDE vs. MCSA: background colors represent the density of $p_{i}(\cdot)$ at different stages. MCSA converges faster due to its larger step size and lower sampling noise while still capturing the multimodality.
Figure 3: Optimization process of MBD on the (a) Humanoid Standup, (b) Push T, and (c) Humanoid Running tasks. The trajectory is iteratively refined to achieve the desired objective in the high-dimensional space with model information.
Figure 4: MBD optimized trajectory with data augmentation on the (a) Humanoid Jogging and (b) Car UMaze Navigation tasks. With data augmentation, the trajectory is regularized and refined to achieve the desired objective.
Figure 5: Performance of MBD on high-dimensional black-box optimization benchmarks. MBD outperforms the Gaussian-process-based Bayesian optimization baselines by a clear margin.
...and 1 more figure
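Figure 2 contrasts the reverse SDE with Monte Carlo score ascent (MCSA). The sketch below shows one way an MCSA-style reverse loop can be written for a black-box cost. It assumes a DDPM-style forward process $Y^{(i)} = \sqrt{\alpha_i}\,Y^{(i-1)} + \sqrt{1-\alpha_i}\,\varepsilon$ with $\bar\alpha_i = \prod_{k \le i} \alpha_k$. The score of $p_i$ is estimated via Tweedie's formula from samples $Y^{(0)} \sim \mathcal{N}\big(Y^{(i)}/\sqrt{\bar\alpha_i}, \tfrac{1-\bar\alpha_i}{\bar\alpha_i} I\big)$ weighted by $\exp(-J/\lambda)$, and each step takes a noise-free score step. The noise schedule, sample count, temperature, and the name `mcsa_optimize` are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np


def mcsa_optimize(J, dim, n_steps=50, n_samples=256, lam=0.1,
                  beta_min=1e-4, beta_max=0.2, seed=0):
    """Hypothetical MCSA-style reverse loop for minimizing a black-box cost J.

    J maps an (n, dim) array of candidates to an (n,) array of costs.
    """
    rng = np.random.default_rng(seed)
    alphas = 1.0 - np.linspace(beta_min, beta_max, n_steps)  # alpha_1 .. alpha_N
    # alpha_bar[0] = 1 and alpha_bar[i] = alpha_1 * ... * alpha_i
    alpha_bar = np.concatenate(([1.0], np.cumprod(alphas)))

    y = rng.standard_normal(dim)  # Y^(N) ~ N(0, I)
    for i in range(n_steps, 0, -1):
        ab = alpha_bar[i]
        # Candidates for the clean iterate: Y0 ~ N(y / sqrt(ab), (1 - ab) / ab * I).
        noise = rng.standard_normal((n_samples, dim))
        y0 = y / np.sqrt(ab) + np.sqrt((1.0 - ab) / ab) * noise
        costs = J(y0)
        # Importance weights from the target p_0 proportional to exp(-J / lam).
        w = np.exp(-(costs - costs.min()) / lam)
        y0_mean = (w / w.sum()) @ y0  # Monte Carlo estimate of E[Y0 | Y^(i) = y]
        # Tweedie's formula turns the posterior mean into a score estimate.
        score = (np.sqrt(ab) * y0_mean - y) / (1.0 - ab)
        # Noise-free score-ascent step to the next, less noisy level.
        y = (y + (1.0 - ab) * score) / np.sqrt(alphas[i - 1])
    return y


# Usage on a hypothetical quadratic cost with minimizer c; a non-convex J can be swapped in.
c = np.array([1.0, -2.0])
y_opt = mcsa_optimize(lambda Y: np.sum((Y - c) ** 2, axis=-1), dim=2)
print(y_opt)
```

Algebraically, each step lands exactly on $\sqrt{\bar\alpha_{i-1}}$ times the weighted mean, so no fresh noise is injected between levels. This is consistent with the caption's observation that MCSA takes larger steps with lower sampling noise than the reverse SDE; exploration then comes only from the Monte Carlo samples drawn at each level.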

Theorems & Definitions (7)

Definition 1
Proposition 2
Proof
Definition 3
Proposition 4
Proof
Proposition 5
