What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen, Dongsheng Li

TL;DR

This work asks what makes a diffusion planner effective for offline RL, using a large-scale empirical study of more than 6,000 diffusion models. It identifies key design components (guided sampling, denoising backbone, action generation, and planning strategy) and reports counterintuitive findings: unconditional sampling with selection can outperform guided sampling, and a Transformer backbone outperforms the commonly used U-Net. The authors introduce Diffusion Veteran (DV), a simple yet strong baseline that achieves state-of-the-art results on standard offline RL benchmarks, and provide practical guidelines for diffusion planning. The study also demonstrates broader applicability with validation on the Adroit Hand dataset and discusses future directions, including integrating planning and policy approaches and improving efficiency.

Abstract

Diffusion models have recently shown significant potential in solving decision-making problems, particularly in generating behavior plans, an approach also known as diffusion planning. While numerous studies have demonstrated the impressive performance of diffusion planning, the mechanisms behind the key components of a good diffusion planner remain unclear, and design choices are highly inconsistent across existing studies. In this work, we address this issue through systematic empirical experiments on diffusion planning in an offline reinforcement learning (RL) setting, providing practical insights into the essential components of diffusion planning. We trained and evaluated over 6,000 diffusion models, identifying critical components such as guided sampling, network architecture, action generation, and planning strategy. We found that some design choices that run counter to common practice in previous work on diffusion planning actually lead to better performance; e.g., unconditional sampling with selection can be better than guided sampling, and a Transformer outperforms U-Net as the denoising network. Based on these insights, we suggest a simple yet strong diffusion planning baseline that achieves state-of-the-art results on standard offline RL benchmarks.
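
To make "unconditional sampling with selection" concrete, here is a minimal sketch of the general idea: draw several candidate plans without guidance, score each one, and keep the best. It is an illustration under stated assumptions, not the authors' implementation. `sample_plans` and `score_plans` are hypothetical stand-ins (a random-walk generator and a toy distance-based score) for a trained diffusion planner and a learned critic.

```python
import numpy as np


def sample_plans(rng, num_candidates, horizon, state_dim):
    """Hypothetical stand-in for an unconditional diffusion planner.

    Returns random-walk state sequences of shape
    (num_candidates, horizon, state_dim).
    """
    steps = rng.normal(size=(num_candidates, horizon, state_dim))
    return np.cumsum(steps, axis=1)


def score_plans(plans):
    """Hypothetical stand-in for a critic.

    Scores each plan by how close its final state is to the origin
    (higher is better).
    """
    return -np.linalg.norm(plans[:, -1, :], axis=-1)


def select_best_plan(plans, scores):
    """Selection step: keep the candidate with the highest score."""
    return plans[int(np.argmax(scores))]


rng = np.random.default_rng(0)
plans = sample_plans(rng, num_candidates=16, horizon=8, state_dim=3)
scores = score_plans(plans)
best_plan = select_best_plan(plans, scores)
print(best_plan.shape)
```

In this sketch the generator never sees the score; the preference for high-scoring plans enters only through the selection step, which is the contrast with guided sampling drawn in the abstract.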

Paper Structure

This paper contains 37 sections, 4 equations, 18 figures, 12 tables, 1 algorithm.

Figures (18)

  • Figure 1: Diffusion planning framework for decision making. (a) The generation of a sequence plan using the denoising process of a diffusion model. A three-joint robot arm is used as an illustrative example. (b) Important components and candidates in the framework. Each color corresponds to one component in the framework. A star indicates the preferred choice in experiments.
  • Figure 2: Renderings of the benchmark tasks considered in this study, where $\dim(\mathcal{S})$ and $\dim(\mathcal{A})$ denote the dimensions of the state and action spaces.
  • Figure 3: Comparison of performance between two action generation strategies. "Separate" learns an inverse dynamics model and uses it to compute actions from the state plan. "Joint" learns the joint distribution of states and actions and directly executes the generated action at the current step (see "action generation" in Fig. 1(b)). A straightforward conclusion drawn from the results is that "Separate" is better than "Joint" when tackling higher-dimensional action spaces. The vertical dashed line indicates on-par performance.
  • Figure 4: Performance change of DV over planning stride. It reduces to dense-step planning when Stride=1. The star indicates the choice of DV.
  • Figure 5: Using a Transformer as the backbone of the denoising network. (a) Performance comparison between Transformer and U-Net. The Transformer outperforms U-Net in 8 out of 9 sub-tasks and in all 3 main tasks. The number of parameters in the U-Net is comparable to that in the Transformer. Note that the error bars in Kitchen are too small to visualize (see Table \ref{table:backbone} for numerical results). (b) Visualization of attention weights of the first layer in the Transformer network during the denoising process. More plots can be found in Appendix \ref{appendix:extensive_results}.
  • ...and 13 more figures
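
The planning stride mentioned in the Figure 4 caption can be illustrated with a short sketch. It assumes that a plan with stride m consists of states spaced m environment steps apart, so that stride 1 gives dense-step planning, as the caption notes. The helper `strided_plan` and the toy trajectory are hypothetical and are not taken from the paper.

```python
import numpy as np


def strided_plan(trajectory, start, num_points, stride):
    """Pick num_points states from trajectory, spaced stride steps apart.

    With stride=1 this returns consecutive states (dense-step planning).
    """
    idx = start + stride * np.arange(num_points)
    if idx[-1] >= len(trajectory):
        raise ValueError("plan extends past the end of the trajectory")
    return trajectory[idx]


# Toy trajectory: 20 time steps of a 1-dimensional state equal to the step index.
trajectory = np.arange(20).reshape(20, 1)

dense = strided_plan(trajectory, start=0, num_points=4, stride=1)
jumpy = strided_plan(trajectory, start=0, num_points=4, stride=4)

print(dense[:, 0])  # time steps 0, 1, 2, 3
print(jumpy[:, 0])  # time steps 0, 4, 8, 12
```

With the same number of planned points, a larger stride covers a longer span of the trajectory, which is the trade-off that the stride sweep in Figure 4 explores.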