Diffusion Models for Reinforcement Learning: A Survey

Zhengbang Zhu; Hanye Zhao; Haoran He; Yichao Zhong; Shenyu Zhang; Haoquan Guo; Tingting Chen; Weinan Zhang

TL;DR

This survey examines how diffusion models address core RL challenges such as data efficiency, distribution shift, planning errors, and multitask generalization. It organizes diffusion-RL methods into planners, policies, and data synthesizers, and surveys foundational techniques (DDPM and score-based models) alongside guided and fast sampling. The paper catalogs applications across offline/online RL, imitation learning, trajectory generation, and data augmentation, and highlights future directions including generative environment simulation, safety integration, retrieval-augmented generation, and skill composition. Overall, it positions diffusion models as a unifying framework that enhances expressiveness, data utilization, and planning robustness in RL, while outlining practical avenues for research and development.

Abstract

Diffusion models surpass previous generative models in sample quality and training stability. Recent works have shown the advantages of diffusion models in improving reinforcement learning (RL) solutions. This survey aims to provide an overview of this emerging field and hopes to inspire new avenues of research. First, we examine several challenges encountered by RL algorithms. Then, we present a taxonomy of existing methods based on the roles of diffusion models in RL and explore how the preceding challenges are addressed. We further outline successful applications of diffusion models in various RL-related tasks. Finally, we conclude the survey and offer insights into future research directions. We are actively maintaining a GitHub repository for papers and other resources on utilizing diffusion models in RL: https://github.com/apexrl/Diff4RLSurvey.

Paper Structure (33 sections, 16 equations, 2 figures, 1 table)

Introduction
Challenges in Reinforcement Learning
Restricted Expressiveness in Offline Learning
Data Scarcity in Experience Replay
Compounding Error in Model-based Planning
Generalization in Multitask Learning
Foundations of Diffusion Models
Denoising Diffusion Probabilistic Model
Score-based Generative Models
Guided Sampling Methods
Classifier guidance.
Classifier-free guidance.
Fast Sampling Methods
Learning-free methods.
Learning-based methods.
...and 18 more sections

Figures (2)

Figure 1: An illustration of how diffusion models play a different role in the classic Agent-Environment-Buffer cycle compared to previous solutions. (1) When used as a planner, diffusion models optimize the whole trajectory at each denoising step, whereas autoregressive models generate the next-step output based only on the previously planned partial subsequence. (2) When used as a policy, diffusion models can model arbitrary action distributions, whereas Gaussian policies can only fit a possibly multimodal dataset distribution with a unimodal distribution. (3) When used as a data synthesizer, diffusion models augment the dataset with data sampled from the learned dataset distribution, whereas augmentation with random perturbations may generate samples that deviate from the data distribution.
Figure 2: Different roles of diffusion models in RL. (a) Diffusion models as the planner. The sampling target is a part of the trajectory whose components may vary across tasks. (b) Diffusion models as the policy. The sampling target is the action conditioned on the state, usually guided by the Q-function, either via policy-gradient-style guidance or by directly subtracting it from the training objective. (c) Diffusion models as the data synthesizer. The sampling target is also the trajectory, and both real and synthetic data are used for downstream policy improvement. For clarity, we omit the arrows for the $N$ denoising iterations in (c) and only show synthetic data generated from randomly sampled noise. Note that there are other, less explored roles, which we introduce in a later section on other roles.
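The contrast in Figure 1(2) and the Q-guidance in Figure 2(b) can be illustrated with a toy one-dimensional sketch, which is not taken from any surveyed method. Assumptions: the dataset's action distribution for a single state is a hypothetical two-mode Gaussian mixture, so the trained noise-prediction network is replaced by the closed-form score of the noised mixture; the linear beta schedule, the toy critic Q(a) = -(a - 2)^2 (via the hypothetical helper `grad_q`) and the guide weight are illustrative choices. Q-guidance here is a classifier-guidance-style shift of the score evaluated at the noisy action, which is only one simple option.

```python
import numpy as np

# Hypothetical 1-D action distribution for a single state: two equally
# likely modes at -2 and +2 (a stand-in for a multimodal offline dataset).
MODES = np.array([-2.0, 2.0])
MODE_STD = 0.3
WEIGHTS = np.array([0.5, 0.5])

# Illustrative DDPM linear beta schedule.
T = 200
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def score(x, t):
    """Closed-form grad log p_t(x) of the noised mixture (replaces a trained network).

    With q(x_t | x_0) = N(sqrt(abar_t) x_0, 1 - abar_t), each mixture component
    stays Gaussian, so the score of the noised marginal is exact.
    """
    ab = alpha_bars[t]
    means = np.sqrt(ab) * MODES                  # component means at step t
    var = ab * MODE_STD**2 + (1.0 - ab)          # shared component variance
    diff = x[:, None] - means[None, :]           # shape (N, K)
    log_r = np.log(WEIGHTS)[None, :] - 0.5 * diff**2 / var
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)            # component responsibilities
    return np.sum(r * (-diff / var), axis=1)


def grad_q(a):
    """Gradient of a hypothetical toy critic Q(a) = -(a - 2)^2."""
    return -2.0 * (a - 2.0)


def ddpm_sample(n, rng, guide_weight=0.0):
    """Ancestral DDPM sampling; guide_weight > 0 adds w * dQ/da to the score."""
    x = rng.standard_normal(n)                   # x_T ~ N(0, 1)
    for t in reversed(range(T)):
        s = score(x, t)
        if guide_weight > 0.0:
            # Classifier-guidance-style shift, evaluating Q at the noisy action.
            s = s + guide_weight * grad_q(x)
        eps_hat = -np.sqrt(1.0 - alpha_bars[t]) * s   # score -> noise prediction
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


def frac_between_modes(a):
    """Share of actions in (-1, 1), a region the dataset almost never visits."""
    return float(np.mean(np.abs(a) < 1.0))


rng = np.random.default_rng(0)
n = 4000
diffusion_actions = ddpm_sample(n, rng)
guided_actions = ddpm_sample(n, rng, guide_weight=1.0)

# Baseline: a unimodal Gaussian policy fitted by maximum likelihood to dataset samples.
data = rng.choice(MODES, size=n, p=WEIGHTS) + MODE_STD * rng.standard_normal(n)
gauss_actions = rng.normal(data.mean(), data.std(), size=n)

print("between modes: diffusion", frac_between_modes(diffusion_actions),
      "| gaussian", frac_between_modes(gauss_actions))
print("right mode: unguided", float(np.mean(diffusion_actions > 0)),
      "| Q-guided", float(np.mean(guided_actions > 0)))
```

In this toy setting, the unguided diffusion samples split between both modes and almost never land between them, while the fitted Gaussian places a large share of its mass in that gap, where the data has almost none. Adding the Q-gradient steers nearly all samples to the higher-value mode. In an actual diffusion policy, the closed-form score would be a learned network conditioned on the state.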
