Bellman Diffusion Models

Liam Schramm, Abdeslam Boularias

TL;DR

This paper introduces Bellman Diffusion Models (BDM), off-policy generative estimators of the successor state measure (SSM). Enforcing the Bellman flow constraints on a diffusion model yields a simple Bellman update on the diffusion step distribution. The paper derives a TD-like update for diffusion models and provides KL-based bounds that connect diffusion predictions with Bellman consistency, giving a practical, low-variance objective. The authors propose TD3-SBC, an offline RL algorithm built on ReBRAC that regularizes both actions and future states via a Bellman diffusion term, and report state-of-the-art results on D4RL with improved stability. The approach bridges state-occupancy perspectives and practical RL by enabling direct regularization of the SSM, reducing distribution shift and broadening the applicability of diffusion-based policies in offline settings.

Abstract

Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.
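
To make the Bellman flow constraint concrete, the following is a minimal tabular sketch, not the paper's diffusion-based method. It assumes a small Markov chain under a fixed policy with a hypothetical transition matrix `P_pi`, and one common convention for the SSM in which the measure starts at the next state: $M = (1-\gamma) P_\pi + \gamma P_\pi M$. The function name `successor_state_measure` and all numbers are illustrative assumptions.

```python
import numpy as np

def successor_state_measure(P_pi, gamma, iters=2000):
    """Iterate the Bellman flow recursion M <- (1 - gamma) * P_pi + gamma * P_pi @ M.

    P_pi[s, s'] is the state-to-state transition matrix under a fixed policy
    (an assumption of this sketch). Row s of the result is the discounted
    distribution over future states when starting from s.
    """
    n = P_pi.shape[0]
    M = np.full((n, n), 1.0 / n)  # any row-stochastic initial guess works
    for _ in range(iters):
        # TD-style backup: immediate next state plus the bootstrapped future
        M = (1.0 - gamma) * P_pi + gamma * P_pi @ M
    return M

# Hypothetical 3-state chain, used only for illustration.
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.3, 0.0, 0.7]])
gamma = 0.9

M = successor_state_measure(P_pi, gamma)

# Closed-form fixed point of the same recursion: (1 - gamma) (I - gamma P)^{-1} P
M_closed = (1.0 - gamma) * np.linalg.solve(np.eye(3) - gamma * P_pi, P_pi)
```

In this tabular setting the recursion is a contraction with factor `gamma`, so the iterate converges to the closed-form solution and each row remains a probability distribution. The paper replaces the table `M` with a diffusion model over future states; that step is not shown here.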

Paper Structure

This paper contains 20 sections, 4 theorems, 41 equations, 4 tables, 3 algorithms.

Key Result

Lemma 1

Let $q$ and $p$ be $K$-step diffusion models with noise schedule $\beta_i$, parameterized by neural networks with outputs $\epsilon_q$ and $\epsilon_p$, respectively. Let $q_i$ and $p_i$ be the distribution of the samples generated by the first $K-i$ steps of the forward process of $q$ and $p$, resp

Theorems & Definitions (7)

  • Lemma 1
  • Proposition 1
  • Proposition 2
  • Corollary 1
  • proof
  • proof
  • proof