Bellman Diffusion Models
Liam Schramm, Abdeslam Boularias
TL;DR
This paper introduces Bellman Diffusion Models (BDM), off-policy generative estimators of the successor state measure (SSM). Enforcing the Bellman flow constraints yields a simple Bellman update on the diffusion step distribution. The paper derives a TD-like update for diffusion models and gives KL-based bounds that connect diffusion predictions with Bellman consistency, enabling a practical, low-variance objective. The authors propose TD3-SBC, an offline RL algorithm built on ReBRAC that regularizes both actions and future states via a Bellman diffusion term, and report state-of-the-art results on D4RL with improved stability. By enabling direct regularization of the SSM, the approach bridges state-occupancy perspectives and practical RL, reducing distribution shift and broadening the applicability of diffusion-based policies in offline settings.
Abstract
Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.
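To make the Bellman flow constraint concrete, the following is a minimal tabular sketch rather than the paper's diffusion-based method. It assumes a fixed policy folded into a state-to-state transition matrix `P_pi`, and the common convention in which the SSM is the discounted distribution over future states starting from the next step; the paper's exact convention (for example, conditioning on state-action pairs) may differ. The 3-state chain, the function names, and the value of `gamma` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def successor_state_measure(P_pi, gamma):
    """Closed-form tabular SSM under the assumed convention:
    M = (1 - gamma) * sum_{t>=0} gamma^t * P_pi^(t+1)
      = (1 - gamma) * P_pi @ inv(I - gamma * P_pi).
    """
    n = P_pi.shape[0]
    return (1.0 - gamma) * P_pi @ np.linalg.inv(np.eye(n) - gamma * P_pi)


def bellman_flow_backup(M, P_pi, gamma):
    """One application of the Bellman flow operator to a candidate measure M:
    with probability (1 - gamma) the future state is the next state,
    otherwise it is drawn from the measure at the next state.
    """
    return (1.0 - gamma) * P_pi + gamma * P_pi @ M


# Hypothetical 3-state Markov chain induced by some fixed policy.
P_pi = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])
gamma = 0.95

# Closed-form solution of the Bellman flow constraint.
M = successor_state_measure(P_pi, gamma)

# The same fixed point reached by repeated backups from a uniform guess,
# a tabular analogue of a TD-style update.
M_iter = np.full((3, 3), 1.0 / 3.0)
for _ in range(2000):
    M_iter = bellman_flow_backup(M_iter, P_pi, gamma)

print(np.round(M, 4))
```

Each row of `M` is a probability distribution over future states, and `M` is the unique fixed point of the backup. In the tabular case this can be solved exactly; a generative model class such as diffusion becomes relevant when states are continuous and the measure cannot be stored as a matrix.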
