Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

Yangming Li; Chieh-Hsin Lai; Carola-Bibiane Schönlieb; Yuki Mitsufuji; Stefano Ermon

TL;DR

Bellman Diffusion is a novel deep generative model (DGM) framework that maintains the linearity required in Markov Decision Processes (MDPs) by modeling gradient and scalar fields. It is also a capable image generator and converges 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.

Abstract

Deep Generative Models (DGMs), including Energy-Based Models (EBMs) and Score-based Generative Models (SGMs), have advanced high-fidelity data generation and complex continuous distribution approximation. However, their application in Markov Decision Processes (MDPs), particularly in distributional Reinforcement Learning (RL), remains underexplored, with conventional histogram-based methods dominating the field. This paper rigorously highlights that this application gap is caused by the nonlinearity of modern DGMs, which conflicts with the linearity required by the Bellman equation in MDPs. For instance, EBMs involve nonlinear operations such as exponentiating energy functions and normalizing constants. To address this, we introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling. With divergence-based training techniques to optimize neural network proxies and a new type of stochastic differential equation (SDE) for sampling, Bellman Diffusion is guaranteed to converge to the target distribution. Our empirical results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks. This work enables the effective integration of DGMs into MDP applications, unlocking new avenues for advanced decision-making frameworks.
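
The linearity argument can be checked numerically on a toy case. The sketch below is an illustration under our own assumptions (two 1D Gaussian components, a mixture weight of 0.3, and hypothetical helper names), not code from the paper. It shows that the density gradient of a mixture equals the same mixture of the component gradients, whereas the score (gradient of the log-density) does not combine linearly.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a 1D Gaussian N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gauss_grad(x, mu, sigma):
    """Derivative of the Gaussian density with respect to x."""
    return -(x - mu) / sigma**2 * gauss_pdf(x, mu, sigma)

def gauss_score(x, mu, sigma):
    """Derivative of the Gaussian log-density with respect to x."""
    return -(x - mu) / sigma**2

# Toy setup (all values are illustrative assumptions).
x = np.linspace(-4.0, 4.0, 9)
w = 0.3
params = [(-1.0, 0.5), (2.0, 1.0)]  # (mean, std) of each component

def mix_pdf(x):
    """Mixture density: a linear combination of the component densities."""
    return w * gauss_pdf(x, *params[0]) + (1.0 - w) * gauss_pdf(x, *params[1])

# Gradient field: the same linear combination of the component gradients.
mix_grad = w * gauss_grad(x, *params[0]) + (1.0 - w) * gauss_grad(x, *params[1])

# Independent check of the mixture gradient with central finite differences.
h = 1e-5
fd_grad = (mix_pdf(x + h) - mix_pdf(x - h)) / (2.0 * h)
grad_is_linear = np.allclose(fd_grad, mix_grad, atol=1e-6)

# Score field: the true mixture score versus a naive linear combination.
true_score = mix_grad / mix_pdf(x)
naive_score = w * gauss_score(x, *params[0]) + (1.0 - w) * gauss_score(x, *params[1])
score_is_linear = np.allclose(true_score, naive_score)

print("gradient field combines linearly:", grad_is_linear)
print("score field combines linearly:   ", score_is_linear)
```

In this toy case the first check passes and the second fails, which mirrors the abstract's point: quantities defined through logarithms, exponentials, or normalizing constants do not inherit the linear mixing of the underlying densities.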

Paper Structure (56 sections, 5 theorems, 78 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Motivation and problem.
Our framework: Bellman Diffusion.
Theoretical and empirical results.
Linear Property for MDPs
Modelings of Modern Deep Generative Models
Desired Linear Property in MDP
Method: Bellman Diffusion
Scalar and Vector Field Matching
Field matching.
Efficient Field Matching Losses
Slice trick for efficient training.
Slice trick for improving sample efficiency.
Bellman Diffusion Dynamics
Summary of Training and Sampling Algorithms
...and 41 more sections

Key Result

Proposition 3.1

The loss $\mathcal{L}_{\mathrm{grad}}(\bm{\phi})$ is given by [equation omitted], and $\mathcal{L}_{\mathrm{id}}(\bm{\varphi})$ is expressed as [equation omitted]. Here, $\mathcal{N}({\mathbf{x}}; \mathbf{0}, \epsilon \mathbf{I}_D)$ denotes a $D$-dimensional isotropic Gaussian density function evaluated at ${\mathbf{x}}$, and $C_{\mathrm{grad}}$ and $C_{\mathrm{id}}$ are constants independent of the model parameters $\bm{\phi}$ and $\bm{\varphi}$ ...
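
A Gaussian density $\mathcal{N}(\cdot; \mathbf{0}, \epsilon \mathbf{I}_D)$ with small $\epsilon$ can act as a narrow smoothing kernel that turns a density-weighted integral into an expectation over pairs of samples. Whether Proposition 3.1 uses it in exactly this way is an assumption on our part, since the equations are not shown here. The sketch below is therefore not the paper's loss: it only checks, in 1D with a standard normal target and hypothetical names, that averaging the kernel over independent sample pairs approximates $\int p(x)^2 \, dx$ without ever evaluating $p$.

```python
import numpy as np

def gaussian_kernel(d, eps):
    """1D Gaussian density N(d; 0, eps), where eps is the variance."""
    return np.exp(-0.5 * d**2 / eps) / np.sqrt(2.0 * np.pi * eps)

# Illustrative settings (assumptions, not values from the paper).
rng = np.random.default_rng(0)
eps = 0.01
n = 200_000

# Two independent batches of samples from the target p = N(0, 1).
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Sample-pair estimate of E_{x1}[p(x1)] = integral of p(x)^2 dx.
estimate = gaussian_kernel(x2 - x1, eps).mean()

# Closed form for a standard normal: integral of N(x; 0, 1)^2 dx = 1 / (2 sqrt(pi)).
exact = 1.0 / (2.0 * np.sqrt(np.pi))

print(f"kernel estimate: {estimate:.4f}, exact value: {exact:.4f}")
```

Shrinking `eps` reduces the smoothing bias but increases the variance of the estimate, so in a sketch like this the kernel width trades accuracy against the number of sample pairs needed.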

Figures (8)

Figure 1: Bellman Diffusion captures the uniform distribution supported on disjoint spans. The leftmost subfigure presents the training data histogram, while the next three show the estimated density, derivative functions, and samples generated by Bellman Diffusion.
Figure 2: Bellman Diffusion learns the unbalanced Gaussian mixture, which is hard for score-based models. The subfigures, from left to right, display the training data, estimated scalar and gradient fields, and samples generated by our Bellman Diffusion.
Figure 3: Bellman Diffusion learns unusually clustered data. The subfigures, from left to right, show the training data, estimated density field, gradient field, and generated samples.
Figure 4: The $2 \times 3$ subfigures, arranged from left to right and top to bottom, show a full trajectory of Bellman Diffusion, interacting with a maze environment. Each subfigure consists of the state on the left, gradient field in the middle, and scalar field on the right.
Figure 5: The left and right subfigures respectively show the initial and terminal states of Bellman Diffusion, interacting with an environment of balance control. Every subfigure is composed of the observation on the left, gradient field in the middle, and scalar field on the right.
...and 3 more figures

Theorems & Definitions (10)

Definition 3.1: Field Divergences
Proposition 3.1: Equivalent Forms of Field Matching
Proposition 3.2: Sliced Gradient Matching
Theorem 4.1: Convergence to the Steady State
Theorem 4.2: Error Analysis of Neural Network Approximations
Theorem B.1: Well-defined Divergences
proof
proof
proof
proof
