Implicit Diffusion: Efficient Optimization through Stochastic Sampling

Pierre Marion; Anna Korba; Peter Bartlett; Mathieu Blondel; Valentin De Bortoli; Arnaud Doucet; Felipe Llinares-López; Courtney Paquette; Quentin Berthet

TL;DR

The paper addresses the optimization of distributions defined implicitly by parameterized stochastic diffusions, turning a problem over distributions into a finite-dimensional outer optimization over the parameter θ. It introduces Implicit Diffusion, a single-loop framework that jointly updates the diffusion parameters and the samples, with gradients estimated via implicit differentiation, analytic expressions, and adjoint methods. The authors provide theoretical guarantees for both continuous and discrete Langevin dynamics and for a Gaussian denoising case, and demonstrate practical benefits in reward-driven tuning of Langevin dynamics and denoising diffusion models. The approach enables efficient finetuning of energy-based models and diffusion models without nested inner loops, and an open-source implementation is provided to support reproducibility and broader adoption.
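As a concrete illustration of an analytic gradient expression, consider a Gibbs distribution $\pi^\star(\theta) \propto \exp(-V(\cdot, \theta))$ and an outer loss $\ell(\theta) = \mathbb{E}_{X \sim \pi^\star(\theta)}[f(X)]$. Differentiating the normalized density gives the standard identity $\nabla_\theta \ell(\theta) = -\mathrm{Cov}_{\pi^\star(\theta)}(f(X), \nabla_\theta V(X, \theta))$, which only requires samples. The Python sketch below is a minimal, hypothetical example under our own assumptions (a 1D potential $V(x, \theta) = (x - \theta)^2/2$, a loss $f(x) = (x - 3)^2$, and illustrative function names); it is not the authors' code. Because $\pi^\star(\theta) = \mathcal{N}(\theta, 1)$ for this potential, the estimate can be checked against the closed form $\nabla_\theta \ell(\theta) = 2(\theta - 3)$.

```python
import random


def grad_theta_potential(x, theta):
    # Toy (assumed) potential V(x, theta) = (x - theta)^2 / 2, so dV/dtheta = theta - x.
    return theta - x


def loss_integrand(x):
    # Assumed f(x) = (x - 3)^2, so the outer loss is l(theta) = E_{pi*(theta)}[f(X)].
    return (x - 3.0) ** 2


def analytic_gradient(samples, theta):
    """Estimate d l / d theta = -Cov(f(X), dV/dtheta(X, theta)) from samples of pi*(theta)."""
    n = len(samples)
    f_vals = [loss_integrand(x) for x in samples]
    g_vals = [grad_theta_potential(x, theta) for x in samples]
    mean_f = sum(f_vals) / n
    mean_g = sum(g_vals) / n
    cov = sum((f - mean_f) * (g - mean_g) for f, g in zip(f_vals, g_vals)) / n
    return -cov


# For this toy potential pi*(theta) is exactly N(theta, 1), so we can sample it directly.
rng = random.Random(0)
theta = 1.0
samples = [rng.gauss(theta, 1.0) for _ in range(100_000)]
estimate = analytic_gradient(samples, theta)  # closed form: 2 * (theta - 3) = -4
```

With 100,000 exact samples at θ = 1, the estimate should land close to the closed-form value of -4; in practice the exact samples would be replaced by the output of a sampler such as Langevin dynamics.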

Abstract

We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a general framework for first-order optimization of these processes, which performs optimization and sampling steps jointly, in a single loop. This approach is inspired by recent advances in bilevel optimization and automatic implicit differentiation, leveraging the point of view of sampling as optimization over the space of probability distributions. We provide theoretical guarantees on the performance of our method, as well as experimental results demonstrating its effectiveness. We apply it to training energy-based models and finetuning denoising diffusions.
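The single-loop idea can be sketched on a toy problem: at every iteration, a population of particles takes one unadjusted Langevin step for the current θ, and θ then takes one gradient step estimated from those same particles, with no inner sampling loop run to convergence. This is a minimal sketch under stated assumptions, not the paper's implementation: a 1D potential $V(x, \theta) = (x - \theta)^2/2$, a loss $f(x) = (x - 3)^2$, arbitrary step sizes, a hypothetical function name, and the covariance identity $\nabla_\theta \ell = -\mathrm{Cov}(f(X), \nabla_\theta V(X, \theta))$ for Gibbs distributions as the gradient estimator.

```python
import math
import random


def single_loop_langevin_opt(theta0, n_particles=500, n_iters=1000,
                             step_size=0.1, lr=0.05, seed=0):
    """Interleave one Langevin step on the particles with one gradient step on theta."""
    rng = random.Random(seed)
    theta = theta0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_iters):
        # Sampling step: unadjusted Langevin on the assumed V(x, theta) = (x - theta)^2 / 2,
        # whose gradient in x is (x - theta); theta is the current iterate.
        particles = [x - step_size * (x - theta) + noise_scale * rng.gauss(0.0, 1.0)
                     for x in particles]
        # Optimization step: estimate d/dtheta E[f(X)], f(x) = (x - 3)^2, as
        # -Cov(f(X), dV/dtheta) with dV/dtheta = theta - x, on the current particles.
        f_vals = [(x - 3.0) ** 2 for x in particles]
        g_vals = [theta - x for x in particles]
        mean_f = sum(f_vals) / n_particles
        mean_g = sum(g_vals) / n_particles
        cov = sum((f - mean_f) * (g - mean_g)
                  for f, g in zip(f_vals, g_vals)) / n_particles
        grad = -cov
        theta -= lr * grad
    return theta, particles
```

Because the particles are only approximately distributed as $\pi^\star(\theta)$ at each iteration, the gradient is biased while θ is still moving; in this toy setting θ should nonetheless settle near the minimizer θ = 3 of $\ell(\theta) = (\theta - 3)^2 + 1$, with the particles tracking it.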

Paper Structure (67 sections, 6 theorems, 132 equations, 18 figures, 4 tables, 7 algorithms)

INTRODUCTION
Main Contributions.
Notation.
PROBLEM PRESENTATION
Sampling and optimization perspectives
Optimization objective.
Examples
Langevin dynamics.
Denoising diffusion.
METHODS
Overview
Estimation of gradients through sampling.
Beyond nested-loop approaches.
Gradient estimation through sampling
Direct analytical derivation.
...and 52 more sections

Key Result

Proposition 4.4

Consider a bounded function $R: \mathbb{R}^d \to \mathbb{R}$. Then, under the bounded-gradient assumption, the functions $\Gamma_{\text{rew}}$ and $\Gamma_{\text{ref}}$, as defined in the paper, satisfy the Lipschitz assumption on $\Gamma$.

Figures (18)

Figure 1: Optimizing through sampling with Implicit Diffusion to finetune denoising diffusion models. The reward is brightness for MNIST and redness for CIFAR-10.
Figure 2: A step of optimization through sampling. For a given parameter $\theta_0$, the sampling process is defined by applying $\Sigma_s$ for $s \in [T]$, producing $\pi^\star(\theta_0)$. The goal of optimization through sampling is to update $\theta$ to minimize $\ell = \mathcal{F} \circ \pi^\star$. Here the objective $\mathcal{F}$ corresponds to having lighter images (on average), which produces thicker digits.
Figure 3: Main approaches for reward tuning of denoising diffusions. References are given in the appendix on additional related work.
Figure 4: Illustration of the Implicit Diffusion algorithm, in the finite time setting. Left: Sampling - one step of the parameterized sampling scheme is applied in parallel to all distributions in the queue. Right: Optimization - the last element of the queue is used to compute a gradient for the parameter.
Figure 5: Contour lines and samples for Langevin with $\theta_0$ (gray), unrolling with $T=100$ inner sampling steps (purple), and Implicit Diffusion (blue).
...and 13 more figures
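The queue mechanism described for Figure 4 can be illustrated on a toy finite-time sampler. In this sketch (our assumptions, not the paper's code), the sampler runs $T$ Langevin-type steps $x_{s+1} = x_s - h(x_s - \theta) + \sqrt{2h}\,\xi_s$ from $x_0 \sim \mathcal{N}(0, 1)$, and the queue holds up to $T$ batches at different stages. Each iteration pushes a fresh batch, advances every batch by one step with the current θ, and pops the batch that has completed $T$ steps to take a gradient step on $\ell(\theta) = \mathbb{E}[(x_T - 3)^2]$. For brevity, gradients use forward-mode sensitivities $t = \partial x / \partial \theta$, updated as $t \leftarrow (1 - h)\,t + h$, in place of the adjoint method; the function name and all constants are hypothetical.

```python
import math
import random
from collections import deque


def queue_sampler_optimizer(theta0=0.0, n_steps=10, step_size=0.2, batch=100,
                            n_iters=500, lr=0.02, seed=0):
    """Toy queue-based sampling/optimization loop for a finite-time sampler."""
    rng = random.Random(seed)
    theta = theta0
    # Each queue entry is [stage, samples, tangent], where tangent = d x / d theta.
    # x_0 ~ N(0, 1) does not depend on theta, so the tangent starts at 0.
    queue = deque()
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_iters):
        # A fresh batch enters the queue at stage 0.
        queue.appendleft([0, [rng.gauss(0.0, 1.0) for _ in range(batch)], 0.0])
        # Sampling: advance every batch in the queue by one step, using the current theta.
        for entry in queue:
            entry[1] = [x - step_size * (x - theta) + noise_scale * rng.gauss(0.0, 1.0)
                        for x in entry[1]]
            entry[2] = (1.0 - step_size) * entry[2] + step_size
            entry[0] += 1
        # Optimization: the oldest batch is popped once it has completed n_steps steps.
        if queue[-1][0] == n_steps:
            _, xs, tangent = queue.pop()
            # Pathwise estimate of d/dtheta E[(x_T - 3)^2] = E[2 (x_T - 3) * d x_T / d theta].
            grad = sum(2.0 * (x - 3.0) * tangent for x in xs) / batch
            theta -= lr * grad
    return theta
```

With θ held fixed, $\mathbb{E}[x_T] = c\,\theta$ where $c = 1 - (1 - h)^T$, so the loss is minimized at θ = 3/c (about 3.36 for $h = 0.2$, $T = 10$). The popped batches were generated under slowly changing values of θ, which is the approximation this kind of single-loop design accepts in exchange for performing only one sampling step per batch per iteration.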

Theorems & Definitions (14)

Definition 2.1: Iterative sampling operators
Example 2.2
Remark 2.3
Definition 3.1: Implicit gradient estimation
Remark 3.2
Proposition 4.4
Theorem 4.5
Theorem 4.7
Proposition 4.8
Definition B.1
...and 4 more