Training-Free Adaptation of Diffusion Models via Doob's $h$-Transform

Qijie Zhu, Zeqi Ye, Han Liu, Zhaoran Wang, Minshuo Chen

TL;DR

DOIT (Doob-Oriented Inference-time Transformation) is a training-free and computationally efficient adaptation method for pre-trained diffusion models. It applies to generic, non-differentiable rewards and comes with a high-probability convergence guarantee to the target high-reward distribution.

Abstract

Adaptation methods have been a workhorse for unlocking the transformative power of pre-trained diffusion models in diverse applications. Existing approaches often abstract adaptation objectives as a reward function and steer diffusion models to generate high-reward samples. However, these approaches can incur high computational overhead due to additional training, or rely on stringent assumptions on the reward such as differentiability. Moreover, despite their empirical success, theoretical justification and guarantees are seldom established. In this paper, we propose DOIT (Doob-Oriented Inference-time Transformation), a training-free and computationally efficient adaptation method that applies to generic, non-differentiable rewards. The key framework underlying our method is a measure transport formulation that seeks to transport the pre-trained generative distribution to a high-reward target distribution. We leverage Doob's $h$-transform to realize this transport, which induces a dynamic correction to the diffusion sampling process and enables efficient simulation-based computation without modifying the pre-trained model. Theoretically, we establish a high probability convergence guarantee to the target high-reward distribution via characterizing the approximation error in the dynamic Doob's correction. Empirically, on D4RL offline RL benchmarks, our method consistently outperforms state-of-the-art baselines while preserving sampling efficiency.

Paper Structure (48 sections, 7 theorems, 126 equations, 3 figures, 9 tables, 3 algorithms)

Key Result

Lemma 3.2

Let $q$ be the density function of the target distribution such that […]. Let $U\sim\mathrm{Unif}(0,1)$ be independent of $\bar{X}_0$. Setting $h(x_t, t) = \mathbb{P}(\mathcal{E}_{\bar{X}_0} \mid \bar{X}_t = x_t)$ with […] leads to $p^h_{\theta,0}(x) = q(x)$.
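
For context, the standard form of Doob's $h$-transform for a generic diffusion can be written out as below. This is a textbook statement in assumed notation: the process $Y_u$, the drift $b$, the noise scale $\sigma$, the acceptance probability $a$, and the sampling-time index $u$ running from $0$ to $T$ are illustrative and not taken from the paper. It does not reconstruct the conditions of the lemma.

```latex
% Reference sampler (assumed notation), run in sampling time u \in [0, T]:
dY_u = b(Y_u, u)\,du + \sigma(u)\,dW_u .

% h-function of an event E whose conditional probability given the endpoint is a(y):
h(y, u) = \mathbb{P}(\mathcal{E} \mid Y_u = y) = \mathbb{E}\bigl[a(Y_T) \mid Y_u = y\bigr].

% Conditioning on E changes only the drift:
dY^h_u = \bigl[\, b(Y^h_u, u) + \sigma(u)^2\, \nabla_y \log h(Y^h_u, u) \,\bigr]\,du + \sigma(u)\,dW_u .

% Transition densities are reweighted by h:
p^h(y', u' \mid y, u) = p(y', u' \mid y, u)\,\frac{h(y', u')}{h(y, u)}, \qquad u < u'.

% If Y^h_0 is drawn from p_0(y)\,h(y,0) / \int p_0(z)\,h(z,0)\,dz, the endpoint law is tilted by a:
p^h_T(y) = \frac{a(y)\,p_T(y)}{\int a(z)\,p_T(z)\,dz}.
```

In particular, when $a$ is proportional to $q/p_T$, the endpoint density equals $q$. This is the usual rejection-sampling identity, and it is consistent with the lemma's use of an auxiliary uniform variable $U$.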

Figures (3)

  • Figure 1: DOIT: At each $t_l$, we simulate $M$ trajectories (here, $M=3$) starting from $x_{t_l}$ to approximate $\nabla\log h(x_{t_l},t_l)$ via a Monte Carlo estimator, then use it to modify the sampling dynamics.
  • Figure 2: Violin plots of aesthetic scores for samples generated by Stable Diffusion v1.5, comparing vanilla generation with DOIT under different $(\tau,\gamma)$ settings. The blue bars indicate the minimum and maximum scores, the orange bars mark the first and third quartiles, and the red marker denotes the mean.
  • Figure 3: Comparison of dog images generated by Stable Diffusion v1.5. The upper row displays samples from the vanilla generation process, while the bottom row shows samples guided by DOIT using the aesthetic score as the reward.
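
The estimate-then-correct loop described in the Figure 1 caption can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the paper's estimator or algorithm: the reference sampler is a one-dimensional drift-free Gaussian chain instead of a pre-trained diffusion model, the reward event is a simple threshold, and every name (`weight`, `grad_log_h`, `sample`, `exact_grad_log_h`, `SIGMA`, `THRESHOLD`) is hypothetical. The gradient is estimated with the score-function (likelihood-ratio) identity, so the reward is only evaluated, never differentiated.

```python
import math
import numpy as np

SIGMA = 1.0      # noise scale of the toy reference sampler (assumption)
THRESHOLD = 0.0  # toy reward event is {x_0 > THRESHOLD} (assumption)


def weight(x0):
    """Non-differentiable 0/1 reward weight; only evaluations are needed."""
    return (x0 > THRESHOLD).astype(float)


def grad_log_h(x, steps_left, dt, num_traj, rng):
    """Monte Carlo estimate of d/dx log h(x), h(x) = P(X_0 > THRESHOLD | X_now = x).

    Uses grad h(x) = E[weight(X_0) * d/dx log p(X_next | x)]: only the first
    simulated transition depends on the starting point x.
    """
    step_std = SIGMA * math.sqrt(dt)
    first_noise = rng.standard_normal(num_traj)
    paths = x + step_std * first_noise
    for _ in range(steps_left - 1):
        paths = paths + step_std * rng.standard_normal(num_traj)
    w = weight(paths)
    h_hat = w.mean()
    if h_hat == 0.0:  # no simulated trajectory hit the event: apply no correction
        return 0.0
    return float((w * first_noise / step_std).mean() / h_hat)


def sample(num_steps, dt, num_traj, rng, guided):
    """Run the toy sampler from x = 0, optionally adding the Doob drift correction."""
    x = 0.0
    for k in range(num_steps, 0, -1):  # k = number of steps left
        drift = SIGMA**2 * grad_log_h(x, k, dt, num_traj, rng) if guided else 0.0
        x = x + drift * dt + SIGMA * math.sqrt(dt) * rng.standard_normal()
    return x


def exact_grad_log_h(x, time_left):
    """Closed form for this toy case only, used as a sanity check."""
    s = SIGMA * math.sqrt(time_left)
    z = (x - THRESHOLD) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / (s * cdf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    est = grad_log_h(0.5, steps_left=4, dt=0.25, num_traj=200_000, rng=rng)
    print("estimated:", est, "closed form:", exact_grad_log_h(0.5, 1.0))
    guided = [sample(10, 0.1, 128, rng, guided=True) for _ in range(200)]
    plain = [sample(10, 0.1, 128, rng, guided=False) for _ in range(200)]
    print("mean guided:", np.mean(guided), "mean unguided:", np.mean(plain))
```

The number of simulated trajectories (`num_traj`, playing the role of $M$) trades estimator variance against cost per sampling step. In this toy setting the closed-form gradient is available for comparison; with a real pre-trained model no such check exists, and the first-transition score would depend on the model's own drift.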

Theorems & Definitions (8)

  • Definition 3.1: Doob's $h$-function
  • Lemma 3.2
  • Lemma 4.1
  • Lemma 5.2: Approximation bound of $\nabla \log h$
  • Theorem 5.4
  • Lemma B.1
  • Lemma B.2: High-probability bound
  • Lemma B.3: McDiarmid concentration for $\mathcal{H}_2$