RNE: plug-and-play diffusion inference-time control and energy-based training

Jiajun He, José Miguel Hernández-Lobato, Yuanqi Du, Francisco Vargas

TL;DR

The paper introduces the Radon-Nikodym Estimator (RNE), which reveals a fundamental connection between marginal densities and transition kernels and provides a flexible plug-and-play framework that unifies diffusion density estimation, inference-time control, and energy-based diffusion training under a single perspective.

Abstract

Diffusion models generate data by gradually removing noise, which corresponds to the time reversal of a noising process. However, access to the denoising kernels alone is often insufficient: many applications, such as inference-time control, require knowledge of the marginal densities along the generation trajectory. To address this gap, we introduce the Radon-Nikodym Estimator (RNE). Based on the concept of the density ratio between path distributions, RNE reveals a fundamental connection between marginal densities and transition kernels, providing a flexible plug-and-play framework that unifies (1) diffusion density estimation, (2) inference-time control, and (3) energy-based diffusion training under a single perspective. Experiments demonstrate that RNE delivers strong results in inference-time control applications, such as annealing and model composition, with promising inference-time scaling performance, and provides a simple yet efficient regularisation for training energy-based diffusion models. Additionally, RNE is modality-agnostic and applies not only to continuous diffusion models but also to their discrete counterparts.
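The connection between marginals and kernels can be illustrated with a discrete-time toy. If the denoising kernel $p(x_t \mid x_{t+1})$ is the exact time reversal of the noising kernel $q(x_{t+1} \mid x_t)$, then at every step $p_t(x_t)\,q(x_{t+1} \mid x_t) = p_{t+1}(x_{t+1})\,p(x_t \mid x_{t+1})$, so a log-marginal can be accumulated along a trajectory purely from log-kernel ratios. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hypothetical 1D linear-Gaussian chain (the parameters `ALPHA`, `S2`, `V0`, `T` and all function names are made up for illustration) in which the exact reverse kernel has a closed form, so the estimate can be checked against the analytic marginal.

```python
import math
import random

# Hypothetical toy chain (illustration only): x_{t+1} = ALPHA * x_t + sqrt(S2) * eps,
# with data marginal p_0 = N(0, V0).
ALPHA, S2, V0, T = 0.9, 0.19, 0.25, 10


def gauss_logpdf(x, mean, var):
    """Log-density of the 1D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_var(t):
    """Exact variance of the marginal p_t under the toy noising chain."""
    v = V0
    for _ in range(t):
        v = ALPHA ** 2 * v + S2
    return v


def log_forward(x_next, x):
    """log q(x_{t+1} | x_t): the noising kernel."""
    return gauss_logpdf(x_next, ALPHA * x, S2)


def log_reverse(x, x_next, t):
    """log p(x_t | x_{t+1}): exact denoising kernel, i.e. the Gaussian posterior."""
    prec = 1.0 / marginal_var(t) + ALPHA ** 2 / S2
    mean = (ALPHA * x_next / S2) / prec
    return gauss_logpdf(x, mean, 1.0 / prec)


def rne_log_density(path):
    """Accumulate log p_t along a path x_0..x_T, starting from log p_0(x_0).

    Each step applies log p_{t+1}(x_{t+1}) = log p_t(x_t)
        + log q(x_{t+1} | x_t) - log p(x_t | x_{t+1}).
    """
    logp = gauss_logpdf(path[0], 0.0, V0)
    for t in range(len(path) - 1):
        logp += log_forward(path[t + 1], path[t]) - log_reverse(path[t], path[t + 1], t)
    return logp


# Usage: simulate one noising trajectory and compare with the analytic log p_T(x_T).
random.seed(0)
path = [random.gauss(0.0, math.sqrt(V0))]
for _ in range(T):
    path.append(ALPHA * path[-1] + math.sqrt(S2) * random.gauss(0.0, 1.0))

est = rne_log_density(path)
exact = gauss_logpdf(path[-1], 0.0, marginal_var(T))
print(abs(est - exact) < 1e-9)  # True: the telescoping identity is exact here
```

Because the toy reverse kernel is exact, the identity holds for any sequence of states, not only sampled ones. With a learned, imperfect denoiser the same accumulation would give only an estimate.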

Paper Structure

This paper contains 66 sections, 18 theorems, 198 equations, 20 figures, 5 tables.

Key Result

Proposition 2.2

Figures (20)

  • Figure 1: Conceptual illustration of our proposed approach. (a) RNE uses the fact that the Radon-Nikodym derivative (RND) between a process and its time reversal is 1 to calculate marginal densities. (b) RNC applies RNE to calculate importance weights for inference-time control.
  • Figure 1: Inference-time annealing on ALDP. *SMC reduces sample diversity, which predominantly affects $W_2$; therefore, $W_2$ for "anneal score" should not be compared directly against SMC methods. Energy and distance TVD are less sensitive to sample diversity and are therefore more comparable.
  • Figure 2: Energy TVD (left), sample $W_2$ (middle), and accumulated weight variance (right) by different pairs of $(c_a, c_b)$ for annealing on ALDP.
  • Figure 3: Inference-time scaling on ALDP.
  • Figure 4: Learned density on 2D GMM.
  • ...and 15 more figures
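Figure 1(b) describes using RNE to compute importance weights for inference-time control, and annealing is one of the evaluated applications. As a hedged sketch, consider a generic sequential Monte Carlo (SMC) sampler, not necessarily the paper's exact RNC procedure, that targets $\pi_t \propto p_t^{\beta}$. It proposes with the denoising kernel $p(x_t \mid x_{t+1})$ and uses the noising kernel $q(x_{t+1} \mid x_t)$ as the auxiliary backward kernel. The incremental weight $\pi_t(x_t)\,q(x_{t+1} \mid x_t) / [\pi_{t+1}(x_{t+1})\,p(x_t \mid x_{t+1})]$ then reduces, via the identity $p_t(x_t)/p_{t+1}(x_{t+1}) = p(x_t \mid x_{t+1})/q(x_{t+1} \mid x_t)$, to $[p(x_t \mid x_{t+1})/q(x_{t+1} \mid x_t)]^{\beta - 1}$, which needs only the two kernels. The toy chain, `BETA`, and all names below are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy chain (illustration only): x_{t+1} = ALPHA * x_t + sqrt(S2) * eps, p_0 = N(0, V0).
ALPHA, S2, V0, T = 0.9, 0.19, 0.25, 10
BETA = 2.0  # annealing exponent: the t = 0 target is proportional to p_0(x)^BETA


def marginal_var(t):
    """Exact variance of p_t under the toy noising chain."""
    v = V0
    for _ in range(t):
        v = ALPHA ** 2 * v + S2
    return v


def gauss_logpdf(x, mean, var):
    """Elementwise log-density of N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def reverse_params(x_next, t):
    """Mean and variance of the exact denoising kernel p(x_t | x_{t+1})."""
    prec = 1.0 / marginal_var(t) + ALPHA ** 2 / S2
    return (ALPHA * x_next / S2) / prec, 1.0 / prec


def anneal_smc(n, rng):
    # p_T^BETA is Gaussian with variance v_T / BETA, so it can be sampled exactly.
    x = rng.normal(0.0, np.sqrt(marginal_var(T) / BETA), size=n)
    logw = np.zeros(n)
    for t in range(T - 1, -1, -1):
        mean, var = reverse_params(x, t)
        x_new = mean + np.sqrt(var) * rng.standard_normal(n)  # propose x_t ~ p(. | x_{t+1})
        # log of p(x_t | x_{t+1}) / q(x_{t+1} | x_t); here x_new is x_t and x is x_{t+1}.
        log_ratio = gauss_logpdf(x_new, mean, var) - gauss_logpdf(x, ALPHA * x_new, S2)
        logw = logw + (BETA - 1.0) * log_ratio
        x = x_new
        if t > 0:  # multinomial resampling at every intermediate step
            w = np.exp(logw - logw.max())
            idx = rng.choice(n, size=n, p=w / w.sum())
            x, logw = x[idx], np.zeros(n)
    w = np.exp(logw - logw.max())
    return x, w / w.sum()


rng = np.random.default_rng(0)
x0, w0 = anneal_smc(50_000, rng)
# The annealed target N(0, V0 / BETA) has second moment V0 / BETA = 0.125.
print(np.sum(w0 * x0 ** 2))
```

Because the toy denoising kernel is exact, the weighted second moment can be checked against the known target variance `V0 / BETA`. Resampling at every step is the simplest choice; triggering resampling only when the effective sample size drops is a common alternative.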

Theorems & Definitions (37)

  • Definition 2.1
  • Proposition 2.2: Exact SMC weight for reward-tilting with imperfect diffusion model
  • Proposition 3.1
  • Proposition 3.2
  • Proposition C.1
  • Corollary C.2
  • Proposition C.3: "$\mathrm{FKC} \subseteq \mathrm{RNC}$"
  • Definition D.1
  • Proposition D.2
  • proof
  • ...and 27 more