Composition and Control with Distilled Energy Diffusion Models and Sequential Monte Carlo

James Thornton, Louis Bethune, Ruixiang Zhang, Arwen Bradley, Preetum Nakkiran, Shuangfei Zhai

TL;DR

This work tackles the training instability and limited controllability of energy-parameterized diffusion models. It distills pretrained diffusion networks into energy-based models using a conservative projection loss that aligns the negative gradient of the energy with a teacher score, in effect a Helmholtz-like decomposition of the score. It then couples these distilled energy models with Sequential Monte Carlo on Feynman-Kac models (FKMs), where shaping the sampling potentials $G_t$ enables temperature-controlled, compositional, and bounded generation. Empirically, the approach improves generative metrics (FID) over prior energy-parameterized methods, converges faster, and demonstrates AND-style compositional generation and low-temperature sampling on several image datasets. The framework thus offers a practical, modular path to energy-guided diffusion with explicit control primitives for conditioning, composition, and constraint satisfaction in a principled probabilistic setting.
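As a rough sketch of the quantities this summary refers to, the energy parameterization and a score-level distillation objective can be written as below. The teacher score $s^{\text{teach}}_\phi$, the weighting $w(t)$, and the squared-norm form are illustrative assumptions; the paper's exact loss may differ (Figure 2, for instance, states the distillation loss in terms of denoisers $D_\theta$ and $D^{\text{teach}}_\phi$).

$$
s_\theta(x_t, t) = -\nabla_{x_t} E_\theta(x_t, t), \qquad
\mathcal{L}_{\text{distill}}(\theta) = \mathbb{E}_{t,\, x_t}\Big[\, w(t)\, \big\| -\nabla_{x_t} E_\theta(x_t, t) - s^{\text{teach}}_\phi(x_t, t) \big\|^2 \Big]
$$

Because $-\nabla_{x_t} E_\theta$ is a gradient field by construction, minimizing an objective of this kind projects the teacher score onto conservative vector fields, which is the sense in which the procedure resembles a Helmholtz decomposition.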

Abstract

Diffusion models may be formulated as a time-indexed sequence of energy-based models, where the score corresponds to the negative gradient of an energy function. As opposed to learning the score directly, an energy parameterization is attractive because the energy itself can be used to control generation via Monte Carlo samplers. Architectural constraints and training instability in energy-parameterized models have so far yielded inferior performance compared to directly approximating the score or denoiser. We address these deficiencies by introducing a novel training regime for the energy function through distillation of pre-trained diffusion models, resembling a Helmholtz decomposition of the score vector field. We further showcase the synergies between energy and score by casting the diffusion sampling procedure as a Feynman-Kac model in which sampling is controlled using potentials from the learnt energy functions. The Feynman-Kac model formalism enables composition and low-temperature sampling through sequential Monte Carlo.
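To make the idea of controlling sampling with energy-based potentials concrete, here is a minimal sketch of low-temperature sampling with sequential Monte Carlo. It is not the paper's diffusion sampler: it swaps in a plain tempered SMC sampler on a 1D toy problem with no diffusion time. The energy `energy`, the linear temperature schedule, and the random-walk Metropolis move are all assumptions chosen for illustration. Particles start as samples from $\exp(-E)$ and are reweighted by potentials $G(x)=\exp\{-(\beta_k-\beta_{k-1})E(x)\}$ until they target $\exp(-\gamma E)$.

```python
import math
import random


def energy(x):
    """Toy energy E(x) = x^2 / 2, so exp(-E) is a standard normal (assumption)."""
    return 0.5 * x * x


def smc_low_temperature(n_particles=2000, gamma=4.0, n_steps=10, seed=0):
    """Tempered SMC: move particles from exp(-E) to exp(-gamma * E)."""
    rng = random.Random(seed)
    # Initial particles are exact samples from exp(-E), i.e. N(0, 1).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    # Linear schedule of inverse temperatures from 1 to gamma (hypothetical choice).
    betas = [1.0 + (gamma - 1.0) * k / n_steps for k in range(n_steps + 1)]
    for b_prev, b in zip(betas, betas[1:]):
        # Potential G(x) = exp(-(b - b_prev) * E(x)) reweights towards the colder target.
        log_w = [-(b - b_prev) * energy(x) for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        # Multinomial resampling in proportion to the potentials.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # One random-walk Metropolis move per particle; exp(-b * E) is left invariant.
        step = 1.0 / math.sqrt(b)
        moved = []
        for x in particles:
            proposal = x + step * rng.gauss(0.0, 1.0)
            log_accept = -b * (energy(proposal) - energy(x))
            accept = rng.random() < math.exp(min(0.0, log_accept))
            moved.append(proposal if accept else x)
        particles = moved
    return particles


if __name__ == "__main__":
    samples = smc_low_temperature()
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(f"empirical variance: {var:.3f} (target 1/gamma = 0.25)")
```

For this quadratic energy the low-temperature target is Gaussian with variance $1/\gamma$, which gives a simple check on the particle approximation. In the setting the abstract describes, the potentials would instead come from a learnt energy $E_\theta(x_t,t)$ evaluated along the reverse diffusion.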

Paper Structure

This paper contains 38 sections, 27 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Density plot of $p_E\propto\exp\{-E_\theta(x_t,t)\}$, where $E_\theta$ is trained via distillation of pre-trained diffusion networks (Ours, top) vs. via energy-parameterized denoising score matching (E-DSM).
  • Figure 2: E-DSM loss (blue): $\mathbb{E}\|D_\theta(\mathbf{X}_t,t)- \mathbf{X}_0\|$ without gradient clipping vs distillation loss (orange): $\mathbb{E}\|D_\theta(\mathbf{X}_t,t)- D^{teach}_{\phi}(\mathbf{X}_{t},t)\|$ during training of a diffusion model for CIFAR10. Initial $100$ iterations cut.
  • Figure 3: SMC sampling of Feynman-Kac diffusion models for $G_i(x_{t_{i}},x_{t-1})=\exp\{-\gamma_{t_i} E_\theta(x_{t_i})\}\approx p(x_{t_i})^{\gamma_{t_i}}$. The fractal distribution, inspired by karras2024guiding, is obtained by fitting Gaussian mixtures to each branch and appending recursively. Ground truth samples shown in (faded) blue and generated samples shown in orange.
  • Figure 4: Asymmetry metric in log-scale, $\|\mathbf{J} - \mathbf{J}^T\|^2$ where Jacobian $\mathbf{J}=\mathbf{D}_x s_\theta(x_t,t)$ of score network $s_\theta$ trained via DSM on AFHQv2-$64$, $t\in[0,1]$.
  • Figure 5: Simple 2D composition failure from du2023reduce. Top row: learnt densities $e^{-E^{(i)}_\theta}$ for each $p^{(i)}_{t}$ and $e^{-E^{(1)}_\theta-E^{(2)}_\theta}$. Bottom row: generated samples per $p^{(i)}_{t}$, as well as the SMC generation using \ref{eq:comp_fkm} and reverse diffusion of summed scores, ${q}_{\theta, \lambda}^{(1)+(2)}(x_{t_{0:N}})$.
  • ...and 5 more figures
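
The asymmetry metric in Figure 4, $\|\mathbf{J} - \mathbf{J}^T\|^2$, can be illustrated on a toy example. The sketch below is an assumption-laden stand-in: it uses hand-written 2D vector fields and a finite-difference Jacobian rather than a neural score network on AFHQv2-64, and all function names are hypothetical. A field of the form $-\nabla E$ has a symmetric Jacobian (zero asymmetry), while adding a rotational component makes the metric strictly positive.

```python
def jacobian(field, x, eps=1e-5):
    """Central finite-difference Jacobian, J[i][j] = d field_i / d x_j."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = field(x_plus), field(x_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def asymmetry(field, x):
    """Squared Frobenius norm of J - J^T at the point x."""
    jac = jacobian(field, x)
    n = len(x)
    return sum((jac[i][j] - jac[j][i]) ** 2 for i in range(n) for j in range(n))


def conservative_score(x):
    """s = -grad E for the toy energy E(x) = 0.5*(x0^2 + x1^2) + 0.3*x0*x1."""
    return [-(x[0] + 0.3 * x[1]), -(x[1] + 0.3 * x[0])]


def rotational_score(x):
    """Same field plus a rotation 0.5 * (-x1, x0), which is not a gradient field."""
    base = conservative_score(x)
    return [base[0] - 0.5 * x[1], base[1] + 0.5 * x[0]]


if __name__ == "__main__":
    point = [0.7, -1.2]
    print("conservative:", asymmetry(conservative_score, point))
    print("rotational:  ", asymmetry(rotational_score, point))
```

A score network trained directly by denoising score matching has no constraint forcing its Jacobian to be symmetric, which is what a metric of this kind measures; an energy parameterization removes the asymmetry by construction.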