The Superposition of Diffusion Models Using the Itô Density Estimator

Marta Skreta, Lazar Atanackovic, Avishek Joey Bose, Alexander Tong, Kirill Neklyudov

TL;DR

This work addresses the challenge of combining multiple pre-trained diffusion models without retraining a larger composite model. It introduces SuperDiff, a framework derived from the continuity equation, together with an Itô density estimator that computes log-densities along the reverse-time dynamics and so enables density-based reweighting of the models being combined. The authors derive two instantiations, OR (sampling from the mixture of densities) and AND (sampling under equal-density constraints), with explicit vector-field formulations and an efficient density-estimation procedure that avoids costly divergence computations. Empirically, SuperDiff improves diversity and fidelity across tasks including CIFAR-10 image generation with models trained on disjoint datasets, concept interpolation in Stable Diffusion, unconditional protein design, and multi-property small-molecule design. Overall, SuperDiff enables scalable, inference-time composition of heterogeneous diffusion models, reducing the need to train ever larger unified models while expanding the capability and controllability of generative systems.
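The OR operation rests on a standard identity: the score of a mixture of densities is a density-weighted average of the component scores, with weights given by a softmax over the component log-densities. The sketch below checks this on two toy 1-D Gaussians. The function names and the closed-form Gaussian densities are assumptions for illustration only; in SuperDiff the log-densities would come from the Itô density estimator along the sampling trajectory rather than from a formula.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian (stand-in for an estimated log q_i)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def gaussian_score(x, mean, std):
    """Score d/dx log q_i(x) of a 1-D Gaussian (stand-in for a model's score)."""
    return -(x - mean) / std ** 2

def or_weights(log_q, prior):
    """Softmax over (log prior_i + log q_i), computed stably along axis 0."""
    logits = log_q + np.log(prior)[:, None]
    logits = logits - logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)

def mixture_logpdf(x, means, stds, prior):
    """Log-density of the mixture sum_i prior_i * q_i(x)."""
    log_q = np.stack([gaussian_logpdf(x, m, s) for m, s in zip(means, stds)])
    return np.log((prior[:, None] * np.exp(log_q)).sum(axis=0))

def mixture_score(x, means, stds, prior):
    """Score of the mixture as a weighted sum of the component scores."""
    log_q = np.stack([gaussian_logpdf(x, m, s) for m, s in zip(means, stds)])
    scores = np.stack([gaussian_score(x, m, s) for m, s in zip(means, stds)])
    return (or_weights(log_q, prior) * scores).sum(axis=0)

# Toy setup (hypothetical values): two component models with equal prior weight.
means = np.array([-1.0, 2.0])
stds = np.array([0.7, 1.3])
prior = np.array([0.5, 0.5])
x = np.linspace(-3.0, 3.0, 13)
combined = mixture_score(x, means, stds, prior)
```

Near the mean of one component the weights concentrate on that component, so the combined field follows whichever model assigns the point higher density.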

Abstract

The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation and design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log likelihood of the diffusion SDE which incurs no additional overhead compared to the well-known Hutchinson's estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models as superposition is performed solely through composition during inference, and also enjoys painless implementation as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient during inference time, and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of using SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt conditioned image editing using Stable Diffusion, as well as improved conditional molecule generation and unconditional de novo structure design of proteins. https://github.com/necludov/super-diffusion
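For context on the cost baseline named in the abstract, Hutchinson's estimator approximates the divergence of a vector field (the trace of its Jacobian) using only Jacobian-vector products with random probe vectors. The sketch below is a minimal NumPy illustration on a toy linear field $v(x) = Ax$, whose Jacobian is $A$; the function name, the matrix `A`, and the probe count are hypothetical, and this is not the paper's implementation.

```python
import numpy as np

def hutchinson_divergence(jvp, dim, num_probes, rng):
    """Estimate tr(J) as the average of eps^T (J eps) over Rademacher probes.

    `jvp` maps a probe vector eps to the Jacobian-vector product J @ eps.
    """
    total = 0.0
    for _ in range(num_probes):
        eps = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += eps @ jvp(eps)
    return total / num_probes

# Toy linear field v(x) = A x, so the Jacobian is A and the divergence is tr(A).
A = np.array([[1.0, 0.5],
              [-0.3, 2.0]])
rng = np.random.default_rng(0)
estimate = hutchinson_divergence(lambda eps: A @ eps, dim=2, num_probes=2000, rng=rng)
exact = np.trace(A)
```

For a neural vector field, each Jacobian-vector product requires an extra automatic-differentiation pass per probe, which is the overhead a divergence-free density estimator aims to avoid.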

Paper Structure

This paper contains 40 sections, 26 theorems, 113 equations, 28 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

[Reverse-time SDEs/ODE] Marginal densities $q_{t}(x)$ induced by the Itô diffusion SDE (eq:diffusion_sde_Ito) correspond to the densities induced by the following SDE that goes back in time ($\tau = 1-t$) with the corresponding initial condition, where $\overline{W}_\tau$ is the standard Wiener process in time $\tau$, and $\xi_\tau$ is ...
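For orientation, one standard way to write a family of reverse-time dynamics that preserves the marginals is sketched below. It assumes the forward process has the form $dx_t = f_t(x_t)\,dt + g_t\,dW_t$; the drift $f_t$, the diffusion coefficient $g_t$, and the role given to $\xi_\tau$ here are assumptions for illustration and may not match the paper's exact parametrization.

```latex
% Assumed forward SDE: dx_t = f_t(x_t) dt + g_t dW_t, with marginals q_t.
% \overline{W}_\tau denotes a standard Wiener process in reverse time tau = 1 - t.
% Reverse-time family, valid for any noise level xi_tau >= 0:
dx_\tau = \Big[ -f_{1-\tau}(x_\tau)
          + \Big( \tfrac{g_{1-\tau}^{2}}{2} + \xi_\tau \Big) \nabla \log q_{1-\tau}(x_\tau) \Big] d\tau
          + \sqrt{2\,\xi_\tau}\; d\overline{W}_\tau ,
\qquad x_{\tau = 0} \sim q_{1} .
```

Under these assumptions, $\xi_\tau = 0$ gives a deterministic ODE, while $\xi_\tau = g_{1-\tau}^{2}/2$ recovers the familiar reverse-time SDE with drift $-f_{1-\tau} + g_{1-\tau}^{2} \nabla \log q_{1-\tau}$ and noise scale $g_{1-\tau}$. Both follow from matching the Fokker-Planck equation of the reverse dynamics to the time-reversed forward one.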

Figures (28)

  • Figure 1: Concept interpolations via different methods: SuperDiff (top row), the averaging of outputs with different prompts (middle row), and joint prompting with standard Stable Diffusion (SD) (bottom row) for six different prompt combinations. Here we use SuperDiff with the AND operation (sampling equal densities).
  • Figure 2: An intuitive illustration of using model superposition for improving inference performance. We show an example of two disjoint datasets and train a model for each set. Each individual model learns to generate samples only from their respective datasets. Using model superposition enables sampling from both densities.
  • Figure 3: UMAP visualization of protein structures showing cluster archetypes where structure diversity is maintained with SuperDiff$_{\ell = 0}$ (OR).
  • Figure A1: UMAP visualizations of protein structures with (a) SuperDiff (AND) and (b) averaging of outputs.
  • Figure A2: Proteins generated by SuperDiff (AND) with scTM score $< 0.3$.
  • ...and 23 more figures

Theorems & Definitions (39)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 1
  • Proposition 6
  • Proposition 6
  • Proposition 6
  • proof
  • ...and 29 more