Iterative Importance Fine-tuning of Diffusion Models

Alexander Denker; Shreyas Padhy; Francisco Vargas; Johannes Hertrich

TL;DR

This work treats downstream tasks for diffusion models as sampling from a tilted distribution $p_\text{tilted}(\mathbf{x}) \propto p_\text{data}(\mathbf{x}) \exp(r(\mathbf{x})/\lambda)$ and leverages Doob's $h$-transform to enable conditional sampling. It proposes a self-supervised, amortised fine-tuning framework (SIFT) that repeats three steps: sample trajectories with the current control, filter them using path-based importance weights so that the retained samples approximate the tilted distribution, and update the control via a score-matching objective. The iteration is proven to descend the stochastic control free energy. The method is validated on MNIST class-conditional sampling, inverse-problem posterior sampling (e.g., super-resolution), and text-to-image reward fine-tuning, showing competitive performance with efficient memory use and applicability to large models, since it does not backpropagate through the generation process. The approach provides a principled, scalable alternative to online RL-based fine-tuning that balances reward fidelity against sample diversity and computational cost, with practical benefits for personalised or task-specific deployment of diffusion models.

Abstract

Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using Doob's $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by learning the optimal control, enabling amortised conditional sampling. Our method iteratively refines the control using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling, inverse problems and reward fine-tuning for text-to-image diffusion models.
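
The resampling idea behind the tilted distribution can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's algorithm: it uses terminal weights $\exp(r(\mathbf{x})/\lambda)$ on samples from a fixed prior rather than path-based weights on controlled trajectories, and it omits the score-matching update entirely. The function name `tilted_resample`, the standard normal stand-in for $p_\text{data}$, and the quadratic reward are all assumptions chosen so that the tilted distribution is known in closed form (a Gaussian with mean 1 and variance 0.5 for $\lambda = 1$).

```python
import numpy as np

def tilted_resample(samples, reward, lam, rng):
    """Resample `samples` with probabilities proportional to exp(reward / lam).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    log_w = reward(samples) / lam
    log_w = log_w - log_w.max()      # subtract the max to stabilise the exponentials
    w = np.exp(log_w)
    w = w / w.sum()                  # self-normalised importance weights
    idx = rng.choice(len(samples), size=len(samples), replace=True, p=w)
    return samples[idx], w

rng = np.random.default_rng(0)

# Assumed stand-in for p_data: a standard normal in one dimension.
prior_samples = rng.standard_normal(200_000)

# Assumed toy reward r(x) = -(x - 2)^2 / 2, which favours samples near x = 2.
def reward(x):
    return -0.5 * (x - 2.0) ** 2

resampled, weights = tilted_resample(prior_samples, reward, lam=1.0, rng=rng)

# Effective sample size: a rough diagnostic of how degenerate the weights are.
ess = 1.0 / np.sum(weights ** 2)

print("resampled mean:", resampled.mean())
print("resampled variance:", resampled.var())
print("effective sample size:", ess)
```

In this toy setting the resampled set should be distributed approximately as the tilted target. In the setting described above, the resampled set would instead serve as the synthetic dataset for the next control update, and a smaller $\lambda$ would concentrate the weights on fewer samples.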
