Iterative Tilting for Diffusion Fine-Tuning

Jean Pachebat; Giovanni Conforti; Alain Durmus; Yazid Janati

TL;DR

This work tackles reward-driven fine-tuning of diffusion models without backpropagating through the sampling chain. It introduces iterative tilting, which decomposes a large reward tilt into $N$ small, tractable tilts and updates the score at each step via a first-order Taylor expansion, using only forward evaluations of the reward. The approach gives a gradient-free route to reward-tilted distributions and is validated on a 2D Gaussian mixture where the exact tilted scores are available, showing that the learned score converges across tilts. Compared with gradient-based methods such as DRaFT and adjoint-based SOC, iterative tilting trades a single gradient update for multiple cheap gradient-free updates, which makes it applicable to non-differentiable or black-box rewards. The results suggest practical potential for high-dimensional, reward-driven diffusion tuning, with open directions in variance reduction, criteria for choosing $N$, and integration with efficient adapters.
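
To make the decomposition concrete, here is a sketch of the first-order argument. The notation is assumed for illustration and may differ from the paper's: $p^{(k)}$ is the data distribution after $k$ small tilts, $p^{(k)}_t$ its noised marginal at time $t$, $q_t(x_t \mid x_0)$ the forward noising kernel, and $\mathbb{E}_k[\,\cdot \mid x_t]$ the posterior expectation over $X_0$ under $p^{(k)}$.

$$
\begin{aligned}
p^{(k+1)}(x_0) &\propto p^{(k)}(x_0)\,\exp\!\big(\tfrac{\lambda}{N}\, r(x_0)\big),
\qquad p^{(N)}(x_0) \propto p^{(0)}(x_0)\,\exp\!\big(\lambda\, r(x_0)\big), \\
p^{(k+1)}_t(x_t) &\propto p^{(k)}_t(x_t)\;\mathbb{E}_k\!\big[\exp\!\big(\tfrac{\lambda}{N}\, r(X_0)\big) \,\big|\, x_t\big], \\
\nabla_{x_t} \log p^{(k+1)}_t(x_t) &= \nabla_{x_t} \log p^{(k)}_t(x_t) + \tfrac{\lambda}{N}\,\nabla_{x_t}\,\mathbb{E}_k\!\big[r(X_0) \,\big|\, x_t\big] + O\!\big((\lambda/N)^2\big), \\
\nabla_{x_t}\,\mathbb{E}_k\!\big[r(X_0) \,\big|\, x_t\big] &= \operatorname{Cov}_k\!\big(r(X_0),\ \nabla_{x_t} \log q_t(x_t \mid X_0) \,\big|\, x_t\big).
\end{aligned}
$$

The last identity is one standard way to see why only values of $r$ are needed: the gradient falls on the known kernel $q_t$ rather than on the reward. Whether the paper estimates the correction in exactly this form is not stated in the summary above, so this should be read as an illustration of the idea rather than the authors' estimator.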

Abstract

We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the reward function and avoids backpropagating through sampling chains. We validate on a two-dimensional Gaussian mixture with linear reward, where the exact tilted distribution is available in closed form.
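
The closed-form setting mentioned above can be sketched in a few lines of NumPy. For a Gaussian mixture with components $\mathcal{N}(\mu_k, \Sigma_k)$ and a linear reward $r(x) = a^\top x$, tilting by $\exp(\lambda r)$ shifts each mean to $\mu_k + \lambda \Sigma_k a$ and reweights component $k$ by $\exp(\lambda a^\top \mu_k + \tfrac{\lambda^2}{2} a^\top \Sigma_k a)$. The mixture parameters, the reward direction `a`, and the helper `tilt_gmm` below are illustrative assumptions, not values or code from the paper. The example only checks that $N$ exact small tilts compose to the single large tilt; it does not implement the learned score update.

```python
import numpy as np

def tilt_gmm(weights, means, covs, a, lam):
    """Exact tilt of a Gaussian mixture by exp(lam * a @ x) (hypothetical helper).

    weights: (K,), means: (K, d), covs: (K, d, d), a: (d,)
    Returns the weights, means and covariances of the tilted mixture.
    """
    shift = covs @ a                      # (K, d): Sigma_k a for each component
    new_means = means + lam * shift       # each mean moves along Sigma_k a
    # Log of the unnormalised new weights; subtract the max for stability.
    log_w = np.log(weights) + lam * (means @ a) + 0.5 * lam**2 * (shift @ a)
    log_w -= log_w.max()
    new_w = np.exp(log_w)
    return new_w / new_w.sum(), new_means, covs  # covariances are unchanged

# Illustrative 2D mixture and linear reward r(x) = a @ x (assumed values).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
covs = np.stack([np.eye(2) * s for s in (0.5, 1.0, 0.25)])
a = np.array([1.0, 0.5])
lam, N = 2.0, 10

# One large tilt by exp(lam * r).
w_direct, mu_direct, _ = tilt_gmm(weights, means, covs, a, lam)

# N sequential small tilts by exp((lam / N) * r).
w_iter, mu_iter = weights, means
for _ in range(N):
    w_iter, mu_iter, _ = tilt_gmm(w_iter, mu_iter, covs, a, lam / N)

print(np.allclose(w_iter, w_direct), np.allclose(mu_iter, mu_direct))
```

Because the intermediate mixtures are available exactly here, a learned score can be compared against the true tilted score at every step, which is what makes this a convenient validation setting.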
