Composition and Alignment of Diffusion Models using Constrained Learning

Shervin Khalafi, Ignacio Hounie, Dongsheng Ding, Alejandro Ribeiro

TL;DR

This work reframes diffusion-model alignment and composition as constrained learning problems, enforcing reward constraints and closeness to pretrained models via a Lagrangian dual framework. It provides theoretical characterizations of alignment and constrained composition, and develops primal-dual training procedures for score-based diffusion models. Empirically, it demonstrates that constrained alignment maintains fidelity to pretrained models while satisfying multiple rewards, and constrained product composition yields balanced performance across multiple pretrained adapters. The approach offers a principled, scalable alternative to ad hoc weight-tuning for complex, multi-objective diffusion-model customization with practical implications for safe and reliable generative systems.

Abstract

Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. To improve the quality of generated samples and their compliance with user requirements, two commonly used methods are: (i) Alignment, which involves finetuning a diffusion model to align it with a reward; and (ii) Composition, which combines several pretrained diffusion models together, each emphasizing a desirable attribute in the generated outputs. However, trade-offs often arise when optimizing for multiple rewards or combining multiple models, as they can often represent competing properties. Existing methods cannot guarantee that the resulting model faithfully generates samples with all the desired properties. To address this gap, we propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to each pretrained model. We provide a theoretical characterization of the solutions to the constrained alignment and composition problems and develop a Lagrangian-based primal-dual training algorithm to approximate these solutions. Empirically, we demonstrate our proposed approach in image generation, applying it to alignment and composition, and show that our aligned or composed model satisfies constraints effectively. Our implementation can be found at: https://github.com/shervinkhalafi/constrained_comp_align
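To make the Lagrangian-based idea concrete, the following is a minimal sketch on a toy discrete distribution, not the paper's diffusion training procedure. It assumes a single reward constraint of the form E_p[r] >= b and a reverse-KL objective KL(p || q) to a "pretrained" distribution q; under these assumptions the Lagrangian minimizer for a fixed multiplier is the reward-tilted distribution p proportional to q * exp(lam * r), so only the dual variable needs to be iterated. All names (`tilt`, `dual_ascent`, the values of `q`, `r`, `b`) are illustrative assumptions.

```python
import math

def tilt(q, r, lam):
    """Lagrangian minimizer for fixed lam: p_i proportional to q_i * exp(lam * r_i)."""
    logits = [math.log(qi) + lam * ri for qi, ri in zip(q, r)]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]

def expected_reward(p, r):
    return sum(pi * ri for pi, ri in zip(p, r))

def dual_ascent(q, r, b, step=0.5, iters=2000):
    """Projected gradient ascent on the dual variable lam >= 0."""
    lam = 0.0
    for _ in range(iters):
        p = tilt(q, r, lam)
        # Dual gradient is the constraint slack b - E_p[r]; project onto lam >= 0.
        lam = max(0.0, lam + step * (b - expected_reward(p, r)))
    return tilt(q, r, lam), lam

# Hypothetical toy data: three outcomes, rewards in [0, 1], threshold b = 0.6.
q = [0.5, 0.3, 0.2]
r = [0.0, 0.5, 1.0]
p, lam = dual_ascent(q, r, b=0.6)
print("aligned distribution:", p)
print("multiplier:", lam, "expected reward:", expected_reward(p, r))
```

In this toy setting the multiplier plays the role that a hand-tuned reward weight would otherwise play: it grows only while the constraint is violated and settles once the reward threshold is met.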

Paper Structure

This paper contains 46 sections, 17 theorems, 115 equations, 6 figures, 15 tables, 3 algorithms.

Key Result

Lemma 1

If two backward processes $p_{0:T}(\cdot)$ and $q_{0:T}(\cdot)$ have the same variance schedule $\sigma_t$ and noise schedule $\alpha_t$, then the reverse KL divergence between them is given by

Figures (6)

  • Figure 1: Product composition (AND). Three Gaussian distributions being composed (Left). Composition using equal weights (Middle), and with constraints (Right). The constrained model samples from the intersection of the three models.
  • Figure 2: Mixture composition (OR). Two Gaussian mixtures being composed (Left). One has two modes and the other has only a single mode. Composition using equal weights (Middle), and with constraints (Right).
  • Figure 3: Reward alignment. Stable diffusion is finetuned using one reward that emphasizes aesthetic quality (MPS), and Saturation and Local Contrast as regularizers. Reward values for the equal weights method and our constrained alignment (Left). Images are sampled from the aligned models (Right), and the model trained solely with MPS reward is used for comparison.
  • Figure 4: Reward alignment. Stable diffusion is finetuned using multiple image quality/aesthetic rewards. Reward trajectories for the regularization-based method and our constrained alignment during training (Left). KL divergences to the pretrained model (Middle). Images are sampled from the aligned models (Right), and the pretrained model is used for comparison.
  • Figure 5: Product composition. Stable diffusion with LoRA is finetuned using different rewards, for equal weighted and product mixtures. 100% represents the reward levels attained by models aligned solely with the individual reward. Higher is better.
  • ...and 1 more figure
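As a rough illustration of the product (AND) composition in Figure 1, the sketch below computes a weighted product of one-dimensional Gaussians in closed form and reports the KL divergence from the composed model to each component. It relies on the standard fact that a weighted product of Gaussian densities is Gaussian, with precision equal to the weighted sum of the component precisions. The means, standard deviations, and weights are hypothetical placeholders, and the weights are fixed by hand here rather than obtained from the paper's constrained procedure.

```python
import math

def weighted_product(mus, sigmas, weights):
    """Mean and std of the normalized product of N(mu_i, sigma_i^2)^{w_i}."""
    precision = sum(w / s**2 for w, s in zip(weights, sigmas))
    mean = sum(w * m / s**2 for w, m, s in zip(weights, mus, sigmas)) / precision
    return mean, math.sqrt(1.0 / precision)

def kl_gauss(m0, s0, m1, s1):
    """KL( N(m0, s0^2) || N(m1, s1^2) ) for 1D Gaussians."""
    return math.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

# Hypothetical components and equal weights (the unconstrained baseline).
mus = [-2.0, 0.0, 3.0]
sigmas = [1.0, 1.0, 1.0]
weights = [1 / 3, 1 / 3, 1 / 3]

mean, std = weighted_product(mus, sigmas, weights)
kls = [kl_gauss(mean, std, m, s) for m, s in zip(mus, sigmas)]
print("composed mean, std:", mean, std)
print("KL to each component:", kls)
```

With equal weights the KL divergences to the components generally differ, which is the imbalance that choosing the weights through constraints is meant to address.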

Theorems & Definitions (31)

  • Lemma 1: Path-wise KL divergence
  • Lemma 2: Point-wise KL divergence
  • Theorem 1: Reward alignment
  • Theorem 2: Strong duality
  • Theorem 3: Product composition
  • Remark 1
  • Theorem 4: Strong duality
  • Lemma 3
  • Remark 2
  • proof
  • ...and 21 more