CoDe: Blockwise Control for Denoising Diffusion Models

Anuj Singh, Sayak Mukherjee, Ahmad Beirami, Hadi Jamali-Rad

TL;DR

CoDe is a gradient-free, blockwise inference-time guidance method for diffusion models. It approximates sampling from the KL-regularized reward-tilted posterior by operating on blocks of $B$ denoising steps with $N$ parallel samples. The paper derives the optimal policy $\pi_\lambda^*(x_{t-1}|x_t) \propto p(x_{t-1}|x_t) \exp(\lambda V(x_{t-1};p))$ and approximates the value function as $V(x_t; p) \approx r(\hat{x}_0)$, where $\hat{x}_0$ is the clean-sample estimate given by Tweedie’s formula, so sampling requires no differentiable reward. Two case studies, Gaussian Mixture Models and Stable Diffusion image generation with both non-differentiable (compression) and differentiable (style, face, stroke) rewards, show competitive reward alignment while staying closer to the base model in fidelity and diversity and using less compute than strong baselines. The paper also analyzes a noise-conditioned extension, CoDe($\eta$), that trades off reward against divergence, reports ablations on block size $B$ and sample count $N$, and discusses computational complexity; CoDe achieves favorable reward-vs-compute and reward-vs-divergence trade-offs. Overall, CoDe offers a gradient-free alternative for aligning diffusion models with downstream tasks, with guidance on parameter selection and possible extensions to adaptive control.

Abstract

Aligning diffusion models to downstream tasks often requires finetuning new models or gradient-based guidance at inference time to enable sampling from the reward-tilted posterior. In this work, we explore a simple inference-time gradient-free guidance approach, called controlled denoising (CoDe), that circumvents the need for differentiable guidance functions and model finetuning. CoDe is a blockwise sampling method applied during intermediate denoising steps, allowing for alignment with downstream rewards. Our experiments demonstrate that, despite its simplicity, CoDe offers a favorable trade-off between reward alignment, prompt instruction following, and inference cost, achieving competitive performance against state-of-the-art baselines. Our code is available at: https://github.com/anujinho/code.
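
The blockwise idea summarized above can be illustrated on a toy problem. The following is a minimal sketch and not the authors' implementation (see the repository linked above for that). It assumes a 1D data distribution N(0, 1) whose score is known in closed form, a linear beta schedule, and a quadratic reward; every function name, the schedule, and the reward are hypothetical choices made for illustration. For each block of `block` denoising steps it rolls out `n_candidates` samples from the base model, scores each with $r(\hat{x}_0)$ using Tweedie's formula, and resamples one candidate with weights proportional to $\exp(\lambda V)$, which is one simple way to approximate $\pi_\lambda^*$ without reward gradients. A very large $\lambda$ approaches picking the best candidate.

```python
import numpy as np

# Assumed toy setup: linear beta schedule and data distribution N(MU0, S0^2).
T = 100
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)
MU0, S0 = 0.0, 1.0


def abar_at(t):
    """Cumulative signal level after t noising steps (t = 0 means clean data)."""
    return 1.0 if t == 0 else abar[t - 1]


def score(x, t):
    """Closed-form score of the noisy marginal (possible because the data is Gaussian)."""
    a = abar_at(t)
    mean = np.sqrt(a) * MU0
    var = a * S0**2 + (1.0 - a)
    return -(x - mean) / var


def tweedie_x0(x, t):
    """Tweedie's formula: posterior-mean estimate of x_0 given x_t."""
    a = abar_at(t)
    return (x + (1.0 - a) * score(x, t)) / np.sqrt(a)


def reverse_step(x, t, rng):
    """One ancestral denoising step x_t -> x_{t-1} of the base model."""
    beta = betas[t - 1]
    mean = (x + beta * score(x, t)) / np.sqrt(alphas[t - 1])
    noise = rng.standard_normal(x.shape) if t > 1 else 0.0
    return mean + np.sqrt(beta) * noise


def reward(x0, target=2.0):
    """Hypothetical reward; it is only evaluated, never differentiated."""
    return -(x0 - target) ** 2


def code_sample(n_candidates, block, lam, rng):
    """Blockwise guided sampling: roll out, score with r(x0_hat), resample."""
    x = rng.standard_normal()  # x_T ~ N(0, 1)
    t = T
    while t > 0:
        steps = min(block, t)
        cands = np.full(n_candidates, x)
        for s in range(steps):
            cands = reverse_step(cands, t - s, rng)
        t -= steps
        values = reward(tweedie_x0(cands, t))  # V(x_t) ~ r(x0_hat)
        w = np.exp(lam * (values - values.max()))  # stabilized exp(lam * V)
        x = cands[rng.choice(n_candidates, p=w / w.sum())]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = np.array([code_sample(1, 10, 5.0, rng) for _ in range(300)])
    guided = np.array([code_sample(8, 10, 5.0, rng) for _ in range(300)])
    print("mean reward, base model:", reward(base).mean())
    print("mean reward, guided    :", reward(guided).mean())
```

With `n_candidates=1` the loop reduces to plain sampling from the base model, so comparing the two runs shows the effect of the blockwise selection alone. In this sketch, `block` and `n_candidates` play the roles of $B$ and $N$.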

Paper Structure

This paper contains 28 sections, 2 theorems, 25 equations, 26 figures, 10 tables, 2 algorithms.

Key Result

Theorem 2.1

The optimal model $\pi_\lambda^*$ for the KL-regularized objective is given by:

$$\pi_\lambda^*(x_{t-1}|x_t) \propto p(x_{t-1}|x_t) \exp(\lambda V(x_{t-1};p))$$

Figures (26)

  • Figure 1: CoDe generates high quality compression (non-differentiable reward), style, face and stroke (differentiable rewards) guided images.
  • Figure 2: Setup (left, middle) and reward vs. divergence trade-off (right) for Case Study I. CoDe offers highest reward at lowest divergence with much lower $N$ than BoN.
  • Figure 3: In contrast to BoN, SVDD-PM, CoDe with and without noise-conditioning ($\eta=0.6$, $\eta=1$, resp.) are robust against increased distance between reward and prior distributions. SVDD-PM's generated samples offer almost zero variance indicating reward over-optimization.
  • Figure 4: CoDe($\eta$) demonstrates a superior trade-off between compressibility, image and text alignment as compared to other baselines on the (T+I)2I settings.
  • Figure 5: CoDe($\eta$) offers a better reward vs. KL-divergence trade-off as compared to BoN($\eta$) for all values of $N$. SVDD-PM($\eta$) demonstrates a higher reward beyond $N=7$, but at the cost of a much higher KL-divergence.
  • ...and 21 more figures

Theorems & Definitions (4)

  • Theorem 2.1
  • Proof of Theorem 2.1
  • Lemma F.1
  • proof