CoDe: Blockwise Control for Denoising Diffusion Models
Anuj Singh, Sayak Mukherjee, Ahmad Beirami, Hadi Jamali-Rad
TL;DR
CoDe is a gradient-free, blockwise inference-time guidance method for diffusion models. It approximates the KL-regularized, reward-tilted posterior by operating on blocks of $B$ denoising steps with $N$ parallel samples. The paper derives the optimal policy $\pi_\lambda^*(x_{t-1} \mid x_t) \propto p(x_{t-1} \mid x_t) \exp(\lambda V(x_{t-1}; p))$ and approximates the value function as $V(x_t; p) \approx r(\hat{x}_0)$, where $\hat{x}_0$ is the Tweedie estimate of the clean sample. This makes sampling practical without differentiable rewards. The method is evaluated in two case studies: a Gaussian mixture model, and Stable Diffusion image generation with both non-differentiable (compression) and differentiable (style, face, stroke) rewards. In both, CoDe achieves competitive reward alignment while staying closer to the base model in fidelity and diversity, and it uses less compute than strong baselines. The paper also analyzes a noise-conditioned extension, parameterized by $\eta$, that trades off reward against divergence from the base model. It further ablates the block size $B$ and sample count $N$ and discusses computational complexity, showing favorable reward-vs-compute and reward-vs-divergence trade-offs. Overall, CoDe offers a robust, scalable, gradient-free alternative for aligning diffusion models to downstream tasks, with clear guidance on parameter selection and possible extensions to adaptive control.
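The two approximations in the TL;DR can be written out explicitly. The following is a sketch that assumes a variance-preserving (DDPM-style) schedule with cumulative coefficient $\bar\alpha_t$ and a hypothetical $\epsilon$-prediction network $\epsilon_\theta$. The first line is Tweedie's formula, which gives the value estimate. The second line shows one way to approximate the tilted transition over a block of $B$ steps with $N$ samples: draw candidates from the base model and resample them with self-normalized weights $w_n$. This block form is our reading of "blocks of $B$ steps with $N$ samples", not an equation quoted from the paper.

$$
\hat{x}_0(x_t) = \mathbb{E}[x_0 \mid x_t]
  = \frac{x_t + (1-\bar\alpha_t)\,\nabla_{x_t}\log p_t(x_t)}{\sqrt{\bar\alpha_t}}
  \approx \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}},
\qquad V(x_t; p) \approx r\big(\hat{x}_0(x_t)\big)
$$

$$
x_{t-B}^{(n)} \sim p(x_{t-B} \mid x_t),\quad n = 1,\dots,N, \qquad
w_n = \frac{\exp\!\big(\lambda\, r(\hat{x}_0(x_{t-B}^{(n)}))\big)}{\sum_{m=1}^{N}\exp\!\big(\lambda\, r(\hat{x}_0(x_{t-B}^{(m)}))\big)}
$$

Keeping candidate $n$ with probability $w_n$ is a sampling-importance-resampling approximation of $\pi_\lambda^*(x_{t-B} \mid x_t) \propto p(x_{t-B} \mid x_t)\exp(\lambda V(x_{t-B}; p))$. As $\lambda \to \infty$, it reduces to keeping the highest-reward candidate. Only evaluations of $r$ are needed, never its gradient.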
Abstract
Aligning diffusion models to downstream tasks often requires finetuning new models or gradient-based guidance at inference time to enable sampling from the reward-tilted posterior. In this work, we explore a simple, gradient-free, inference-time guidance approach, called controlled denoising (CoDe), that circumvents the need for both differentiable guidance functions and model finetuning. CoDe is a blockwise sampling method applied during intermediate denoising steps, allowing for alignment with downstream rewards. Our experiments demonstrate that, despite its simplicity, CoDe offers a favorable trade-off between reward alignment, prompt instruction following, and inference cost, achieving competitive performance against state-of-the-art baselines. Our code is available at: https://github.com/anujinho/code.
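The blockwise sampling idea can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation. The base model is a hypothetical 1D Gaussian, $x_0 \sim \mathcal{N}(0, 1)$, with an analytic Tweedie denoiser standing in for a trained network. The reward $r(x) = -(x-2)^2$ is made up for illustration. After each block, the sketch keeps the highest-reward candidate, which is the $\lambda \to \infty$ limit of reward-tilted selection. The helper names `tweedie_x0`, `reverse_step`, `code_sample`, and `reward`, as well as the noise schedule, are all hypothetical.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): 1D base data x_0 ~ N(MU0, S0^2)
# and a linear DDPM beta schedule with T steps.
T = 50
MU0, S0 = 0.0, 1.0
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def tweedie_x0(x, k):
    """Exact Tweedie estimate E[x_0 | x_k] for Gaussian data at noise level alpha_bars[k]."""
    ab = alpha_bars[k]
    var_k = ab * S0**2 + (1.0 - ab)  # marginal variance of x_k
    return MU0 + (np.sqrt(ab) * S0**2 / var_k) * (x - np.sqrt(ab) * MU0)


def reverse_step(x, k, rng):
    """One DDPM ancestral step from level k to level k-1 (clean data when k == 0)."""
    ab = alpha_bars[k]
    ab_prev = alpha_bars[k - 1] if k > 0 else 1.0
    x0_hat = tweedie_x0(x, k)
    mean = (
        (np.sqrt(ab_prev) * betas[k] / (1.0 - ab)) * x0_hat
        + (np.sqrt(alphas[k]) * (1.0 - ab_prev) / (1.0 - ab)) * x
    )
    var = (1.0 - ab_prev) / (1.0 - ab) * betas[k]
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x))


def code_sample(n_samples, block_size, reward, rng):
    """Blockwise best-of-N denoising: the reward is evaluated, never differentiated."""
    x = rng.standard_normal()  # x_T ~ N(0, 1); exact here since MU0 = 0 and S0 = 1
    k = T - 1
    while k >= 0:
        steps = min(block_size, k + 1)
        # Roll out N candidate continuations of the current state for one block.
        cands = np.full(n_samples, x)
        for j in range(steps):
            cands = reverse_step(cands, k - j, rng)
        k -= steps
        # Value estimate V(x_k) ~ r(x0_hat(x_k)); after the last block cands are clean.
        x0_hats = tweedie_x0(cands, k) if k >= 0 else cands
        x = cands[np.argmax(reward(x0_hats))]
    return float(x)


def reward(x):
    """Hypothetical reward: prefer samples near 2."""
    return -(np.asarray(x) - 2.0) ** 2


rng = np.random.default_rng(0)
base = np.array([code_sample(1, 5, reward, rng) for _ in range(200)])    # N=1: plain sampling
coded = np.array([code_sample(16, 5, reward, rng) for _ in range(200)])  # N=16, B=5
print(f"base mean {base.mean():.2f}, CoDe mean {coded.mean():.2f}")
```

With `n_samples=1`, the loop reduces to plain ancestral sampling from the base model, which makes it a convenient baseline. In this sketch, every setting costs `n_samples * T` denoiser calls. The number of reward evaluations is `n_samples * ceil(T / block_size)`, so a smaller `block_size` applies selection more often at the price of more reward calls.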
