Dreamguider: Improved Training free Diffusion-based Conditional Generation

Nithin Gopalakrishnan Nair, Vishal M Patel

TL;DR

Dreamguider tackles the problem of inference-time conditional generation with diffusion models without backpropagating through the diffusion network or tuning task-specific guidance scales. It introduces three components: time-variant gradient guidance, a gradient-dependent scaling factor for automatic step-size control, and differentiable augmentation (DiffuseAugment) to stabilize guidance across timesteps. The method leverages a perturbed Markovian kernel framework and zeroth-order, MMSE-based guidance to handle both linear and non-linear inverse problems, achieving superior qualitative and quantitative results with fewer guidance steps. This approach reduces computational burden while delivering high-fidelity, photorealistic samples across diverse tasks, with plans to release code for reproducibility.

Abstract

Diffusion models have emerged as a formidable tool for training-free conditional generation. However, a key hurdle in inference-time guidance techniques is the need for compute-heavy backpropagation through the diffusion network to estimate the guidance direction. Moreover, these techniques often require handcrafted parameter tuning on a case-by-case basis. Although some recent works have introduced minimal-compute methods for linear inverse problems, a generic lightweight guidance solution for both linear and non-linear guidance problems is still missing. To this end, we propose Dreamguider, a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network. The key idea is to regulate the gradient flow through a time-varying factor. Moreover, we propose an empirical guidance scale that works for a wide variety of tasks, hence removing the need for handcrafted parameter tuning. We further introduce an effective lightweight augmentation strategy that significantly boosts performance during inference-time guidance. We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules. To facilitate further research, we will make the code public after the review process.
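
To make the idea of guidance without backpropagation through the diffusion network concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a standard DDPM noise schedule, an inpainting-style masked measurement, and a deterministic DDIM update. The names `guided_x0`, `masked_loss`, `ddim_step` and `time_weight` are hypothetical, and the Polyak-style step size is only a stand-in for the paper's gradient-dependent scale, whose exact form is not given in this section. The predicted noise is passed in as a plain list, so the gradient is taken with respect to the MMSE estimate only and the network is never differentiated.

```python
import math
import random


def mmse_estimate(x_t, eps_pred, alpha_bar_t):
    """Posterior-mean (MMSE) estimate of the clean sample under the DDPM forward process."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


def masked_loss(x0, y, mask):
    """Squared error on observed entries only (assumed inpainting-style measurement)."""
    return sum((m * x - t) ** 2 for x, t, m in zip(x0, y, mask))


def guided_x0(x_t, eps_pred, alpha_bar_t, y, mask, time_weight=1.0):
    """Nudge the MMSE estimate toward the measurement, treating eps_pred as a constant."""
    x0_hat = mmse_estimate(x_t, eps_pred, alpha_bar_t)
    loss = masked_loss(x0_hat, y, mask)
    # Analytic gradient of the masked squared error with respect to x0_hat.
    grad = [2.0 * m * (m * x - t) for x, t, m in zip(x0_hat, y, mask)]
    grad_sq = sum(g * g for g in grad)
    # Hypothetical gradient-dependent step (Polyak-style), scaled by a
    # hypothetical time-dependent weight; not the paper's formula.
    step = time_weight * loss / (grad_sq + 1e-12)
    return [x - step * g for x, g in zip(x0_hat, grad)], loss


def ddim_step(x0, eps_pred, alpha_bar_prev):
    """Deterministic DDIM update from a clean-sample estimate and the predicted noise."""
    a, b = math.sqrt(alpha_bar_prev), math.sqrt(1.0 - alpha_bar_prev)
    return [a * x + b * e for x, e in zip(x0, eps_pred)]


# Toy data: 16 values, every other entry observed.
rng = random.Random(0)
n = 16
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
mask = [1.0 if i % 2 == 0 else 0.0 for i in range(n)]
y = [m * x for m, x in zip(mask, x_true)]

alpha_bar_t, alpha_bar_prev = 0.5, 0.6
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_t = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * z
       for x, z in zip(x_true, noise)]
# Stand-in for the frozen network's noise prediction (random here).
eps_pred = [rng.gauss(0.0, 1.0) for _ in range(n)]

x0_new, loss_before = guided_x0(x_t, eps_pred, alpha_bar_t, y, mask)
loss_after = masked_loss(x0_new, y, mask)
x_prev = ddim_step(x0_new, eps_pred, alpha_bar_prev)
```

With a binary mask and `time_weight=1.0`, this particular step size halves the residual on the observed entries, so the measurement loss drops to a quarter of its value before the update. A real sampler would repeat the step at every timestep using the noise predicted by a pretrained model.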

Paper Structure

This paper contains 24 sections, 20 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: An illustration of the different applications of our method. We utilize a pretrained diffusion model to generate images satisfying a predefined condition without backpropagation through the diffusion UNet or any hand-crafted parameter tuning. We present results on (1) Real-world colorization, (2) Real-world super-resolution, (3) Style-guided Text-to-Image Generation, (4) Inpainting, (5) Sketch-to-Face, (6) Face ID Guidance, and (7) Face Semantics-to-Face synthesis.
  • Figure 2: An illustration of the difference between the existing method and our method. Existing works backpropagate through the diffusion network to perform guidance at each timestep, whereas we find the gradients with respect to the MMSE estimate and the predicted noise based on the timesteps, thereby bypassing the expensive backpropagation operation.
  • Figure 3: Qualitative comparisons for linear tasks on the ImageNet dataset with 100 inference steps.
  • Figure 4: Qualitative comparisons for linear tasks on the CelebA dataset with 100 inference steps.
  • Figure 5: Qualitative comparisons for non-linear tasks on the CelebA dataset with 100 inference steps.
  • ...and 12 more figures