Observation-Guided Diffusion Probabilistic Models

Junoh Kang, Jinyoung Choi, Sungik Choi, Bohyung Han

TL;DR

A novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM), which effectively addresses the trade-off between quality control and fast sampling by integrating the guidance of the observation process with the Markov chain in a principled way.

Abstract

We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM), which effectively addresses the trade-off between quality control and fast sampling. Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain in a principled way. This is achieved by introducing an additional loss term derived from the observation based on a conditional discriminator on noise level, which employs a Bernoulli distribution indicating whether its input lies on the (noisy) real manifold or not. This strategy allows us to optimize the more accurate negative log-likelihood induced in the inference stage, especially when the number of function evaluations is limited. The proposed training scheme is also advantageous even when incorporated only into the fine-tuning process, and it is compatible with various fast inference strategies since our method yields better denoising networks using exactly the same inference procedure without incurring extra computational cost. We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines. Our implementation is available at https://github.com/Junoh-Kang/OGDM_edm.
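
The additional loss term described in the abstract can be pictured with a small numerical sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the total objective is a standard noise-prediction loss plus a weighted negative log-likelihood of a Bernoulli observation "the input lies on the noisy real manifold", evaluated on discriminator probabilities. The function names and the weight `gamma` are hypothetical, and the discriminator outputs are plain arrays rather than a trained, noise-level-conditioned network.

```python
import numpy as np


def denoising_loss(eps_pred, eps_true):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))


def observation_loss(d_prob, tiny=1e-12):
    """Negative log-likelihood of the Bernoulli observation y = 1.

    d_prob holds probabilities (assumed to come from a discriminator
    conditioned on the noise level) that each denoised sample lies on
    the noisy real manifold.
    """
    return float(-np.mean(np.log(np.clip(d_prob, tiny, 1.0))))


def combined_loss(eps_pred, eps_true, d_prob, gamma=0.01):
    """Denoising loss plus gamma times the observation loss.

    gamma is a hypothetical weight chosen only for illustration.
    """
    return denoising_loss(eps_pred, eps_true) + gamma * observation_loss(d_prob)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_true = rng.standard_normal((4, 8))
    eps_pred = eps_true + 0.1 * rng.standard_normal((4, 8))
    d_prob = np.array([0.9, 0.6, 0.75, 0.5])  # stand-in discriminator outputs
    print(combined_loss(eps_pred, eps_true, d_prob))
```

In this sketch the observation term vanishes when the discriminator is certain that every denoised sample is on the manifold, and it grows as those probabilities fall, which is the direction of the nudge the abstract describes.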

Paper Structure

This paper contains 44 sections, 2 theorems, 34 equations, 21 figures, 9 tables, 2 algorithms.

Key Result

Lemma 1

For $\mathbf{u} \sim p_\mathbf{u}$ and $\mathbf{v}|\mathbf{u} \sim \mathcal{N}(\sqrt{1-\beta}\mathbf{u}, \beta \mathbf{I})$, we obtain the following two asymptotic distributions of $p_{\mathbf{u}|\mathbf{v}}^{(\beta)}(\mathbf{u}|\mathbf{v})$:
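
The two limits themselves are not reproduced in this summary. The setup of the lemma can still be explored numerically: the sketch below evaluates $p_{\mathbf{u}|\mathbf{v}}^{(\beta)}(\mathbf{u}|\mathbf{v})$ on a one-dimensional grid with Bayes' rule for several values of $\beta$. The prior $p_\mathbf{u}$ used here (an equal-weight mixture of two unit-variance Gaussians at $\pm 2$), the grid, and the helper names are assumptions made only for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (scalars or arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def prior_pdf(u, mu=2.0):
    """Hypothetical p_u: equal-weight mixture of N(-mu, 1) and N(+mu, 1)."""
    return 0.5 * gaussian_pdf(u, -mu, 1.0) + 0.5 * gaussian_pdf(u, mu, 1.0)


def posterior_on_grid(v, beta, grid):
    """p(u|v) on a uniform grid, proportional to p_u(u) * N(v; sqrt(1-beta) u, beta)."""
    likelihood = gaussian_pdf(v, np.sqrt(1.0 - beta) * grid, beta)
    unnormalized = prior_pdf(grid) * likelihood
    du = grid[1] - grid[0]
    return unnormalized / (unnormalized.sum() * du)


def summarize(v, beta, grid=None):
    """Return posterior mean, posterior std, and L1 gap between posterior and prior."""
    if grid is None:
        grid = np.linspace(-8.0, 8.0, 16001)
    du = grid[1] - grid[0]
    post = posterior_on_grid(v, beta, grid)
    mean = float((grid * post).sum() * du)
    std = float(np.sqrt((((grid - mean) ** 2) * post).sum() * du))
    l1_gap = float(np.abs(post - prior_pdf(grid)).sum() * du)
    return mean, std, l1_gap


if __name__ == "__main__":
    for beta in (1e-3, 0.4, 0.999):
        mean, std, l1_gap = summarize(v=0.1, beta=beta)
        print(f"beta={beta}: mean={mean:.4f}, std={std:.4f}, L1 gap to prior={l1_gap:.4f}")
```

With this toy prior, a small $\beta$ gives a posterior concentrated near $\mathbf{v}$, while $\beta$ close to 1 makes the likelihood nearly flat in $\mathbf{u}$, so the posterior stays close to the prior.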

Figures (21)

  • Figure 1: Comparisons of images generated by the ADM backbone on the CelebA dataset with deterministic samplers using the same initial noise but different NFEs. The entries on the leftmost column of the figure denote the combinations of the training and inference methods. (Left) The baseline model generates samples with inconsistent attributes, e.g., gender, hair, etc., by varying NFEs while our approach preserves such properties. (Right) The samples generated by the baseline method with a small number of NFEs tend to be blurry and unrealistic. Also, they have unnaturally bright and textureless areas around the chin of the person.
  • Figure 2: The graphical model of the proposed denoising process with observations.
  • Figure 3: The role of the discriminator in our objective. $\theta_{\text{ours}}$ and $\theta_{\text{base}}$ denote the denoising parameters learned by the proposed method and the baseline, respectively. The proposed training method nudges the prediction of $\hat{\mathbf{x}}_{t-s}^\theta$ closer to the exact state space than the original.
  • Figure 4: Simulations when $\mu=2$. (a) $\ell^2$-norm between $p_{\mathbf{u}|\mathbf{v}}^{(\beta)}(\mathbf{u}|\mathbf{v})$ and $q_{\mathbf{u}|\mathbf{v}}^{(\xi)}(\mathbf{u}|\mathbf{v})$ with respect to $\xi$ when $\mathbf{v}=0.1$. (b) The graph of $\xi(\beta)$ with respect to $\beta$ for various $\mathbf{v}$. (c) Pdfs of four distributions when $\mathbf{v}=0.1$, $\beta=0.4$, and $\xi=\xi(\beta)=0.55$.
  • Figure 5: Qualitative results on CIFAR-10 dataset with the ADM backbone using Euler method (top) and S-PNDM (bottom) with NFEs=$10$.
  • ...and 16 more figures

Theorems & Definitions (5)

  • Lemma 1
  • proof
  • Lemma 1
  • proof: Proof of \ref{eq:asymptotic1}
  • proof: Proof of \ref{eq:asymptotic2}