Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images

Yifei Wang, Weimin Bai, Weijian Luo, Wenzheng Chen, He Sun

Abstract

Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems, offering a good approximation of the prior distributions of real-world image data. Typically, diffusion models rely on large-scale clean signals to accurately learn the score functions of ground-truth clean image distributions. However, requiring large amounts of clean data is often impractical in real-world applications, especially in fields where data samples are expensive to obtain. To address this limitation, we introduce FlowDiff, a novel joint training paradigm that leverages a conditional normalizing flow model to facilitate the training of diffusion models on corrupted data sources. The conditional normalizing flow learns to recover clean images through a novel amortized inference mechanism, and can thus effectively facilitate the diffusion model's training with corrupted data. In turn, the diffusion model provides a strong prior that improves the quality of image recovery. The flow model and the diffusion model therefore reinforce each other and demonstrate strong empirical performance. Our extensive experiments show that FlowDiff can effectively learn clean distributions across a wide range of corrupted data sources, such as noisy and blurry images. It consistently outperforms existing baselines by significant margins under identical conditions. We also study the learned diffusion prior, observing its superior performance in downstream computational imaging tasks, including inpainting, denoising, and deblurring.

Paper Structure (25 sections, 13 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Overview of FlowDiff. We aim to train a clean diffusion model, $s_\theta$, using only corrupted observations. To achieve this, a conditional normalizing flow, $G_\varphi$, is introduced to recover the underlying clean images through amortized inference. The conditional normalizing flow and the diffusion model are trained jointly: the flow generates clean images for training the diffusion model, while the diffusion model provides an image prior to regularize the output of the flow. Once the two networks reach equilibrium, clean reconstructions of corrupted observations are produced, and a clean diffusion prior is learned.
  • Figure 2: Training procedure of the conditional flow model and the diffusion model. We alternately report the amortized inference results from the flow model and the images generated by the diffusion model during training. The diffusion model initially captures low-frequency signals, guiding the amortized inference model. As the amortized inference improves, it produces better-quality images, further enhancing the diffusion model's training. Eventually, both models converge to produce clean images.
  • Figure 3: Image samples from diffusion models learned from corrupted observations. The three rows show results from models trained on different datasets: noisy MNIST handwritten digits, blurred CIFAR-10 dog images, and noisy fluorescent microscope images. The learned diffusion models generate samples similar to the ground-truth images, significantly outperforming the baseline, AmbientFlow. Notably, when directly training the diffusion model using blurred images (2nd row (b)), we achieve samples with low FID scores. This is because FID mainly measures the similarity of smoothed features among image sets. However, our method (2nd row (d)) produces more reasonable and sharper dog images, despite the FID score not being superior.
  • Figure 4: Amortized inference results on CIFAR-10 deblurring and microscopy imaging tasks. Our method achieves superior performance compared to AmbientFlow because the diffusion model has stronger generative modeling capabilities than the flow model employed by AmbientFlow.
  • Figure 5: Posterior samples from the generative model trained on blurred CIFAR-10 images. On four downstream tasks (denoising, deblurring, inpainting, and combined denoising and deblurring), our method surpasses the performance of baseline approaches including AmbientDiffusion, SURE-Score, and AmbientFlow.
  • ...and 2 more figures
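
The alternating scheme described in the captions of Figures 1 and 2 (infer clean samples under the current prior, then refit the prior to those samples) can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes scalar data, additive Gaussian noise of known variance, and closed-form Gaussian stand-ins for both the conditional flow and the diffusion prior. All names (`fit_clean_gaussian`, `noise_var`, and so on) are hypothetical. In this simplified setting the alternation reduces to expectation-maximization, and the prior converges to the clean distribution even though only noisy observations are ever seen.

```python
import random
import statistics


def fit_clean_gaussian(ys, noise_var, steps=200):
    """Toy analogue of joint training on noisy scalars y = x + n.

    Assumption: a Gaussian prior N(m, v) stands in for the diffusion model,
    and the exact Gaussian posterior stands in for the conditional flow.
    """
    y_mean = statistics.fmean(ys)
    y_var = statistics.pvariance(ys)
    m, v = 0.0, 1.0  # arbitrary initial prior mean and variance
    for _ in range(steps):
        # Inference step: under the current prior, x | y ~ N(a*y + b, c2).
        a = v / (v + noise_var)
        b = m * noise_var / (v + noise_var)
        c2 = v * noise_var / (v + noise_var)
        # Prior step: refit the prior to the moments of the inferred samples,
        # aggregated over all observations.
        m = a * y_mean + b
        v = a * a * y_var + c2
    return m, v


random.seed(0)
noise_var = 1.0
# Hypothetical data: clean x ~ N(2, 2^2), observed only through added noise.
ys = [random.gauss(2.0, 2.0) + random.gauss(0.0, noise_var ** 0.5)
      for _ in range(5000)]
m, v = fit_clean_gaussian(ys, noise_var)
print(f"learned clean mean={m:.3f}, variance={v:.3f}")
```

The fixed point of the two updates is the mean of the observations and their variance minus the noise variance, that is, the moments of the clean signal. The real setting replaces both closed-form steps with neural networks and stochastic gradient updates, so this convergence argument does not carry over directly.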