Diffusion Models, Image Super-Resolution And Everything: A Survey

Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel

TL;DR

This survey provides a unified recount of the theoretical foundations underlying DMs applied to image SR and offers a detailed analysis of the characteristics and methodologies specific to this domain, which sets it apart from broader existing reviews in the field.

Abstract

Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This survey articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this survey sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.

Paper Structure (40 sections, 47 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Principle of DMs. The forward diffusion adds noise iteratively (red), which translates an image from the image space to the corruption space. The backward diffusion, the iterative refinement process, reverts the process (blue) back to the image space. Shown are three different implementations of DMs, namely Denoising Diffusion Probabilistic Models (DDPMs), Score-based Generative Models (SGMs), and Stochastic Differential Equations (SDEs), with their respective formulations of the forward and backward diffusion.
  • Figure 2: Conceptual overview of generative models (GANs, VAEs, NFs, and DMs).
  • Figure 3: Topology of this work. Conditioning (\ref{sec:conditioning}) leads the backward diffusion, whereas guidance (\ref{sec:guidance}) is a training strategy to improve the incorporation of conditioning into DMs. The state domain (\ref{sec:altDom}) describes the representation of states $\mathbf{z}_t$. The corruption space (\ref{sec:corrupt}) describes the target of the forward diffusion process or the start of the backward diffusion.
  • Figure 4: Overview of state domains. The green bar shows the vanilla DM operating in pixel space. The blue bar shows DMs operating in a latent space obtained via autoencoders. The red bar shows the application of DMs in the wavelet domain.
  • Figure 5: Overview of DiffuseVAE. The two-stage approach employs a VAE (first stage), which generates a variational prediction that serves as the condition for the DM (second stage).
  • ...and 3 more figures
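
The forward diffusion described in the caption of Figure 1 can be illustrated with a minimal sketch of the standard DDPM noising step, $\mathbf{z}_t = \sqrt{1-\beta_t}\,\mathbf{z}_{t-1} + \sqrt{\beta_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The linear schedule values, the toy constant "image", and all function names below are assumptions made for illustration and are not taken from the survey. The backward diffusion is not shown, since it requires a trained denoising network.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Apply z_t = sqrt(1 - beta_t) * z_{t-1} + sqrt(beta_t) * eps, step by step.

    Returns the full trajectory [z_0, z_1, ..., z_T]; x0 itself is not modified.
    """
    z = list(x0)
    trajectory = [z]
    for beta in betas:
        keep = math.sqrt(1.0 - beta)   # how much of the previous state is kept
        noise = math.sqrt(beta)        # standard deviation of the added noise
        z = [keep * v + noise * rng.gauss(0.0, 1.0) for v in z]
        trajectory.append(z)
    return trajectory


def signal_scale(betas):
    """Square root of prod(1 - beta_t): the factor by which z_0 is scaled in z_T."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
    return math.sqrt(prod)


rng = random.Random(0)                 # fixed seed for reproducibility
x0 = [1.0] * 256                       # toy "image": 256 pixels, all equal to 1.0
betas = linear_beta_schedule(1000)     # T = 1000 steps (assumed)
trajectory = forward_diffusion(x0, betas, rng)
z_T = trajectory[-1]

mean_T = sum(z_T) / len(z_T)
var_T = sum((v - mean_T) ** 2 for v in z_T) / len(z_T)
print(f"signal scale after T steps: {signal_scale(betas):.4f}")
print(f"z_T mean: {mean_T:.3f}, variance: {var_T:.3f}")
```

Under these assumptions, the original signal is scaled down to almost nothing after the last step, and the final state is statistically close to unit-variance Gaussian noise. This corresponds to the image having been moved from the image space into the corruption space.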