
Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution

Yutao Yuan, Chun Yuan

TL;DR

This work targets the ill-posed nature of image super-resolution by learning the conditional distribution of high-resolution images given low-resolution inputs through a continuous-time conditional diffusion model. It introduces Efficient Conditional Diffusion Model with Probability Flow Sampling (ECDP), which uses probability-flow ODE-based sampling to reduce generation time, and a hybrid parametrization for the denoiser to improve consistency across noise scales. An image-quality loss in feature space further aligns generated HR images with ground truth, enhancing perceptual fidelity. Across DIV2K, ImageNet, and CelebA, ECDP delivers superior perceptual SR quality while achieving faster sampling than existing diffusion-based methods, demonstrating practical gains for high-quality, efficient SR. The authors provide open-source code, facilitating reproducibility and broader adoption.
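Probability flow sampling replaces the stochastic reverse process with a deterministic ODE that has the same marginals, which typically allows fewer and larger integration steps. The snippet below is a minimal sketch of that idea under stated assumptions, not ECDP's sampler. It uses the variance-exploding ODE form $d\bm{x}/d\sigma = (\bm{x} - D(\bm{x}; \sigma))/\sigma$ from Karras et al. (EDM), a plain Euler integrator, and a hypothetical closed-form `gaussian_denoiser` in place of a trained network. In the conditional SR setting, the denoiser would also receive the LR image $\bm{y}$.

```python
import numpy as np


def pf_ode_euler(denoiser, x_T, sigmas):
    """Integrate the probability-flow ODE dx/dsigma = (x - D(x; sigma)) / sigma with Euler steps.

    `sigmas` must decrease to 0; every entry except the last must be > 0,
    because the ODE right-hand side divides by sigma.
    """
    x = np.asarray(x_T, dtype=np.float64)
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma_cur)) / sigma_cur  # dx/dsigma at the current noise level
        x = x + (sigma_next - sigma_cur) * slope          # step is negative: sigma decreases
    return x


def gaussian_denoiser(mu, s):
    """Hypothetical exact denoiser E[x_0 | x_sigma] for data distributed as N(mu, s^2)."""
    def denoise(x, sigma):
        return (s**2 * x + sigma**2 * mu) / (s**2 + sigma**2)
    return denoise


if __name__ == "__main__":
    mu, s, sigma_max = 0.5, 1.0, 10.0
    sigmas = np.linspace(sigma_max, 0.0, 1001)   # 1000 Euler steps from sigma_max to 0
    x_T = mu + 3.0                               # fixed, deterministic starting point
    x_0 = pf_ode_euler(gaussian_denoiser(mu, s), x_T, sigmas)
    # For this Gaussian toy the flow is exact: x_0 - mu = (x_T - mu) * s / sqrt(s^2 + sigma_max^2)
    exact = mu + (x_T - mu) * s / np.sqrt(s**2 + sigma_max**2)
    print(f"euler: {float(x_0):.4f}  exact: {exact:.4f}")
```

Because the toy data distribution is Gaussian, the flow has a closed form, so the Euler result can be checked directly. Each step costs one denoiser evaluation and no fresh noise is drawn, which is why the step count can often be reduced relative to ancestral sampling.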

Abstract

Image super-resolution is a fundamentally ill-posed problem because multiple valid high-resolution images exist for one low-resolution image. Super-resolution methods based on diffusion probabilistic models can deal with this ill-posedness by learning the distribution of high-resolution images conditioned on low-resolution images, avoiding the blurry outputs of PSNR-oriented methods. However, existing diffusion-based super-resolution methods are slow because of their iterative sampling, and the quality and consistency of the generated images are less than ideal due to problems such as color shifting. In this paper, we propose Efficient Conditional Diffusion Model with Probability Flow Sampling (ECDP) for image super-resolution. To reduce the time consumption, we design a continuous-time conditional diffusion model for image super-resolution, which enables the use of probability flow sampling for efficient generation. Additionally, to improve the consistency of generated images, we propose a hybrid parametrization for the denoiser network, which interpolates between the data-predicting parametrization and the noise-predicting parametrization for different noise scales. Moreover, we design an image quality loss as a complement to the score matching loss of diffusion models, further improving the consistency and quality of super-resolution. Extensive experiments on DIV2K, ImageNet, and CelebA demonstrate that our method achieves higher super-resolution quality than existing diffusion-based image super-resolution methods with lower time consumption. Our code is available at https://github.com/Yuan-Yutao/ECDP.
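One common way to interpolate between a data-predicting and a noise-predicting denoiser is the skip-connection preconditioning of Karras et al. (EDM). The sketch below uses that scheme as an assumed stand-in; ECDP's own hybrid parametrization may use different weighting functions. `sigma_data=0.5` is an illustrative default, `net` is a hypothetical raw network $F$, and EDM's input scaling and noise embedding are omitted for brevity.

```python
import numpy as np


def hybrid_coefficients(sigma, sigma_data=0.5):
    """EDM-style skip/output scalings, used here as an assumed stand-in for ECDP's weighting.

    sigma -> 0:   c_skip -> 1, c_out -> sigma       (D ~ x + sigma * F: noise-predicting regime)
    sigma -> inf: c_skip -> 0, c_out -> sigma_data  (D ~ sigma_data * F: data-predicting regime)
    """
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / np.sqrt(sigma**2 + sigma_data**2)
    return c_skip, c_out


def hybrid_denoiser(net, x, sigma, sigma_data=0.5):
    """D(x; sigma) = c_skip * x + c_out * F(x; sigma), where `net` is a hypothetical raw network F."""
    c_skip, c_out = hybrid_coefficients(sigma, sigma_data)
    return c_skip * x + c_out * net(x, sigma)


def score_from_denoiser(denoised, x, sigma):
    """Score estimate grad_x log p(x; sigma) = (D(x; sigma) - x) / sigma^2 (Tweedie's formula)."""
    return (denoised - x) / sigma**2


if __name__ == "__main__":
    for sigma in (0.01, 0.5, 50.0):
        c_skip, c_out = hybrid_coefficients(sigma)
        print(f"sigma={sigma:>6}: c_skip={c_skip:.4f}, c_out={c_out:.4f}")
```

At small noise levels the output is the input plus a small learned correction, which amounts to predicting the noise; at large noise levels the input carries little signal, so the network output is used almost directly as the clean-image estimate. `score_from_denoiser` converts either regime into the score used by the probability flow.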

Paper Structure

This paper contains 31 sections, 2 theorems, 28 equations, 9 figures, and 7 tables.

Key Result

Proposition 1

The forward process given by Eq. (eq:ours-forward) keeps the mean and variance of $\bm{x}(t)$ conditioned on $\bm{y}$ unchanged throughout the transformation from $t = 0$ to $t = T$. More specifically:

Figures (9)

  • Figure 1: Overview of ECDP. Top left: Continuous-time conditional diffusion uses a forward SDE to transform images into noise and generates new images from noise using probability flow. Bottom: The conditional score in the probability flow is approximated with a hybrid-parametrization score predictor $\bm{s}_\theta$, which is trained using score matching. Top right: An additional image quality loss, which compares the generated HR images with the ground truth, is computed using probability flow sampling and improves the quality of super-resolution results.
  • Figure 2: $\epsilon$-param
  • Figure 3: $x_0$-param
  • Figure 4: hybrid-param
  • Figure 6: Visualization of images generated by our method trained without and with $\mathcal{L}_{\text{quality}}$. The image generated by the model with $\mathcal{L}_{\text{quality}}$ has more visible structure (the lines on the pillar) and less background noise.
  • ...and 4 more figures
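Figure 1 describes training as a score matching loss plus an image quality loss computed on HR images produced by probability flow sampling. The sketch below only illustrates how such terms could be combined and is not ECDP's loss. `gradient_features` is a hypothetical stand-in for a learned feature extractor, `weight=0.1` is an illustrative value, conditioning on $\bm{y}$ is omitted, and NumPy only evaluates the losses; actual training would need an autodiff framework so that gradients flow back through the sampler.

```python
import numpy as np


def denoising_score_matching_loss(denoiser, x0, sigma, rng):
    """Denoising loss ||D(x0 + sigma * eps; sigma) - x0||^2, averaged over pixels.

    With s = (D - x) / sigma^2, this equals denoising score matching up to a sigma^4 weight.
    """
    eps = rng.standard_normal(x0.shape)
    return float(np.mean((denoiser(x0 + sigma * eps, sigma) - x0) ** 2))


def gradient_features(img):
    """Hypothetical toy feature extractor: horizontal and vertical finite differences."""
    return np.concatenate([np.diff(img, axis=-1).ravel(), np.diff(img, axis=-2).ravel()])


def feature_quality_loss(generated, target, feature_fn=gradient_features):
    """Mean squared distance between features of a generated HR image and the ground truth."""
    return float(np.mean((feature_fn(generated) - feature_fn(target)) ** 2))


def total_loss(score_loss, quality_loss, weight=0.1):
    """Score matching loss plus a weighted quality term; `weight` is an illustrative value."""
    return score_loss + weight * quality_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((8, 8))                                # stand-in ground-truth HR image
    generated = hr + 0.05 * rng.standard_normal(hr.shape)  # stand-in for a probability flow sample
    shrink_denoiser = lambda x, sigma: x / (1.0 + sigma**2)  # hypothetical, untrained denoiser
    l_score = denoising_score_matching_loss(shrink_denoiser, hr, 0.5, rng)
    l_quality = feature_quality_loss(generated, hr)
    print(total_loss(l_score, l_quality))
```

Comparing features rather than raw pixels lets the quality term penalize missing structure, such as the pillar lines in Figure 6, without collapsing toward the blurry per-pixel average that PSNR-oriented losses favor.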

Theorems & Definitions (3)

  • Proposition 1
  • Lemma
  • Proof
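The moment-preservation property in Proposition 1 can be checked numerically on a toy process. The sketch below assumes a one-dimensional Ornstein-Uhlenbeck-type forward SDE, $dx = -\tfrac{1}{2}\beta (x - \mu)\,dt + \sqrt{\beta}\, s\, dw$, where $\mu$ and $s^2$ stand in for the conditional mean and variance of $\bm{x}$ given $\bm{y}$. This form is an assumption chosen to exhibit the property and is not necessarily the paper's eq:ours-forward; `beta`, `T`, and the step counts are illustrative.

```python
import numpy as np


def simulate_mean_preserving_sde(x0, mu, s, beta=5.0, T=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = -0.5 * beta * (x - mu) dt + sqrt(beta) * s dW.

    Toy assumption: mu and s^2 play the role of the conditional mean and variance of x given y.
    The moment ODEs dm/dt = -0.5*beta*(m - mu) and dv/dt = -beta*v + beta*s^2 have fixed
    points m = mu and v = s^2, so samples starting with those moments keep them.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.array(x0, dtype=np.float64)
    for _ in range(n_steps):
        drift = -0.5 * beta * (x - mu) * dt
        noise = np.sqrt(beta * dt) * s * rng.standard_normal(x.shape)
        x = x + drift + noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, s, n = 0.3, 0.8, 100_000
    # Non-Gaussian start: two point masses at mu - s and mu + s (mean mu, variance s^2, kurtosis 1).
    x0 = mu + s * rng.choice([-1.0, 1.0], size=n)
    xT = simulate_mean_preserving_sde(x0, mu, s, n_steps=500, rng=rng)
    kurtosis = np.mean((xT - xT.mean()) ** 4) / xT.var() ** 2
    print(f"mean {xT.mean():.3f}, var {xT.var():.3f}, kurtosis {kurtosis:.2f}")
```

Starting from a two-point distribution, the simulated samples keep their mean and variance (up to Monte Carlo and discretization error), while the kurtosis should move from 1 to close to 3, the Gaussian value. The distribution changes shape, but the first two conditional moments do not.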