A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising

Jonas Dornbusch, Emanuel Pfarr, Florin-Alexandru Vasluianu, Frank Werner, Radu Timofte

TL;DR

The paper tackles Gaussian denoising with diffusion models and introduces LCDD, a simple linear-combination approach that inserts a noisy input at an intermediate diffusion state and fuses outputs from short and long inference schedules using a scalar $\lambda$. By using a single pretrained network to handle multiple noise levels and varying the inference schedule length, LCDD achieves favorable distortion-perception trade-offs without additional training. The method delivers state-of-the-art or competitive results across multiple benchmarks (FFHQ, ImageNet, BSD68, McMaster) in PSNR, FID, and LPIPS, and qualitative results show detailed, natural reconstructions. This work provides a practical, parameter-light strategy for balancing distortion and perceptual quality in diffusion-based image restoration with broad applicability and speed advantages.

Abstract

Diffusion models have garnered considerable interest in computer vision, owing both to their capacity to synthesize photorealistic images and to their proven effectiveness in image reconstruction tasks. However, existing approaches fail to efficiently balance the high visual quality of diffusion models with the low distortion achieved by previous image reconstruction methods. Specifically, for the fundamental task of additive Gaussian noise removal, we first illustrate an intuitive method for leveraging pretrained diffusion models. Further, we introduce our proposed Linear Combination Diffusion Denoiser (LCDD), which unifies two complementary inference procedures - one that leverages the model's generative potential and another that ensures faithful signal recovery. By exploiting the inherent structure of the denoising samples, LCDD achieves state-of-the-art performance and offers controlled, well-behaved trade-offs through a simple scalar hyperparameter adjustment.
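The scalar trade-off described above can be illustrated with a minimal sketch. It assumes the two denoised estimates (one from a short inference schedule, one from a long one) are already available as arrays of equal shape. The function name `linear_combination`, the toy arrays, and the convention that $\lambda = 1$ returns the short-schedule output are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np


def linear_combination(x_short, x_long, lam):
    """Blend two denoised estimates of the same image pixel by pixel.

    Assumed convention (hypothetical, not confirmed by the source):
    lam = 1 returns the short-schedule output (low distortion),
    lam = 0 returns the long-schedule output (high perceptual quality).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_short = np.asarray(x_short, dtype=np.float64)
    x_long = np.asarray(x_long, dtype=np.float64)
    if x_short.shape != x_long.shape:
        raise ValueError("both estimates must have the same shape")
    return lam * x_short + (1.0 - lam) * x_long


# Toy stand-ins for the two network outputs (random data, not real results).
rng = np.random.default_rng(0)
x_short = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
x_long = rng.uniform(-1.0, 1.0, size=(3, 8, 8))

# Sweep lam to trace a path between the two endpoints.
blends = {lam: linear_combination(x_short, x_long, lam) for lam in (0.0, 0.2, 0.6, 1.0)}
```

Because the blend is a single weighted sum of two fixed outputs, sweeping $\lambda$ requires no further network evaluations once both estimates have been computed.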

Paper Structure

This paper contains 11 sections, 14 equations, 16 figures, 2 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of our proposed LCDD method. A noisy image is scaled so that it closely resembles an intermediate step in the inference process of a classical DM, and is then denoised using the structure of DMs. The denoised image is then linearly combined with the initial prediction of the first diffusion step, allowing us to achieve favorable distortion-perception trade-offs.
  • Figure 2: The trajectories of three methods compared to two reference methods. The distortion measure is designed such that higher scores are better, while for the perception measure lower scores are better. The method that describes the green trajectory has consistently advantageous trade-offs, while the methods describing either of the red trajectories do not.
  • Figure 6: An example of the linear combination of the one-step schedule and the 168-step schedule of the DDIM variant while denoising an image taken from the FFHQ dataset with noise level $\rho = 75$. In the upper left we show the one-step schedule, while in the bottom right we show the 168-step schedule. All intermediate pictures from left to right and top to bottom are linear combinations. The noisy image and ground truth can be found in \ref{fig:ExampleDenoisedImages}.
  • Figure 7: Denoised images using diffusion models with different inference lengths and our proposed linear combination method. The first three rows present the DDIM variant on the FFHQ, BSD68 and ImageNet datasets, respectively. The fourth row displays the DDPM variant with an ImageNet sample. The first column contains the noisy input images, all with a noise level of $\rho = 75$, corresponding to $\hat{k} = 168$. The column labeled $\text{LC}_\text{D}$ represents a linear combination of the 168-step schedule with emphasis on distortion, where we set the combination factor to $\lambda=0.6$ for DDIM and $\lambda=0.75$ for DDPM. The column $\text{LC}_\text{P}$ highlights a linear combination of the 168-step schedule focused on perception, using $\lambda=0.2$ for DDIM and $\lambda=0.5$ for DDPM. The exact performance metrics for these denoising schemes can be found in \ref{tab:DDPMandDDIMDifferentSchedules}.
  • Figure : DDIM on FFHQ
  • ...and 11 more figures
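
The captions state that a noisy image is scaled to resemble an intermediate diffusion state and that $\rho = 75$ corresponds to $\hat{k} = 168$. The sketch below shows one way such a correspondence could be computed. It rests on assumptions not stated in this summary: a variance-preserving forward process with the common 1000-step linear $\beta$ schedule (from $10^{-4}$ to $0.02$), pixel values scaled to $[-1, 1]$ so that $\sigma = 2\rho/255$, and 1-based step counting. The function `matching_step` and its parameters are hypothetical names.

```python
import numpy as np


def matching_step(rho, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Find the diffusion step whose noise level best matches rho.

    Assumptions (not from the source): linear beta schedule, images in
    [-1, 1], noise level rho given on the 0..255 pixel scale.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)

    # Noise standard deviation after rescaling pixels from [0, 255] to [-1, 1].
    sigma = 2.0 * rho / 255.0

    # A state x_k = sqrt(a_k) * x_0 + sqrt(1 - a_k) * eps, divided by sqrt(a_k),
    # is a clean image plus Gaussian noise of std sqrt((1 - a_k) / a_k).
    step_sigma = np.sqrt((1.0 - alpha_bar) / alpha_bar)

    idx = int(np.argmin(np.abs(step_sigma - sigma)))
    # Multiplying the noisy image by sqrt(a_k) maps it onto that state.
    scale = float(np.sqrt(alpha_bar[idx]))
    return idx + 1, scale  # 1-based step count


k_hat, scale = matching_step(75)
print(k_hat, scale)
```

Under these assumed defaults, the step found for $\rho = 75$ should land close to the $\hat{k} = 168$ quoted in the Figure 7 caption; a different schedule or pixel range would shift the result.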