Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

TL;DR

This work addresses the challenge of zero-shot image restoration with diffusion-based priors by proposing CM4IR, a four-step restoration framework built on pretrained Consistency Models. CM4IR combines initialization derived from the observation model, back-projection guidance, and a novel noise-injection mechanism that decouples the noise level used for denoising from the injected noise and uses a Polyak-acceleration-style update. Together, these yield high-quality results on super-resolution, deblurring, and inpainting with only 4 NFEs. The authors show that their approach outperforms zero-shot methods that use many more NFEs and, in several cases, rivals task-specific fine-tuned methods, and that their noise injection also improves existing guided DM methods when their NFE count is drastically reduced. The practical benefits are lower computational cost and robust restoration under varied degradation models; the authors state that code will be released for reproducibility.
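The TL;DR describes the noise-injection update as Polyak-acceleration-style. For readers unfamiliar with the term, here is a minimal sketch of the classical Polyak heavy-ball method on a toy least-squares problem, assuming NumPy. It illustrates only the generic momentum idea (reusing the previous step direction $x_k - x_{k-1}$) and is not CM4IR's update rule, which is not reproduced in this summary. The names `heavy_ball`, `step`, and `momentum` are hypothetical.

```python
import numpy as np

def heavy_ball(A, y, step, momentum, iters):
    """Polyak heavy-ball iterations for min_x 0.5 * ||A x - y||^2.

    x_{k+1} = x_k - step * grad f(x_k) + momentum * (x_k - x_{k-1});
    momentum = 0 reduces to plain gradient descent.
    """
    x_prev = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)  # gradient of the least-squares loss
        x_next = x - step * grad + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy ill-conditioned problem: Hessian A^T A has eigenvalues L = 1 and mu = 0.01.
A = np.diag([1.0, 0.1])
y = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, y)  # exact solution [1, 10]
L, mu = 1.0, 0.01

# Plain gradient descent with step 1/L.
x_gd = heavy_ball(A, y, step=1.0 / L, momentum=0.0, iters=50)

# Classical heavy-ball tuning for quadratics.
s = (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
x_hb = heavy_ball(A, y, step=4.0 / s, momentum=beta, iters=50)

err_gd = np.linalg.norm(x_gd - x_star)
err_hb = np.linalg.norm(x_hb - x_star)
print(err_gd, err_hb)
```

With these settings, the heavy-ball error after 50 iterations is orders of magnitude smaller than that of plain gradient descent, because the momentum term damps the slow direction of the ill-conditioned problem.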

Abstract

In recent years, it has become popular to tackle image restoration tasks with a single pretrained diffusion model (DM) and data-fidelity guidance, instead of training a dedicated deep neural network per task. However, such "zero-shot" restoration schemes currently require many Neural Function Evaluations (NFEs) to perform well, which may be attributed to the many NFEs needed in the original generative functionality of the DMs. Recently, faster variants of DMs have been explored for image generation. These include Consistency Models (CMs), which can generate samples with only a couple of NFEs. However, existing works that use guided CMs for restoration still require tens of NFEs or per-task fine-tuning of the model, which leads to a performance drop when the assumptions made during fine-tuning are inaccurate. In this paper, we propose a zero-shot restoration scheme that uses CMs and operates well with as few as 4 NFEs. It is based on a careful combination of several ingredients: better initialization, back-projection guidance, and above all a novel noise injection mechanism. We demonstrate the advantages of our approach for image super-resolution, deblurring, and inpainting. Interestingly, we show that the usefulness of our noise injection technique goes beyond CMs: it can also mitigate the performance degradation of existing guided DM methods when their NFE count is reduced.
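The abstract names three ingredients: initialization, back-projection guidance, and noise injection. Below is a minimal sketch of a generic few-step guided CM loop for inpainting, assuming NumPy. Every name in it is hypothetical. `toy_consistency_fn` stands in for a pretrained CM and is the exact consistency function for a standard Gaussian prior under $\mathbf{x}_\tau = \mathbf{x}_0 + \tau\mathbf{z}$. The initialization $A^{\dagger}\mathbf{y} + \tau\mathbf{z}$ and the noise schedule are assumptions. The injection step uses plain fresh Gaussian noise, which is the step that the paper's noise-injection mechanism modifies.

```python
import numpy as np

def toy_consistency_fn(x, tau):
    # Hypothetical stand-in for a pretrained CM: for a standard Gaussian prior
    # x0 ~ N(0, I) with x_tau = x0 + tau * z, the probability-flow ODE maps
    # x_tau to x0 = x_tau / sqrt(1 + tau^2), and f(x, 0) = x.
    return x / np.sqrt(1.0 + tau ** 2)

def back_project(x, y, mask):
    # Back-projection x + A^+ (y - A x); for inpainting A = diag(mask) is an
    # orthogonal projection, so A^+ = A and observed pixels are reset to y.
    return x + mask * (y - mask * x)

def few_step_restore(y, mask, taus, rng):
    """Generic few-step guided CM loop (a sketch, not CM4IR's exact update)."""
    # Assumed initialization: A^+ y plus Gaussian noise at the largest level.
    x = mask * y + taus[0] * rng.standard_normal(y.shape)
    nfe = 0
    for n, tau in enumerate(taus):
        x0 = toy_consistency_fn(x, tau)  # one neural function evaluation
        nfe += 1
        x0 = back_project(x0, y, mask)   # enforce data consistency
        if n + 1 < len(taus):
            # Plain fresh-noise injection: the step CM4IR's mechanism modifies.
            x = x0 + taus[n + 1] * rng.standard_normal(y.shape)
    return x0, nfe

rng = np.random.default_rng(0)
clean = rng.standard_normal((8, 8))
mask = (rng.random((8, 8)) > 0.5).astype(float)  # 1 = observed pixel
y = mask * clean                                 # noiseless inpainting observation
restored, nfe = few_step_restore(y, mask, taus=[1.0, 0.5, 0.25, 0.1], rng=rng)
```

Each loop iteration costs one NFE, so four noise levels give a 4-NFE budget. After the final back-projection, the observed pixels match $\mathbf{y}$ exactly in this noiseless toy setting.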

Paper Structure (21 sections, 2 theorems, 24 equations, 17 figures, 8 tables, 1 algorithm)

Key Result

Proposition 3.1

Under the assumption of song2020denoising on $q_{\eta}(\mathbf{x}_{\tau_1:\tau_N}|\mathbf{x}_0)$, the marginal $q_{\eta}(\mathbf{x}_{\tau_{n-1}}|\mathbf{x}_{0}) = \mathcal{N}(\mathbf{x}_0,\tau_{n-1}^2\mathbf{I})$ still holds if we replace $(\mathbf{x}_{\tau_n} - \mathbf{x}_0)/\tau_n$ with $(\mathbf{x}_0-\ldots$ …
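Proposition 3.1 is a statement about marginals. As a sanity check of the underlying fact, the sketch below assumes NumPy and a VE-form DDIM-style step in our own notation, $\mathbf{x}_{\tau_{n-1}} = \mathbf{x}_0 + \sqrt{\tau_{n-1}^2-\sigma_n^2}\,\hat{\mathbf{z}} + \sigma_n\boldsymbol{\epsilon}$ with $0 \le \sigma_n \le \tau_{n-1}$. It checks by Monte Carlo that the marginal stays $\mathcal{N}(\mathbf{x}_0,\tau_{n-1}^2\mathbf{I})$ both for the usual direction $\hat{\mathbf{z}} = (\mathbf{x}_{\tau_n}-\mathbf{x}_0)/\tau_n$ and for another direction that is standard Gaussian given $\mathbf{x}_0$ and independent of $\boldsymbol{\epsilon}$. The alternative direction `z_alt` is a hypothetical stand-in, not the replacement used in the paper.

```python
import numpy as np

def ddim_like_step(x0, z_dir, tau_prev, sigma, rng):
    # VE-form DDIM-style step (our notation): move along z_dir with scale
    # sqrt(tau_prev^2 - sigma^2), then add fresh noise of std sigma.
    assert 0.0 <= sigma <= tau_prev
    fresh = rng.standard_normal(x0.shape)
    return x0 + np.sqrt(tau_prev ** 2 - sigma ** 2) * z_dir + sigma * fresh

rng = np.random.default_rng(1)
n_samples, tau_n, tau_prev, sigma = 200_000, 1.0, 0.5, 0.3
x0 = np.full(n_samples, 2.0)                     # fixed clean value per sample
x_tau_n = x0 + tau_n * rng.standard_normal(n_samples)
z_hat = (x_tau_n - x0) / tau_n                   # usual DDIM direction
z_alt = rng.standard_normal(n_samples)           # hypothetical replacement direction

a = ddim_like_step(x0, z_hat, tau_prev, sigma, rng)
b = ddim_like_step(x0, z_alt, tau_prev, sigma, rng)
print(a.mean(), a.var(), b.mean(), b.var())      # means near 2.0, variances near 0.25
```

The mean and variance match because $(\tau_{n-1}^2-\sigma_n^2) + \sigma_n^2 = \tau_{n-1}^2$. Only the distribution of the direction matters, not how it was computed.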

Figures (17)

  • Figure 1: Super-resolution $\times 4$ with bicubic kernel and noise level of 0.05. From left to right and top to bottom: original, observation, DPS chung2022diffusion (1000 NFEs), DiffPIR zhu2023denoising (20 NFEs), DDRM kawar2022denoising (20 NFEs) and our CM4IR (4 NFEs).
  • Figure 2: Deblurring with Gaussian kernel and noise level of 0.025. From left to right and top to bottom: original, observation, DPS chung2022diffusion (1000 NFEs), DiffPIR zhu2023denoising (20 NFEs), DDRM kawar2022denoising (20 NFEs) and our CM4IR (4 NFEs).
  • Figure 3: Super-resolution $\times 4$ with noise level of 0.05. From left to right: original, upsampled observation, DiffPIR (20 NFEs), DDRM (20 NFEs), CM (40 NFEs), CoSIGN (task-specific) and our CM4IR (4 NFEs).
  • Figure 6: Super-resolution with noise level of 0.025. From left to right: original, observation, DDRM (20 NFEs), DDRM (4 NFEs, auto-calculated), DDRM (4 NFEs, optimized) and DDRM (4 NFEs, with our $\hat{\mathbf{z}}^-$ instead of $\hat{\mathbf{z}}$).
  • Figure 7: $\overline{\alpha}_n$ sequences for different $i_N$ and $\gamma$ settings. $\overline{\alpha}_n$ values are clipped to $[0, 0.999]$. Recall that the noise level is $\tau_n=\sqrt{1-\overline{\alpha}_n}$.
  • ...and 12 more figures
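The caption of Figure 7 links the schedule to the noise level through clipping to $[0, 0.999]$ and $\tau_n=\sqrt{1-\overline{\alpha}_n}$. Below is a minimal NumPy sketch of that conversion. The function name `noise_levels` and the input values are hypothetical and do not correspond to any of the paper's schedules, whose dependence on $i_N$ and $\gamma$ is not reproduced here.

```python
import numpy as np

def noise_levels(alpha_bar):
    # Clip alpha_bar_n to [0, 0.999] as in the figure, then tau_n = sqrt(1 - alpha_bar_n).
    alpha_bar = np.clip(np.asarray(alpha_bar, dtype=float), 0.0, 0.999)
    return np.sqrt(1.0 - alpha_bar)

taus = noise_levels([-0.2, 0.0, 0.75, 0.999, 1.0])
print(taus)
```

Clipping at 0.999 keeps every noise level at or above $\sqrt{0.001}\approx 0.032$, and clipping at 0 caps it at 1.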

Theorems & Definitions (3)

  • Proposition 3.1
  • Proposition
  • Proof