Arbitrary-steps Image Super-resolution via Diffusion Inversion

Zongsheng Yue, Kang Liao, Chen Change Loy

TL;DR

This paper proposes InvSR, a super-resolution framework based on diffusion inversion. It pairs a fixed pre-trained diffusion backbone with a trainable noise predictor to invert a low-resolution image and generate a high-resolution output. A Partial noise Prediction (PnP) strategy reduces inversion complexity by starting sampling at an intermediate timestep and compressing the noise maps to a small set, which enables sampling with an arbitrary number of steps from one to five. Training optimizes a combination of $\mathcal{L}_2$, LPIPS, and GAN losses to align recovered outputs with ground-truth HR images while maintaining perceptual quality. Experiments show that InvSR achieves state-of-the-art or competitive performance with substantial efficiency gains, even with a single sampling step. The method performs well across synthetic and real-world SR benchmarks and lets users adapt the number of sampling steps to the degradation type, with potential for further speed-ups via model quantization and hardware optimization.
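As a rough illustration of the training objective described above, the three loss terms could be combined as a weighted sum. This form is an assumption: the summary only states that the $\mathcal{L}_2$, LPIPS, and GAN losses are combined, and the symbols $\hat{\bm{x}}$ (recovered output), $\bm{x}$ (ground-truth HR image), $\lambda_{p}$, and $\lambda_{g}$ (weights) are introduced here for illustration only.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_2(\hat{\bm{x}}, \bm{x})
  + \lambda_{p}\,\mathcal{L}_{\text{LPIPS}}(\hat{\bm{x}}, \bm{x})
  + \lambda_{g}\,\mathcal{L}_{\text{GAN}}(\hat{\bm{x}})
```

Under this reading, the $\mathcal{L}_2$ term enforces fidelity to the ground truth, while the LPIPS and GAN terms account for perceptual quality.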

Abstract

This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Central to our approach is a deep noise predictor to estimate the optimal noise maps for the forward diffusion process. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result. Compared to existing approaches, our method offers a flexible and efficient sampling mechanism that supports an arbitrary number of sampling steps, ranging from one to five. Even with a single sampling step, our method demonstrates superior or comparable performance to recent state-of-the-art approaches. The code and model are publicly available at https://github.com/zsyOAOA/InvSR.
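The idea of starting sampling from an intermediate state can be sketched with the standard forward-diffusion marginal, $\bm{x}_t = \sqrt{\bar{\alpha}_t}\,\bm{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\bm{\epsilon}$, where the noise comes from a predictor instead of a random draw. The following is a minimal sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the function names, the toy predictor, and the use of a flat list of values in place of the representation the backbone actually operates on are all hypothetical.

```python
import math
import random


def alpha_bar_linear(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_i) for i < t, using an assumed linear beta schedule."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def toy_noise_predictor(lr_values, t):
    """Hypothetical stand-in for a trained noise predictor.

    Returns one noise value per input value. A real predictor would be a
    neural network conditioned on the LR image and the timestep.
    """
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    return [0.1 * v + rng.gauss(0.0, 1.0) for v in lr_values]


def init_intermediate_state(lr_values, t, predictor=toy_noise_predictor):
    """Build the starting point x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise."""
    a_bar = alpha_bar_linear(t)
    noise = predictor(lr_values, t)
    return [
        math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * n
        for v, n in zip(lr_values, noise)
    ]


# Usage: the timestep 250 is an arbitrary choice for illustration.
lr = [0.2, 0.5, 0.8]
x_start = init_intermediate_state(lr, t=250)
```

A frozen diffusion sampler would then run its reverse steps from `x_start` instead of from pure noise, which is what allows the number of sampling steps to stay small.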

Paper Structure

This paper contains 19 sections, 11 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Qualitative comparisons of our proposed method to recent state-of-the-art diffusion-based approaches on two real-world examples, where the number of sampling steps is annotated in the format "Method name-Steps". The runtime (in milliseconds), highlighted in red in the sub-caption of the first example, is measured on the $\times$4 ($128\rightarrow 512$) SR task on an A100 GPU. Our method offers an efficient and flexible sampling mechanism, allowing users to freely adjust the number of sampling steps based on the degradation type or their specific requirements. In the first example, mainly degraded by blurriness, multi-step sampling is preferable to single-step sampling as it progressively recovers finer details. Conversely, in the second example with severe noise, a single sampling step is sufficient to achieve satisfactory results, whereas additional steps may amplify the noise and introduce unwanted artifacts. (Zoom-in for best view)
  • Figure 2: Inference flow of our proposed method, wherein $\{\tau_i\}_{i=1}^S$ denotes the inversion timesteps. Note that the predicted noise map $\bm{z}_{\tau_S}$ exhibits an obvious correlation with the LR image, indicating the non-zero mean property of its statistical distribution.
  • Figure 3: From left to right: (a) zoomed LR image, (b) noise map predicted by our method for the initial timestep, (c) super-resolved result of our method with a single sampling step.
  • Figure 4: Visual results of different methods on two typical real-world examples from RealSet80 dataset. For clear comparisons, the number of sampling steps is annotated in the format "Method name-Steps" for diffusion-based approaches. (Zoom-in for best view)
  • Figure 5: A typical visual comparison of the proposed InvSR based on different diffusion models: SD-2.0 and SD-Turbo. Note that these results are achieved with five sampling steps.
  • ...and 4 more figures