Harnessing Diffusion-Yielded Score Priors for Image Restoration

Xinqi Lin, Fanghua Yu, Jinfan Hu, Zhiyuan You, Wu Shi, Jimmy S. Ren, Jinjin Gu, Chao Dong

TL;DR

HYPIR introduces a practical image restoration paradigm that leverages pretrained diffusion models as initialization priors and refines them with lightweight adversarial training using LoRA. By bypassing diffusion losses and iterative sampling, the method achieves fast, stable convergence while delivering high-fidelity, realistic restorations and enabling user-controlled prompts, texture richness, and fidelity-generation trade-offs. Theoretical results quantify proximity to the natural image distribution and illustrate benefits such as small initial gradients, broad mode coverage, and accelerated convergence. Empirically, HYPIR outperforms prior state-of-the-art approaches across multiple datasets, scales to large diffusion backbones, and offers flexible, controllable restoration for real-world scenarios.

Abstract

Deep image restoration models aim to learn a mapping from degraded image space to natural image space. However, they face several critical challenges: removing degradation, generating realistic details, and ensuring pixel-level consistency. Over time, three major classes of methods have emerged, including MSE-based, GAN-based, and diffusion-based methods. However, they fail to achieve a good balance between restoration quality, fidelity, and speed. We propose a novel method, HYPIR, to address these challenges. Our solution pipeline is straightforward: it involves initializing the image restoration model with a pre-trained diffusion model and then fine-tuning it with adversarial training. This approach does not rely on diffusion loss, iterative sampling, or additional adapters. We theoretically demonstrate that initializing adversarial training from a pre-trained diffusion model positions the initial restoration model very close to the natural image distribution. Consequently, this initialization improves numerical stability, avoids mode collapse, and substantially accelerates the convergence of adversarial training. Moreover, HYPIR inherits the capabilities of diffusion models with rich user control, enabling text-guided restoration and adjustable texture richness. Requiring only a single forward pass, it achieves faster convergence and inference speed than diffusion-based methods. Extensive experiments show that HYPIR outperforms previous state-of-the-art methods, achieving efficient and high-quality image restoration.
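The TL;DR notes that the adversarial fine-tuning stage uses LoRA. As a rough illustration of why a low-rank adapter preserves the pretrained starting point, the sketch below assumes the standard LoRA parameterization, in which a frozen weight $W_0$ is replaced by $W_0 + (\alpha / r) B A$ with $B$ initialized to zero. The layer sizes, rank, scaling value, and function names (`matmul`, `init_lora`, `lora_weight`) are illustrative assumptions, not details taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, seed=0):
    """Create LoRA factors: A (rank x d_in) small random, B (d_out x rank) all zeros."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def lora_weight(w0, a, b, alpha):
    """Effective weight W0 + (alpha / r) * B @ A, where r is the number of rows of A."""
    scale = alpha / len(a)
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


# Stand-in for one frozen pretrained weight matrix (hypothetical sizes).
d_out, d_in, rank = 64, 64, 4
rng = random.Random(1)
w0 = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

a, b = init_lora(d_out, d_in, rank)

# Because B starts at zero, the adapted layer equals the pretrained layer
# before any adversarial update has been applied.
assert lora_weight(w0, a, b, alpha=8.0) == w0

# Only A and B would be trained; the pretrained weight stays frozen.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"full: {full_params}, LoRA: {lora_params}")
```

Under these assumptions, the restoration model at step zero is exactly the pretrained network, and subsequent training only moves it through a low-rank update, which is one way to read the abstract's description of fine-tuning from a diffusion initialization.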

Paper Structure

This paper contains 57 sections, 4 theorems, 39 equations, 22 figures, 1 table.

Key Result

Theorem 1

Assume that the diffusion network $\mathcal{U}_{\theta_\mathrm{Diff}}$, where $\theta_\mathrm{Diff}$ denotes the parameters of the pretrained diffusion model, has a score error on $(p_{\mathrm{data}} * k_\sigma)$ bounded by $\varepsilon_{\mathrm{sc}}$. Let $k_{\mathrm{deg}}$ be the degradation kernel … where $C_1, C_2 > 0$ depend only on the Lipschitz constants of $\mathcal{U}_{\theta_\mathrm{Diff}}$ an…

Figures (22)

  • Figure 1: Existing pixel-level loss, adversarial training, and diffusion-based image restoration methods struggle with over-smoothness, unrealistic textures, and slow, unstable generation. Our approach leverages diffusion initialization followed by GAN training, effectively balancing realism and efficiency.
  • Figure 2: Illustration of our proposed image restoration pipeline. (a) We start with a pre-trained diffusion model. (b) The VAE encoder is fine-tuned specifically for degradation pre-removal, enhancing robustness against severe image degradation. (c) Subsequently, the degradation-aware encoder and pre-trained decoder initialize an adversarially-trained image restoration model, where only the "Restore Network" is optimized during this stage.
  • Figure 3: (a) The discriminator logits for real and generated images across training steps. Diffusion-based initialization yields rapid and stable convergence, reflecting better alignment between generated and real distributions compared to MSE and denoising autoencoder (DAE) initializations. (b) Magnitude of gradients backpropagated from the discriminator to the generator. Diffusion initialization produces consistently small gradients, highlighting improved numerical stability and efficiency during GAN post-training.
  • Figure 4: Visual comparison illustrating mode collapse in image restoration GANs without diffusion initialization (middle row, problematic textures highlighted) versus the improved semantic diversity achieved by the proposed diffusion-initialized adversarial training (bottom row). Please refer to the magnified view for a more detailed examination. Photo Credits: Images from the DIV2K dataset (licensed CC BY 4.0).
  • Figure 5: Comparison of restoration progress without (top) and with diffusion initialization (bottom). Diffusion initialization yields clearer, stable outputs early in training. Please refer to the magnified view for a more detailed examination. Photo Credits: Images from the DIV2K dataset (licensed CC BY 4.0).
  • ...and 17 more figures

Theorems & Definitions (4)

  • Theorem 1: Diffusion-to-Restoration Proximity
  • Lemma 1: Initial gradient bound
  • Proposition 1: Uniform mode-mass bound
  • Proposition 2: Linear-logarithmic convergence rate