InstantIR: Blind Image Restoration with Instant Generative Reference

Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-Tse Huang

TL;DR

InstantIR tackles blind image restoration under unknown degradations by dynamically aligning with a generative diffusion prior during inference. It introduces a three-part pipeline: the Degradation Content Perceptor (DCP), which encodes the low-quality (LQ) input into a compact representation; the Instant Restoration Previewer, which produces generative references on the fly; and the Latent Aggregator, which fuses those references with the LQ latent during sampling. An adaptive restoration (AdaRes) mechanism complements the pipeline, using a quality indicator derived from the Previewer trajectory. The approach achieves state-of-the-art perceptual quality and faithful detail restoration on synthetic and real datasets, and supports text-guided semantic editing for creative restoration while maintaining global structure. The method improves robustness to unknown degradations and provides an active interface for natural language guidance, with practical benefits for photo editing, restoration in challenging conditions, and downstream vision tasks. The diffusion-based framework is formalized with the standard loss $L_{diff} = \mathbb{E}[\| \boldsymbol{\epsilon}_{\theta}(\boldsymbol{x}_t,t) - \boldsymbol{\epsilon} \|^2]$, and its novel components enable adaptive conditioning and controllable restoration outcomes.
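
The standard loss above can be illustrated with a minimal NumPy sketch. It assumes a DDPM-style forward process with a linear beta schedule; the schedule values, the array shapes, and the `predict_noise` callable (a stand-in for the network $\boldsymbol{\epsilon}_{\theta}$) are illustrative assumptions, not details taken from InstantIR.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t from clean data x0 at step t; also return the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def diffusion_loss(predict_noise, x0, t, alpha_bar, rng):
    """Monte Carlo estimate of E[ || eps_theta(x_t, t) - eps ||^2 ] over a batch.

    predict_noise is a hypothetical callable (x_t, t) -> array shaped like x_t.
    x0 has shape (batch, dim); the squared norm is taken over the last axis.
    """
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = predict_noise(x_t, t)
    return float(np.mean(np.sum((eps_hat - eps) ** 2, axis=-1)))

# Usage: a predictor that always outputs zeros, evaluated at one step.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((8, 4))
loss = diffusion_loss(lambda x_t, t: np.zeros_like(x_t), x0, 500, alpha_bar, rng)
```

In training, `t` would normally be sampled at random for each example and the loss minimized with respect to the network parameters; a single fixed step is used here only to keep the sketch short.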

Abstract

Handling unknown degradation at test time is the major challenge in Blind Image Restoration (BIR), and it demands strong model generalization. An effective strategy is to incorporate prior knowledge, either from human input or from a generative model. In this paper, we introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method that dynamically adjusts the generation condition during inference. We first extract a compact representation of the input via a pre-trained vision encoder. At each generation step, this representation is used to decode the current diffusion latent and instantiate it in the generative prior. The degraded image is then encoded with this reference, providing a robust generation condition. We observe that the variance of the generative references fluctuates with degradation intensity, which we further leverage as an indicator for developing a sampling algorithm adaptive to input quality. Extensive experiments demonstrate that InstantIR achieves state-of-the-art performance and offers outstanding visual quality. By modulating the generative references with textual descriptions, InstantIR can restore extreme degradation and additionally supports creative restoration.

Paper Structure

This paper contains 27 sections, 7 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: I. InstantIR shows exceptional capability in reproducing photorealistic details. II. InstantIR provides an active interface for natural language guidance, which helps handle large degradation and enables creative restoration with semantic editing.
  • Figure 2: The overall pipeline of InstantIR. InstantIR adopts a novel previewing mechanism to actively align with the generative prior. This is achieved by three key modules: 1) DCP for compact LQ image representation encoding; 2) Previewer for decoding it into the generative prior; and 3) Aggregator for integrating the generative reference and LQ input into sampling conditions.
  • Figure 3: The evolution of the Previewer outputs during generation. (a) L2-distances between previews and denoising means; (b) temporal differences of the Previewer trajectory, measured by L2-distances between adjacent points; (c) relative distances between previews and denoising means.
  • Figure 4: Qualitative comparisons on real-world LQ images. Restorations from InstantIR are rich in detail with global semantic consistency. Best viewed zoomed in.
  • Figure 5: Visual examples of the previewing mechanism in InstantIR. Best viewed zoomed in.
  • ...and 2 more figures
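
The three quantities named in the Figure 3 caption can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the authors' implementation: previews and denoising means are assumed to be stacked as arrays with one entry per sampling step, the function names are hypothetical, and "relative distance" is assumed to mean the L2-distance divided by the norm of the denoising mean, which the caption does not specify.

```python
import numpy as np

def l2_to_means(previews, means):
    """(a) L2-distance between each preview and the denoising mean at the same step."""
    diff = (previews - means).reshape(len(previews), -1)
    return np.linalg.norm(diff, axis=1)

def temporal_differences(previews):
    """(b) L2-distance between adjacent points of the Previewer trajectory."""
    steps = np.diff(previews, axis=0).reshape(len(previews) - 1, -1)
    return np.linalg.norm(steps, axis=1)

def relative_distances(previews, means, eps=1e-8):
    """(c) Distance from (a) normalized by the norm of the denoising mean.

    The normalization is an assumption; eps guards against division by zero.
    """
    mean_norms = np.linalg.norm(means.reshape(len(means), -1), axis=1)
    return l2_to_means(previews, means) / (mean_norms + eps)

# Usage on a toy trajectory of three steps in a 2-D latent space.
previews = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
means = np.array([[1.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
abs_dist = l2_to_means(previews, means)
step_dist = temporal_differences(previews)
rel_dist = relative_distances(previews, means)
```

Each function flattens everything after the first axis, so the same code applies to image-shaped latents of shape `(steps, channels, height, width)`.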