InstantIR: Blind Image Restoration with Instant Generative Reference
Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-Tse Huang
TL;DR
InstantIR tackles blind image restoration under unknown degradations by dynamically aligning the restoration with a generative diffusion prior during inference. It introduces a three-part pipeline: the Degradation Content Perceptor (DCP) for compact encoding of the low-quality (LQ) input, the Instant Restoration Previewer for on-the-fly generative references, and the Latent Aggregator, which fuses those references with the LQ latent during sampling. An adaptive restoration (AdaRes) mechanism additionally uses a quality indicator derived from the Previewer trajectory. The authors report state-of-the-art perceptual quality and faithful detail restoration on synthetic and real datasets, and the method supports text-guided semantic editing for creative restoration while maintaining global structure. The method improves robustness to unknown degradations and provides an active interface for natural language guidance, with practical benefits for photo editing, restoration in challenging conditions, and downstream vision tasks. The diffusion-based framework is formalized with the standard loss $L_{diff} = \mathbb{E}[\| \boldsymbol{\epsilon}_{\theta}(\boldsymbol{x}_t, t) - \boldsymbol{\epsilon} \|^2]$, and its novel components enable adaptive conditioning and controllable restoration outcomes.
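To make the loss in the summary above concrete, here is a minimal single-sample sketch in plain Python. It assumes the usual DDPM forward process $\boldsymbol{x}_t = \sqrt{\bar{\alpha}_t}\,\boldsymbol{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, which the source does not spell out. The names `add_noise`, `diffusion_loss`, and `predict_noise` are hypothetical stand-ins and are not taken from the InstantIR code.

```python
import math
import random


def add_noise(x0, alpha_bar_t, eps):
    """Assumed DDPM forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def diffusion_loss(predict_noise, x0, alpha_bar_t, t, rng):
    """Single-sample estimate of ||eps_theta(x_t, t) - eps||^2.

    predict_noise is a hypothetical callable (x_t, t) -> predicted noise,
    standing in for the denoising network eps_theta.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # sample the target noise
    x_t = add_noise(x0, alpha_bar_t, eps)     # noised input at step t
    eps_hat = predict_noise(x_t, t)           # network's noise estimate
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps))
```

Averaging this quantity over many samples of $\boldsymbol{x}_0$, $t$, and $\boldsymbol{\epsilon}$ approximates the expectation in $L_{diff}$. A predictor that recovers the injected noise exactly drives the value to zero.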
Abstract
Handling unknown degradation at test time is the major challenge in Blind Image Restoration (BIR), and it demands strong model generalization. An effective strategy is to incorporate prior knowledge, either from human input or from a generative model. In this paper, we introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method that dynamically adjusts the generation condition during inference. We first extract a compact representation of the input via a pre-trained vision encoder. At each generation step, this representation is used to decode the current diffusion latent and instantiate it in the generative prior. The degraded image is then encoded with this reference, providing a robust generation condition. We observe that the variance of generative references fluctuates with degradation intensity, which we further leverage as an indicator for developing a sampling algorithm adaptive to input quality. Extensive experiments demonstrate that InstantIR achieves state-of-the-art performance and offers outstanding visual quality. By modulating generative references with textual descriptions, InstantIR can restore extreme degradation and additionally supports creative restoration.
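The variance-based indicator mentioned in the abstract can be illustrated with a small sketch. It computes the mean per-element variance across a set of generative references and rescales it to the range [0, 1]. The functions `reference_variance` and `normalized_indicator`, the flat-list representation of a reference, and the `low`/`high` calibration bounds are all assumptions made for illustration. The abstract does not specify how the indicator is computed or how the sampler consumes it, so this sketch deliberately stops at producing a score.

```python
from statistics import pvariance


def reference_variance(references):
    """Mean per-element population variance across equally sized reference vectors.

    references: list of flat lists of numbers (an assumed stand-in for the
    generative references produced during sampling).
    """
    if len(references) < 2:
        raise ValueError("need at least two references to measure variance")
    per_element = [pvariance(column) for column in zip(*references)]
    return sum(per_element) / len(per_element)


def normalized_indicator(variance, low, high):
    """Clamp and rescale a variance to [0, 1] using hypothetical bounds low < high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    score = (variance - low) / (high - low)
    return min(1.0, max(0.0, score))
```

For example, `reference_variance([[0, 0], [2, 4]])` averages the per-element variances 1 and 4. An adaptive sampler could then branch on the normalized score, with the bounds calibrated on held-out data.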
