BIR-Adapter: A parameter-efficient diffusion adapter for blind image restoration

Cem Eteke, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach

TL;DR

This work tackles blind image restoration with unknown degradations by leveraging large pretrained diffusion priors. It introduces BIR-Adapter, a tiny, plug-and-play self-referential restoring attention module that reuses internal diffusion features and keeps the backbone frozen, along with a guided sampling strategy to curb hallucinations. Empirically, the method achieves competitive or superior restoration across synthetic and real degradations while requiring up to $36\times$ fewer trainable parameters, and it demonstrates easy plug-and-play integration into existing diffusion pipelines. The results suggest that diffusion priors and degraded-feature reuse can deliver high-quality restoration efficiently, enabling broader application to unknown degradations without full backbone fine-tuning.

Abstract

We introduce the BIR-Adapter, a parameter-efficient diffusion adapter for blind image restoration. Diffusion-based restoration methods have demonstrated promising performance in addressing this fundamental problem in computer vision, typically relying on auxiliary feature extractors or extensive fine-tuning of pre-trained models. Motivated by the observation that large-scale pretrained diffusion models can retain informative representations under common image degradations, BIR-Adapter introduces a parameter-efficient, plug-and-play attention mechanism that substantially reduces the number of trained parameters. To further improve reliability, we propose a sampling guidance mechanism that mitigates hallucinations during the restoration process. Experiments on synthetic and real-world degradations demonstrate that BIR-Adapter achieves competitive, and in several settings superior, performance compared to state-of-the-art methods while requiring up to 36x fewer trained parameters. Moreover, the adapter-based design enables seamless integration into existing models. We validate this generality by extending a super-resolution-only diffusion model to handle additional unknown degradations, highlighting the adaptability of our approach for broader image restoration tasks.

Paper Structure

This paper contains 29 sections, 12 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: Example restoration results of BIR-Adapter on images degraded with $4\times$ downsampling. The lower-left image is further degraded with blur, white noise, and JPEG compression, while the right-hand image includes additional white noise.
  • Figure 2: Cosine similarity between latent representations of clean and degraded images across different layers of a U-Net-based latent diffusion model. Similarities are measured at the outputs of the Downsampling (Down), Middle (Mid), and Upsampling (Up) blocks of the U-Net. The degradations are combinations of $4\times$ downsampling ($\downarrow_4$), additive white noise ($\sigma_n$), Gaussian blur ($\sigma_b$), and JPEG compression ($Q$).
  • Figure 3: BIR-Adapter in a denoising diffusion model $\epsilon_\theta$. BIR-Adapter introduces a self-referential restoring attention mechanism within an attention block of the denoising model. The model incorporates intermediate features $\tilde{\mathbf{z}}^k$ of the degraded latents $\tilde{\mathbf{x}}$, which are extracted by $\epsilon_\theta$ itself. This design leverages the observation that diffusion models retain informative representations under degradation and enables a parameter-efficient adaptation without auxiliary feature extractors. For clarity, the U-Net-based LDM in the denoiser panel is simplified by visualizing fewer blocks and omitting residual connections, and the number of attention layers is reduced. In the attention-block panel, self-attention is applied in a cascaded manner with parallel processing of degraded and diffused features.
  • Figure 4: Quantitative and visual analysis of the effect of $\xi$. An increase in CLIP-IQA indicates higher-quality images, while a sudden drop in PSNR suggests inconsistencies.
  • Figure 5: Example degraded and restored images using the baselines and our method. The DIV2K examples use synthetic degradation, while the RealSR examples are $4\times$ downsampled images with further unknown degradations.
  • ...and 10 more figures
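
The layer-wise comparison described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that the features of each U-Net block have already been extracted and flattened into plain lists of floats; the helper names (`cosine_similarity`, `layerwise_similarity`) and the toy feature values are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def layerwise_similarity(
    clean_feats: Dict[str, Sequence[float]],
    degraded_feats: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compare clean and degraded features block by block (hypothetical helper)."""
    return {
        name: cosine_similarity(clean_feats[name], degraded_feats[name])
        for name in clean_feats
    }


# Toy stand-ins for features at the Down, Mid, and Up block outputs.
# Real features would come from the diffusion model; these values are made up.
clean = {"down": [1.0, 2.0, 3.0], "mid": [0.5, -1.0, 2.0], "up": [1.0, 0.0, 0.0]}
degraded = {"down": [1.1, 1.9, 3.2], "mid": [0.4, -0.8, 2.5], "up": [0.0, 1.0, 0.0]}

scores = layerwise_similarity(clean, degraded)
for block, score in scores.items():
    print(f"{block}: {score:.3f}")
```

A score close to 1 for a block means the degraded input yields nearly the same feature direction as the clean input at that depth, which is the property the caption says is measured across degradation types.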