
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

Bin Ren, Eduard Zamfir, Zongwei Wu, Yawei Li, Yidi Li, Danda Pani Paudel, Radu Timofte, Ming-Hsuan Yang, Luc Van Gool, Nicu Sebe

TL;DR

AnyIR tackles the problem of restoring images affected by diverse degradations with a single, efficient model suitable for edge devices. It introduces a four-level U-Net equipped with Degradation Adaptation Blocks (DABs) that split features into an attention pathway and a gated pathway. The gated pathway uses a temperature-guided gating mechanism with hidden width $hidden = r_{expan} \cdot C$. A sub-latent channel split reduces computation, in particular lowering the self-attention complexity, which would otherwise scale as $O(B \cdot head \cdot (H \cdot W)^2)$. A Spatial-Frequency Fusion algorithm then coherently merges local and global cues across the spatial and frequency domains, balanced by a learnable weight $\lambda$, to yield a robust restoration feature $F_{out}^{fuse}$. Empirically, AnyIR achieves state-of-the-art all-in-one restoration performance while cutting parameters by around 82% and FLOPs by around 85%, which makes practical deployment on mobile and edge devices feasible. It also generalizes robustly to unseen degradations and mixed-degradation scenarios. This work provides a strong, efficient baseline for all-in-one image restoration and offers insights into degradation-aware embedding and multi-domain fusion for low-level vision tasks.
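To make the complexity terms above concrete, the short sketch below evaluates the number of entries in standard spatial self-attention maps, $B \cdot head \cdot (H \cdot W)^2$, alongside the gate-branch width $hidden = r_{expan} \cdot C$. The helper names and the example sizes (square maps of side 64 to 256, C = 48, r_expan = 2) are illustrative assumptions, not values taken from the paper.

```python
def spatial_attention_entries(batch: int, heads: int, height: int, width: int) -> int:
    """Entries in the (H*W) x (H*W) attention maps of standard spatial self-attention.

    Mirrors the O(B * head * (H * W)^2) term: it grows quadratically with the pixel count.
    """
    tokens = height * width
    return batch * heads * tokens * tokens


def gate_hidden_width(channels: int, r_expan: float) -> int:
    """Hidden width of the gate branch, hidden = r_expan * C (rounded down to an int)."""
    return int(r_expan * channels)


if __name__ == "__main__":
    # Illustrative sizes only (hypothetical, not the paper's configuration).
    for side in (64, 128, 256):
        n = spatial_attention_entries(batch=1, heads=1, height=side, width=side)
        print(f"{side}x{side}: {n:,} attention entries")
    print("gate hidden width:", gate_hidden_width(channels=48, r_expan=2.0))
```

Doubling the side length multiplies the attention-map size by 16 (16,777,216 entries at 64×64 versus 268,435,456 at 128×128), which illustrates why quadratic spatial attention becomes a bottleneck at high resolution and why reducing its cost matters for edge deployment.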

Abstract

Restoring any degraded image efficiently with just one model has become increasingly important, especially with the proliferation of mobile devices. Traditional solutions typically train a dedicated model per degradation, resulting in inefficiency and redundancy. More recent approaches either introduce additional modules to learn visual prompts, significantly increasing model size, or incorporate cross-modal transfer from large language models trained on vast datasets, adding complexity to the system architecture. In contrast, our approach, termed AnyIR, takes a unified path that leverages the inherent similarity across various degradations to enable both efficient and comprehensive restoration through a joint embedding mechanism, without scaling up the model or relying on large language models. Specifically, we examine the sub-latent space of each input, identifying key components and first reweighting them in a gated manner. To fuse the intrinsic degradation awareness with the contextualized attention, we propose a spatial-frequency parallel fusion strategy that enhances spatially aware local-global interactions and enriches restoration details from the frequency perspective. Extensive benchmarking in the all-in-one restoration setting confirms AnyIR's state-of-the-art performance while reducing model complexity by around 82% in parameters and 85% in FLOPs. Our code will be available at our project page (https://amazingren.github.io/AnyIR/).
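The spatial-frequency parallel fusion can be illustrated with a minimal NumPy sketch. It assumes a deliberately simple form that this summary does not spell out: a local spatial branch (a 3×3 box filter standing in for a learned convolution), a global frequency branch (an FFT, a pointwise spectral filter, and an inverse FFT), and a convex combination of the two with the weight $\lambda$ from the TL;DR. The function names, the Gaussian low-pass filter, and the formula $F = \lambda F_{spa} + (1-\lambda) F_{freq}$ are hypothetical choices for illustration, not AnyIR's actual Spatial-Frequency Fusion.

```python
import numpy as np


def spatial_branch(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Local cue: k x k box filter per channel (stand-in for a learned conv). x: (C, H, W)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def gaussian_lowpass(h: int, w: int, sigma: float = 0.25) -> np.ndarray:
    """Hypothetical spectral filter on the rfft2 grid, shape (H, W // 2 + 1); DC weight is 1."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    return np.exp(-(fy ** 2 + fx ** 2) / (2.0 * sigma ** 2))


def frequency_branch(x: np.ndarray, spectral_filter: np.ndarray) -> np.ndarray:
    """Global cue: every output pixel depends on the whole feature map through the FFT."""
    spec = np.fft.rfft2(x, axes=(-2, -1))
    return np.fft.irfft2(spec * spectral_filter, s=x.shape[-2:], axes=(-2, -1))


def spatial_frequency_fuse(x: np.ndarray, lam: float, spectral_filter: np.ndarray) -> np.ndarray:
    """Hypothetical fusion F = lam * F_spa + (1 - lam) * F_freq; lam would be learned in training."""
    return lam * spatial_branch(x) + (1.0 - lam) * frequency_branch(x, spectral_filter)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 16, 16))
    filt = gaussian_lowpass(16, 16)
    fused = spatial_frequency_fuse(feat, lam=0.5, spectral_filter=filt)
    print(fused.shape)  # (4, 16, 16)
```

Setting λ = 1 recovers the purely local branch and λ = 0 the purely global one; making λ learnable lets training decide how much each layer relies on local versus global cues.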

Paper Structure

This paper contains 14 sections, 4 equations, 8 figures, 8 tables, and 2 algorithms.

Figures (8)

  • Figure 1: (a) The framework of the proposed AnyIR, i.e., a convolutional patch embedding, a U-shaped encoder-decoder main body, and an extra refinement block. (b) Structure of the degradation adaptation block (DAB).
  • Figure 2: Structure of our GatedDA. $\oplus$, $\textcircled{c}$, $\textcircled{g}$, and $\otimes$ denote element-wise addition, channel-wise concatenation, GELU [hendrycks2016gaussian] activation, and element-wise multiplication, respectively.
  • Figure 3: Visual comparison of AnyIR with state-of-the-art methods on three degradation types. Zoom in for a better view.
  • Figure 4: Structures of alternative DAB variants.
  • Figure 5: Visual feature maps of $\alpha$, $\beta$, and $\gamma$ within GatedDA.
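The operations listed for GatedDA (channel-wise concatenation, GELU, element-wise multiplication, and residual addition), together with the sub-latent channel split, temperature-guided gating, and $hidden = r_{expan} \cdot C$ from the TL;DR, can be combined into a toy gated block. This is a hypothetical NumPy sketch: the half/half channel split, applying the temperature τ by dividing the gate logits, the 1×1 projections written as channel matrix multiplications, and all function and parameter names are assumptions. It is not the actual GatedDA layout and does not reproduce the α, β, γ maps of Figure 5.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """GELU, tanh approximation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution as a channel matmul: x (C_in, H, W), weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def init_params(channels: int, r_expan: float, seed: int = 0) -> dict:
    """Random weights for the toy block; hidden = r_expan * C (hypothetical sizing)."""
    rng = np.random.default_rng(seed)
    c_gate = channels // 2
    hidden = int(r_expan * channels)
    return {
        "w_in": rng.standard_normal((2 * hidden, c_gate)) * 0.1,   # value + gate logits
        "w_out": rng.standard_normal((c_gate, hidden)) * 0.1,      # back to c_gate channels
        "w_mix": rng.standard_normal((channels, channels)) * 0.1,  # mixes after concatenation
    }


def toy_gated_block(x: np.ndarray, params: dict, tau: float = 1.0) -> np.ndarray:
    """Hypothetical gated block on x of shape (C, H, W); output has the same shape."""
    c = x.shape[0]
    x_gate, x_keep = x[: c // 2], x[c // 2:]                   # hypothetical sub-latent split
    value, logits = np.split(conv1x1(x_gate, params["w_in"]), 2, axis=0)
    gated = gelu(logits / tau) * value                         # temperature-scaled gate, then multiply
    y = conv1x1(gated, params["w_out"])                        # project back to c // 2 channels
    merged = np.concatenate([y, x_keep], axis=0)               # channel-wise concatenation
    return x + conv1x1(merged, params["w_mix"])                # residual addition


if __name__ == "__main__":
    feat = np.random.default_rng(1).standard_normal((8, 16, 16))
    params = init_params(channels=8, r_expan=2.0)
    out = toy_gated_block(feat, params, tau=0.5)
    print(out.shape)  # (8, 16, 16)
```

Only half of the channels pass through the expanded gate here, so the cost of the gating projections scales with C/2 rather than C; this is one plausible reading of how a sub-latent split saves computation, not a description of the paper's exact design.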