Learning Dual Transformers for All-In-One Image Restoration from a Frequency Perspective

Jie Chu, Tong Su, Pei Liu, Yunpeng Wu, Le Zhang, Zenglin Shi, Meng Wang

TL;DR

This study tackles all-in-one image restoration by introducing a frequency-aware dual-transformer framework. The Degradation Estimation Transformer (Dformer) learns degradation priors from frequency-decomposed inputs, while the Degradation-Adaptive Restoration Transformer (Rformer) applies these priors through a degradation-aware self-attention mechanism. The model demonstrates superior performance across five restoration tasks, with strong generalization to real-world, spatially variant, and unseen degradations. By explicitly modeling how degradations distribute across frequency bands, the approach achieves robust, unified restoration capabilities with practical impact for diverse imaging scenarios.

Abstract

This work aims to tackle the all-in-one image restoration task, which seeks to handle multiple types of degradation with a single model. The primary challenge is to extract degradation representations from the input degraded images and use them to guide the model's adaptation to specific degradation types. Building on the insight that various degradations affect image content differently across frequency bands, we propose a new dual-transformer approach comprising two components: a frequency-aware Degradation estimation transformer (Dformer) and a degradation-adaptive Restoration transformer (Rformer). The Dformer captures the essential characteristics of various degradations by decomposing the input into different frequency components. By understanding how degradations affect these frequency components, the Dformer learns robust priors that effectively guide the restoration process. The Rformer then employs a degradation-adaptive self-attention module to selectively focus on the most affected frequency components, guided by the learned degradation representations. Extensive experimental results demonstrate that our approach outperforms existing methods in five representative restoration tasks: denoising, deraining, dehazing, deblurring, and low-light enhancement. Additionally, our method offers benefits for handling real-world degradations, spatially variant degradations, and unseen degradation levels.
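As a rough illustration of what "decomposing the input into different frequency components" can look like, the sketch below splits a grayscale image into band images with a 2D DFT, radial masks, and an inverse DFT. It is a minimal sketch, not the authors' implementation: the function names, the use of radial (ring-shaped) masks, the number of bands, and the `cutoffs` values are all assumptions made for illustration.

```python
import numpy as np


def radial_frequency(height, width):
    """Radial frequency (cycles/pixel) of every DFT bin, in unshifted FFT layout."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def decompose_into_bands(image, cutoffs=(0.1, 0.25)):
    """Split a 2D image into len(cutoffs) + 1 frequency-band images.

    `cutoffs` are hypothetical radial thresholds in cycles/pixel; they are
    illustrative values, not taken from the paper.
    """
    image = np.asarray(image, dtype=np.float64)
    spectrum = np.fft.fft2(image)                 # forward DFT
    radius = radial_frequency(*image.shape)
    edges = (0.0, *cutoffs, np.inf)

    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (radius >= low) & (radius < high)  # keep one ring of frequencies
        # Inverse DFT of the masked spectrum; the mask is symmetric, so the
        # result is real up to floating-point error.
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands
```

Because the masks partition the spectrum, the band images sum back to the original image, and only the lowest band carries the mean brightness (the DC term).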

Paper Structure (19 sections, 6 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Frequency analysis of various degradation types. For each degradation type, the images are the discrete normalized spectrogram (left), the degraded image (right top), and the clean image (right bottom). Noisy and rainy images exhibit a reduced proportion of low-frequency components and an increased proportion of high-frequency components compared to their corresponding clean images. In contrast, hazy and blurry images show the opposite trend.
  • Figure 2: Visualizing the ratios of low-frequency to high-frequency components between the clean and degraded images using 150 samples. The noisy and rainy samples are positioned in the upper-left region, indicating that the degraded images contain more high-frequency components and fewer low-frequency components than their clean counterparts. In contrast, the hazy and blurred samples are found in the lower-right region, reflecting the opposite trend.
  • Figure 3: Overview of the proposed method. Dformer learns a degradation representation and guides Rformer to achieve all-in-one restoration. The Input Frequency Decomposition module uses the DFT and IDFT to decompose the input image into multiple frequency-band images. The Input Projection module employs a convolution layer to project the input image into feature maps. The Frequency-Aware Transformer Block (FA-TB) is detailed in (b). The circled down-arrow denotes a downsampling layer. The Output Projection module includes 2D average pooling and a two-layer MLP to refine and project the degradation representation. The Degradation Projection module includes a two-layer MLP. The architecture of Rformer follows Uformer (Wang et al., CVPR 2022) but employs a new degradation-adaptive self-attention mechanism, detailed in (c).
  • Figure 4: Comparison of various methods on the denoising task ($\sigma=25$). In the regions marked by the blue rectangles, our method better preserves edge details (best viewed digitally).
  • Figure 5: Comparison of various methods on the deraining task. In the regions marked by the blue rectangles, our method performs well, especially when recovering high-frequency details (best viewed digitally).
  • ...and 4 more figures
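
The low- versus high-frequency comparison described in the captions of Figures 1 and 2 can be approximated with a simple spectral-energy ratio. The sketch below is only one plausible way to compute such a statistic. The function name, the radial `cutoff` separating "low" from "high" frequencies, and the choice to exclude the DC term are assumptions; the captions do not specify the exact measure the authors used.

```python
import numpy as np


def low_frequency_ratio(image, cutoff=0.1):
    """Fraction of non-DC spectral energy below `cutoff` cycles/pixel.

    `cutoff` is a hypothetical threshold chosen for illustration only.
    """
    image = np.asarray(image, dtype=np.float64)
    power = np.abs(np.fft.fft2(image)) ** 2       # power spectrum
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    power[0, 0] = 0.0  # drop the DC term so mean brightness does not dominate
    total = power.sum()
    if total == 0:
        raise ValueError("image has no non-DC spectral energy")
    return power[radius < cutoff].sum() / total
```

Comparing `low_frequency_ratio(degraded)` with `low_frequency_ratio(clean)` gives a per-image indicator of the shift: on a synthetic test image, added Gaussian noise should lower the ratio and a box blur should raise it, which matches the direction of the trends the captions report for noisy and blurry images.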