Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

Shihao Zhou, Duosheng Chen, Jinshan Pan, Jufeng Yang

TL;DR

HIT, a High-frequency Injected Transformer for image restoration, combines two modules: a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map to provide reliable references for restoring high-quality images, and a bidirectional interaction module (BIM), which aggregates features at different scales in a mutually reinforcing manner to produce spatially and contextually improved representations.

Abstract

Transformer-based approaches have achieved superior performance in image restoration, since they can model long-term dependencies well. However, the limitation in capturing local information restricts their capacity to remove degradations. While existing approaches attempt to mitigate this issue by incorporating convolutional operations, the core component in Transformer, i.e., self-attention, which serves as a low-pass filter, could unintentionally dilute or even eliminate the acquired local patterns. In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration. Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images. We also develop a bidirectional interaction module (BIM) to aggregate features at different scales using a mutually reinforced paradigm, resulting in spatially and contextually improved representations. In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM. Extensive experiments on 9 tasks (real noise, real rain streak, raindrop, motion blur, moiré, shadow, snow, haze, and low-light condition) demonstrate that HIT with linear computational complexity performs favorably against the state-of-the-art methods. The source code and pre-trained models will be available at https://github.com/joshyZhou/HIT.
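
To make the idea of window-wise high-frequency injection concrete, the following is a minimal NumPy sketch and not the authors' WIM. It assumes, purely for illustration, that the "high-frequency details" of a window can be approximated by subtracting the window's per-channel mean (its lowest-frequency component), and that injection is a plain scaled addition to the feature map. The function names (`window_partition`, `window_reverse`, `window_high_frequency`, `inject`) and the `scale` parameter are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws, ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)


def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def window_high_frequency(x, ws):
    """Assumed high-pass filter: remove each window's per-channel mean."""
    H, W, _ = x.shape
    win = window_partition(x, ws)
    low = win.mean(axis=(1, 2), keepdims=True)  # per-window, per-channel mean
    return window_reverse(win - low, ws, H, W)


def inject(features, x, ws, scale=1.0):
    """Assumed injection rule: add the window-wise high-pass residual of x."""
    return features + scale * window_high_frequency(x, ws)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8, 3))
    feats = np.zeros_like(image)
    out = inject(feats, image, ws=4)
    print(out.shape)
```

Because every step operates inside fixed-size windows, the cost of this sketch grows linearly with the number of pixels, which is the same scaling behaviour the abstract claims for HIT; the actual module design is described in the paper itself.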

Paper Structure (16 sections, 8 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Image deblurring on the RWBI dataset [zhang2020deblurring]. Compared with the state-of-the-art approaches ((b) and (c)), the proposed HIT generates clearer images, as shown in (d). Panels (e) and (f) show the attribution maps of Uformer and HIT computed with Integrated Gradients (IG) [IG_17ICML], where a pixel is activated when it contributes to the restoration result.
  • Figure 2: Overview of HIT. It consists of a U-shaped architecture and two modules: (a) the Window-wise Injection Module (WIM), which fuses local cues in separate windows of the feature map, and (b) the Bidirectional Interaction Module (BIM), which aggregates features at different scales to achieve spatially and semantically improved representations. T-Block is short for Transformer Block. W-MSA and FFN denote window-based multi-head self-attention [liu2021swin] and the Feed-Forward Network [li2021localvit].
  • Figure 3: Spatial Enhancement Unit. V stands for the Value projection in Self-Attention and DWConv is a depth-wise convolution. $\large{\copyright}$ denotes the concatenation operation and $\circledS$ is the softmax activation.
  • Figure 4: Qualitative comparisons with SOTA methods on SIDD [ssid_2018] for denoising.
  • Figure 5: Qualitative comparisons with SOTA methods on SPAD [wang2019spatial] for deraining.
  • ...and 2 more figures