Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li

TL;DR

Uformer is a general, efficient Transformer-based architecture for image restoration. It integrates LeWin blocks and a lightweight multi-scale restoration modulator into a U-shaped encoder-decoder with skip connections. The LeWin block combines non-overlapping window self-attention with a locally-enhanced FFN, so that local detail is captured within each window while the hierarchical encoder-decoder supplies longer-range dependencies. The modulator provides scale-aware feature calibration across decoder stages. In experiments on denoising, motion deblurring, defocus deblurring, and deraining, Uformer achieves state-of-the-art or competitive results at lower computational cost than many CNN-based counterparts. The work shows that a single Transformer design can serve diverse low-level vision tasks, and it highlights the design choices that make strong restoration performance efficient.

Abstract

In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. In Uformer, there are two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs non-overlapping window-based self-attention instead of global self-attention. It significantly reduces the computational complexity on high-resolution feature maps while capturing local context. Second, we propose a learnable multi-scale restoration modulator in the form of a multi-scale spatial bias to adjust features in multiple layers of the Uformer decoder. Our modulator demonstrates superior capability for restoring details in various image restoration tasks while introducing marginal extra parameters and computational cost. Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration. To evaluate our approach, extensive experiments are conducted on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring, and deraining. Without bells and whistles, our Uformer achieves superior or comparable performance compared with the state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.
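The two core designs in the abstract can be illustrated with a small pure-Python sketch. It is not taken from the Uformer repository: the function names, the scalar "tokens", and the choice to share one bias across all windows are assumptions made for illustration (real features have a channel dimension and the bias would be learned). The sketch splits a feature map into non-overlapping M x M windows, adds a modulator bias to the tokens of every window, and counts how many query-key pairs attention has to score with and without windowing.

```python
# Illustrative sketch only; all names are hypothetical and not the authors' code.

def window_partition(feature, m):
    """Split an H x W grid (list of rows) into non-overlapping m x m windows.

    Returns the windows in row-major order; each window is a flat list of
    m*m tokens. Assumes H and W are divisible by m.
    """
    h, w = len(feature), len(feature[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([feature[top + i][left + j]
                            for i in range(m) for j in range(m)])
    return windows


def add_modulator(windows, modulator):
    """Add one m*m bias (assumed shared by all windows) to every window's tokens."""
    return [[tok + b for tok, b in zip(win, modulator)] for win in windows]


def attention_pair_count(h, w, m=None):
    """Count the query-key pairs scored by self-attention on an H x W map.

    Global attention scores (H*W)^2 pairs. Window attention scores (m*m)^2
    pairs in each of the (H/m)*(W/m) windows, which equals m^2 * H * W.
    """
    if m is None:
        return (h * w) ** 2
    return (h // m) * (w // m) * (m * m) ** 2


# Small demonstration on a 4 x 4 map with 2 x 2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
wins = window_partition(grid, 2)
modulated = add_modulator(wins, [10, 20, 30, 40])

# Cost comparison on a 128 x 128 map with 8 x 8 windows.
global_pairs = attention_pair_count(128, 128)
window_pairs = attention_pair_count(128, 128, m=8)
```

Under these assumptions, windowing cuts the number of scored pairs by a factor of H*W / M^2 (256 for a 128 x 128 map with 8 x 8 windows), which is the kind of saving the abstract refers to on high-resolution feature maps. The added bias costs only M*M values per channel, consistent with the "marginal extra parameters" claim.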

Paper Structure

This paper contains 20 sections, 4 equations, 13 figures, and 10 tables.

Figures (13)

  • Figure 1: PSNR vs. computational cost on the SIDD dataset.
  • Figure 2: (a) Overview of the Uformer structure. (b) LeWin Transformer block. (c) Illustration of how the modulators modulate the W-MSA in each LeWin Transformer block; the modulated W-MSA is labeled MW-MSA in (b).
  • Figure 3: Locally-enhanced feed-forward network.
  • Figure 4: Effect of the multi-scale restoration modulator on image deblurring (top samples from GoPro) and denoising (bottom samples from SIDD). Compared with (a), Uformer w/ Modulator (b) can remove much more blur and recover the numbers accurately. Compared with (d), the image restored by Uformer w/ Modulator (e) is closer to the target with more details.
  • Figure 5: Visual comparisons with state-of-the-art methods on real noise removal. The top sample comes from SIDD while the bottom one is from DND.
  • ...and 8 more figures