Uformer: A General U-Shaped Transformer for Image Restoration
Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li
TL;DR
Uformer is a general, efficient Transformer-based architecture for image restoration that integrates LeWin blocks and a lightweight multi-scale restoration modulator into a U-shaped encoder-decoder with skip connections. The LeWin block combines non-overlapping window self-attention with a locally-enhanced FFN to capture both long-range dependencies and local detail, while the modulator provides scale-aware feature calibration across decoder stages. In experiments on denoising, motion deblurring, defocus deblurring, and deraining, Uformer achieves state-of-the-art or competitive results with lower computational cost than many CNN-based counterparts. The work shows that a single Transformer design can serve diverse low-level vision tasks, and it identifies the design choices that make strong restoration performance efficient.
Abstract
In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. Uformer has two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs non-overlapping window-based self-attention instead of global self-attention. This significantly reduces the computational complexity on high-resolution feature maps while capturing local context. Second, we propose a learnable multi-scale restoration modulator in the form of a multi-scale spatial bias to adjust features in multiple layers of the Uformer decoder. Our modulator demonstrates superior capability for restoring details across various image restoration tasks while introducing only marginal extra parameters and computational cost. Powered by these two designs, Uformer has a high capability for capturing both local and global dependencies for image restoration. To evaluate our approach, we conduct extensive experiments on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring, and deraining. Without bells and whistles, Uformer achieves performance superior or comparable to state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.
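The cost saving from non-overlapping window attention can be illustrated with a small sketch. The pure-Python example below is illustrative only and is not taken from the Uformer repository. The function names `window_partition`, `attention_pair_count`, and `add_window_bias` are hypothetical, the 256×256 map and 8×8 window are assumed values, and sharing one bias across every window is only an assumption about how a spatial-bias modulator might be applied.

```python
def window_partition(feature_map, window_size):
    """Split an H x W grid (a list of rows) into non-overlapping windows.

    Returns the windows in row-major order; each window is a flat list of
    window_size * window_size tokens.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("H and W must be divisible by the window size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                feature_map[row][col]
                for row in range(top, top + window_size)
                for col in range(left, left + window_size)
            ])
    return windows


def attention_pair_count(height, width, window_size=None):
    """Count the query-key pairs scored by one self-attention layer.

    With window_size=None every token attends to every token (global
    attention); otherwise tokens attend only within their own window.
    """
    tokens = height * width
    if window_size is None:
        return tokens * tokens
    num_windows = (height // window_size) * (width // window_size)
    tokens_per_window = window_size * window_size
    return num_windows * tokens_per_window * tokens_per_window


def add_window_bias(windows, bias):
    """Add the same per-position bias to every window.

    Assumption: this mimics a spatial-bias modulator shared across windows;
    in a real network the bias would be a learned parameter.
    """
    return [[token + b for token, b in zip(window, bias)] for window in windows]


# A 4 x 4 grid whose entries are their own flat indices 0..15.
grid = [[row * 4 + col for col in range(4)] for row in range(4)]
windows = window_partition(grid, 2)
print(windows[0])   # [0, 1, 4, 5]

# Assumed sizes: a 256 x 256 feature map with 8 x 8 windows.
global_pairs = attention_pair_count(256, 256)
window_pairs = attention_pair_count(256, 256, window_size=8)
print(global_pairs // window_pairs)   # 1024

print(add_window_bias(windows, [10, 10, 10, 10])[0])   # [10, 11, 14, 15]
```

Under these assumptions, windowed attention scores H·W·M² pairs rather than (H·W)², so the saving factor is H·W / M². That factor grows with resolution, which is consistent with the abstract's emphasis on high-resolution feature maps.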
