Multi-scale Attention Network for Single Image Super-Resolution
Yan Wang, Yusen Li, Gang Wang, Xiaoguang Liu
TL;DR
This work tackles single image super-resolution (SR) by addressing the limited receptive field and computational cost of existing CNN and transformer approaches. It introduces MAN, a CNN-based framework that uses multi-scale large kernel attention (MLKA) and a simplified gated spatial attention unit (GSAU) within a MetaFormer-style multi-attention block (MAB) to capture both global and local information efficiently. The key contributions are MLKA, which combines large-kernel decomposition, multi-scale grouping, and gating to mitigate blocking artifacts, and GSAU, a lightweight FFN alternative that preserves spatial context at reduced complexity. Across multiple model sizes, MAN reaches near state-of-the-art performance with favorable efficiency and is competitive with SR transformers, supporting the potential of CNN-based SR with carefully designed attention mechanisms.
Abstract
ConvNets can compete with transformers in high-level tasks by exploiting larger receptive fields. To unleash the potential of ConvNets in super-resolution, we propose a multi-scale attention network (MAN) that couples the classical multi-scale mechanism with the emerging large kernel attention. In particular, we propose multi-scale large kernel attention (MLKA) and a gated spatial attention unit (GSAU). In MLKA, we modify large kernel attention with multi-scale and gate schemes to obtain abundant attention maps at various granularity levels, thereby aggregating global and local information and avoiding potential blocking artifacts. In GSAU, we integrate a gate mechanism and spatial attention to remove the unnecessary linear layer and aggregate informative spatial context. To confirm the effectiveness of our designs, we evaluate MAN at multiple complexities by simply stacking different numbers of MLKA and GSAU. Experimental results show that MAN can perform on par with SwinIR and offers a range of trade-offs between state-of-the-art performance and computational cost.
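To make the large-kernel idea behind MLKA concrete, the following is a minimal sketch of the arithmetic only, not the authors' implementation. It assumes the decomposition rule commonly used for large kernel attention: a K x K kernel is approximated by a (2d-1) x (2d-1) depth-wise convolution followed by a ceil(K/d) x ceil(K/d) depth-wise convolution with dilation d. The function name `decompose_large_kernel` and the three (K, d) pairs are hypothetical, chosen for illustration; the abstract does not state the kernel sizes.

```python
from math import ceil


def decompose_large_kernel(kernel_size: int, dilation: int) -> dict:
    """Describe an assumed LKA-style decomposition of a K x K kernel.

    The rule used here is an assumption, not taken from the abstract:
    a (2d-1) x (2d-1) depth-wise conv followed by a
    ceil(K/d) x ceil(K/d) depth-wise conv with dilation d.
    """
    dw_kernel = 2 * dilation - 1                # local depth-wise conv
    dwd_kernel = ceil(kernel_size / dilation)   # dilated depth-wise conv
    # Receptive field of two stacked stride-1 convs:
    # 1 + (k1 - 1) * 1 + (k2 - 1) * dilation
    receptive_field = 1 + (dw_kernel - 1) + (dwd_kernel - 1) * dilation
    return {
        "dw_kernel": dw_kernel,
        "dwd_kernel": dwd_kernel,
        "receptive_field": receptive_field,
        # Spatial weights per channel, ignoring the 1x1 point-wise conv.
        "decomposed_weights": dw_kernel ** 2 + dwd_kernel ** 2,
        "direct_weights": kernel_size ** 2,
    }


# Hypothetical multi-scale setting: one (K, d) pair per channel group.
ILLUSTRATIVE_SCALES = [(9, 2), (21, 3), (35, 4)]

if __name__ == "__main__":
    for k, d in ILLUSTRATIVE_SCALES:
        print((k, d), decompose_large_kernel(k, d))
```

Under these assumptions, each channel group covers a different receptive field while using far fewer spatial weights than a direct K x K depth-wise kernel, which is the efficiency argument behind combining large kernels with multi-scale grouping.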
