Multi-scale Attention Network for Single Image Super-Resolution

Yan Wang, Yusen Li, Gang Wang, Xiaoguang Liu

TL;DR

This work tackles single image super-resolution (SR) by addressing the limited receptive fields and high computational cost of existing CNN and transformer approaches. It introduces MAN, a CNN-based framework that uses multi-scale large kernel attention (MLKA) and a simplified gated spatial attention unit (GSAU) within a MetaFormer-style multi-attention block (MAB) to capture both global and local information efficiently. The key contributions are MLKA, which combines large-kernel decomposition, multi-scale grouping, and gating to mitigate blocking artifacts, and GSAU, a lightweight alternative to the feed-forward network (FFN) that preserves spatial context at reduced complexity. Across multiple model sizes, MAN achieves performance close to the state of the art with favorable efficiency and remains competitive with SR transformers, supporting the potential of CNN-based SR built on carefully designed attention mechanisms.
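To make the GSAU idea concrete, here is a minimal pure-Python sketch of a gated spatial attention unit. It assumes one plausible layout (our reading, not confirmed by this summary): a 1×1 convolution expands C channels to 2C, the result is split in half, one half passes through a depthwise k×k convolution and gates the other half by element-wise multiplication, and a second 1×1 convolution projects back to C channels before a residual add. Normalization and any learnable scaling are omitted, and all function names (`gsau`, `dw_conv2d`, `pointwise`) and toy weights are hypothetical.

```python
# Minimal GSAU-style sketch in pure Python (hypothetical names and weights).
# A feature map is an H x W list of floats; a tensor is a list of C such maps.

def dw_conv2d(x, kernel, dilation=1):
    """Single-channel 2-D cross-correlation, stride 1, zero 'same' padding (odd kernel)."""
    h, w, k = len(x), len(x[0]), len(kernel)
    pad = (k // 2) * dilation
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i - pad + u * dilation, j - pad + v * dilation
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def pointwise(x, weight):
    """1x1 convolution: weight is an out_ch x in_ch matrix, x is a list of in_ch maps."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(row[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
             for i in range(h)] for row in weight]


def mul(a, b):
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def gsau(x, w_in, dw_kernels, w_out):
    """Gated spatial attention unit (assumed layout, normalization omitted).

    x: C maps; w_in: 2C x C; dw_kernels: C odd-sized k x k kernels; w_out: C x C.
    """
    c = len(x)
    y = pointwise(x, w_in)              # expand C -> 2C channels
    gate_src, value = y[:c], y[c:]      # split into gate-source and value halves
    # Spatial gate: a depthwise conv of one half multiplies the other half,
    # taking the place of the activation between an FFN's two linear layers.
    gated = [mul(value[i], dw_conv2d(gate_src[i], dw_kernels[i])) for i in range(c)]
    out = pointwise(gated, w_out)       # project back to C channels
    return [add(o, xi) for o, xi in zip(out, x)]  # residual connection


if __name__ == "__main__":
    ones3 = [[1.0] * 3 for _ in range(3)]
    x = [[[1.0] * 3 for _ in range(3)]]  # one channel, 3 x 3, all ones
    y = gsau(x, w_in=[[1.0], [1.0]], dw_kernels=[ones3], w_out=[[1.0]])
    print(y[0])  # [[5.0, 7.0, 5.0], [7.0, 10.0, 7.0], [5.0, 7.0, 5.0]]
```

Ignoring biases, this assumed layout has 2C² + k²C + C² = 3C² + k²C weights, compared with 2rC² for a two-layer FFN with expansion ratio r; whether it is cheaper in practice depends on r, k, and C.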

Abstract

ConvNets can compete with transformers in high-level tasks by exploiting larger receptive fields. To unleash the potential of ConvNets in super-resolution, we propose a multi-scale attention network (MAN) that couples the classical multi-scale mechanism with emerging large kernel attention. In particular, we propose multi-scale large kernel attention (MLKA) and a gated spatial attention unit (GSAU). MLKA modifies large kernel attention with multi-scale and gating schemes to obtain rich attention maps at various granularity levels, thereby aggregating global and local information and avoiding potential blocking artifacts. GSAU integrates a gate mechanism with spatial attention to remove an unnecessary linear layer and aggregate informative spatial context. To confirm the effectiveness of our designs, we evaluate MAN at multiple complexities by simply stacking different numbers of MLKA and GSAU modules. Experimental results show that MAN performs on par with SwinIR and achieves varied trade-offs between state-of-the-art performance and computation.
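The MLKA description can be sketched in the same style. The code below assumes one plausible layout (our reading, not confirmed by this summary): a 1×1 projection to 2C channels is split into an attention half and a value half; the attention half is divided into channel groups, each processed by its own large-kernel-attention branch (depthwise k1×k1, then dilated depthwise k2×k2, then a 1×1 mix within the group) and gated element-wise by a separate depthwise convolution; the resulting multi-scale attention map multiplies the value half before a final 1×1 projection and residual add. For brevity each group shares one depthwise kernel across its channels (real depthwise layers learn one per channel), and all names (`mlka`, `lka`, `delta`) and weights are hypothetical.

```python
# Minimal MLKA-style sketch in pure Python (hypothetical names and weights).

def dw_conv2d(x, kernel, dilation=1):
    """Single-channel 2-D cross-correlation, stride 1, zero 'same' padding (odd kernel)."""
    h, w, k = len(x), len(x[0]), len(kernel)
    pad = (k // 2) * dilation
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i - pad + u * dilation, j - pad + v * dilation
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def pointwise(x, weight):
    """1x1 convolution: weight is an out_ch x in_ch matrix, x is a list of in_ch maps."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(row[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
             for i in range(h)] for row in weight]


def mul(a, b):
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def lka(maps, dw_k, dwd_k, dilation, pw):
    """Decomposed large-kernel branch: DW k1 x k1 -> dilated DW k2 x k2 -> 1x1 within the group."""
    t = [dw_conv2d(m, dw_k) for m in maps]
    t = [dw_conv2d(m, dwd_k, dilation) for m in t]
    return pointwise(t, pw)


def mlka(x, w_in, groups, w_out):
    """Multi-scale large kernel attention (assumed layout).

    groups: one tuple per scale, (dw_k, dwd_k, dilation, pw, gate_k), where pw is an
    n x n matrix for a group of n channels; group sizes must sum to C.
    """
    c = len(x)
    y = pointwise(x, w_in)                          # C -> 2C
    attn_src, value = y[:c], y[c:]
    attn, start = [], 0
    for dw_k, dwd_k, dilation, pw, gate_k in groups:
        n = len(pw)
        group = attn_src[start:start + n]
        branch = lka(group, dw_k, dwd_k, dilation, pw)  # scale-specific LKA
        gate = [dw_conv2d(m, gate_k) for m in group]    # depthwise gate
        attn += [mul(b, g) for b, g in zip(branch, gate)]
        start += n
    if start != c:
        raise ValueError("group sizes must sum to the channel count")
    out = pointwise([mul(v, a) for v, a in zip(value, attn)], w_out)
    return [add(o, xi) for o, xi in zip(out, x)]    # residual connection


def delta(k):
    """k x k identity kernel (1 at the centre), convenient for checking the data flow."""
    return [[1.0 if (u, v) == (k // 2, k // 2) else 0.0 for v in range(k)] for u in range(k)]


if __name__ == "__main__":
    x = [[[1.0]], [[3.0]]]                          # two 1 x 1 channels
    w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    groups = [(delta(3), delta(5), 2, [[1.0]], delta(3)),   # a "3-5-1"-style branch
              (delta(5), delta(7), 3, [[1.0]], delta(5))]   # a "5-7-1"-style branch
    print(mlka(x, w_in, groups, [[1.0, 0.0], [0.0, 1.0]]))  # [[[2.0]], [[30.0]]]
```

With identity kernels each branch returns its input a, so the gated attention is a² and the output is v·a² + x; the two groups then differ only through their kernel sizes and dilations, which is where the multi-scale behavior would come from once real weights are learned.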

Paper Structure (18 sections, 8 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Trade-off between performance and model complexity on Manga109 at the ×2 SR scale. MANs achieve higher PSNR with fewer parameters and computations.
  • Figure 2: Overview of our multi-scale attention network (MAN).
  • Figure 3: Details of proposed modules.
  • Figure 4: Visual activation maps of the GSAU equation in the 16th layer of MAN-light. From top to bottom are the corresponding feature maps for 3-5-1, 5-7-1, and 7-9-1, respectively.
  • Figure 5: Comparisons between LKA and MLKA. Row 1: visual comparisons. Row 2, columns 2-4: difference maps of the area of interest between LKA and MLKA; red regions are attended to by both LKA and MLKA, while blue regions mark additional areas of interest captured by MLKA. Column 5: LAM (local attribution map) results of MLKA.
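The k1-k2-1 labels in Figure 4 read naturally as a decomposed large kernel: a k1×k1 depthwise convolution, a k2×k2 dilated depthwise convolution, and a 1×1 convolution (our interpretation). Under that reading, the effective receptive field follows the standard formula for stacked stride-1 convolutions, 1 + Σ(kᵢ - 1)·dᵢ, where the 1×1 layer contributes nothing. The sketch below computes it and compares weight counts against a dense convolution of the same span; the dilations 2, 3, and 4 and the 60-channel width are assumptions for illustration, not values stated in this summary.

```python
# Receptive field and weight count of a decomposed large kernel
# (DW k1 x k1 -> dilated DW k2 x k2 -> 1 x 1), stride 1, biases ignored.
# The dilation values and channel count used below are illustrative assumptions.

def receptive_field(k1, k2, dilation):
    """Side length covered by DW(k1) followed by dilated DW(k2); the 1x1 conv adds nothing."""
    return 1 + (k1 - 1) + (k2 - 1) * dilation


def decomposed_weights(channels, k1, k2):
    """Two depthwise kernels per channel plus a dense 1x1 channel-mixing matrix."""
    return channels * (k1 * k1 + k2 * k2) + channels * channels


def dense_weights(channels, k):
    """A standard k x k convolution mapping `channels` to `channels`."""
    return channels * channels * k * k


if __name__ == "__main__":
    channels = 60  # hypothetical width
    for k1, k2, d in [(3, 5, 2), (5, 7, 3), (7, 9, 4)]:
        rf = receptive_field(k1, k2, d)
        print(f"{k1}-{k2}-1 (d={d}): RF {rf}x{rf}, "
              f"{decomposed_weights(channels, k1, k2)} weights vs "
              f"{dense_weights(channels, rf)} for a dense {rf}x{rf} conv")
    # 3-5-1 (d=2): RF 11x11, 5640 weights vs 435600 for a dense 11x11 conv
    # 5-7-1 (d=3): RF 23x23, 8040 weights vs 1904400 for a dense 23x23 conv
    # 7-9-1 (d=4): RF 39x39, 11400 weights vs 5475600 for a dense 39x39 conv
```

If the channels are instead split evenly across n branches, as multi-scale grouping suggests, each branch's 1×1 term shrinks from C² to (C/n)², so the per-branch counts above are an upper bound under these assumptions.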