Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution

Sichen Guo, Wenjie Li, Yuanyang Liu, Guangwei Gao, Jian Yang, Chia-Wen Lin

TL;DR

This work tackles efficient single-image super-resolution (SR) by combining Transformer-style window attention with Mamba-based state-space modeling in a progressive, cross-scale framework. The proposed Transformer-Progressive Mamba network (T-PMambaSR) uses Window Scan Mamba Layers for regional modeling and Global Scan Mamba Layers for global context, so that the receptive field expands step by step while features are iteratively refined. A key contribution is the Adaptive High-Frequency Refinement Module (AHFRM), which restores high-frequency details lost during Transformer and Mamba processing, sharpening textures without sacrificing efficiency. The method is reported to outperform comparable Transformer- and Mamba-based methods on synthetic and real-world SR benchmarks while using fewer parameters and FLOPs, which supports a smooth local-to-global design for lightweight SR.

Abstract

Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation with linear complexity. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model's receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost. Our code will be released after acceptance.
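
The complexity contrast in the abstract can be made concrete with a rough operation count. The sketch below is not taken from the paper: the function names, the 64 x 64 feature map, and the window size of 8 are illustrative assumptions, and the counts ignore channel width and constant factors. It only shows how global self-attention grows quadratically with the number of tokens, while window attention and a sequential state-space scan grow linearly.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs for self-attention on a height x width token grid.

    window=None models global attention over all tokens; otherwise attention
    is restricted to non-overlapping window x window blocks (an assumption
    made for this illustration, not the paper's exact layer).
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def scan_steps(height, width, directions=1):
    """Count state updates for a sequential scan: one per token per direction."""
    return height * width * directions


if __name__ == "__main__":
    for side in (64, 128):
        print(
            side,
            attention_pairs(side, side),            # global attention
            attention_pairs(side, side, window=8),  # window attention
            scan_steps(side, side),                 # single-direction scan
        )
```

Doubling the side length multiplies the global-attention count by 16 but the window-attention and scan counts by only 4, which is the linear-versus-quadratic behavior the abstract refers to.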

Paper Structure

This paper contains 27 sections, 9 equations, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: (Top) Our design rationale is based on progressively exploiting internal interactions within Window multi-head self-attention (MHSA), combined with a Window Scan Mamba (WSM) and a Global Scan Mamba (GSM). This hierarchical structure facilitates the gradual expansion of receptive fields, ensuring comprehensive information exchange both within and across windows. (Bottom) Leveraging our design, our method strikes an optimal balance across Params, FLOPs, and PSNR, surpassing existing Transformer- and Mamba-based methods.
  • Figure 2: The network architecture of our T-PMambaSR, as well as the framework of the (a) Window Scan Mamba Layer (WSML), (b) Transformer Layer (TL), and (c) Global Scan Mamba Layer (GSML).
  • Figure 3: Illustration of our (a) Window Interaction State Space Module (WISSM) with its two flatten mechanisms, Window Interaction Flatten (WIF) and Window Fusion Flatten (WFF), and (b) Multi-head Global State Space Module (MGSSM).
  • Figure 4: The architecture of our (a) Adaptive High-Frequency Refinement Module (AHFRM), (b) Multi-Scale Gating Module (MSGM), (c) High-frequency Filtering Module (HFM), and (d) High-Frequency Channel Alignment (HFCA).
  • Figure 5: Qualitative comparisons with existing methods in different scenes. Our method can restore clearer edges and structures.
  • ...and 5 more figures
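
The difference between scanning within windows and scanning globally, as described in the Figure 1 caption, can be illustrated by the order in which a 2D feature map is flattened into a 1D sequence. The sketch below is a hypothetical illustration only: the function names and the raster ordering are assumptions, and it does not reproduce the paper's WIF/WFF flatten mechanisms, which are not specified in this summary.

```python
def global_scan_order(height, width):
    """Row-major order over the whole grid: neighbors in the sequence
    can come from anywhere along a full image row."""
    return [(r, c) for r in range(height) for c in range(width)]


def window_scan_order(height, width, window):
    """Visit windows in row-major order, and tokens in row-major order
    inside each window, so each window forms one contiguous run."""
    order = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            for r in range(top, min(top + window, height)):
                for c in range(left, min(left + window, width)):
                    order.append((r, c))
    return order


if __name__ == "__main__":
    print(global_scan_order(4, 4)[:5])
    print(window_scan_order(4, 4, 2)[:5])
```

Both orders visit every position exactly once. What changes is which tokens end up adjacent in the sequence, and therefore which tokens a sequential state-space model relates most directly.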