Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution
Sichen Guo, Wenjie Li, Yuanyang Liu, Guangwei Gao, Jian Yang, Chia-Wen Lin
TL;DR
This work tackles efficient single-image super-resolution by combining Transformer-style window attention with Mamba-based state-space modeling in a progressive, cross-scale framework. The proposed Transformer-Progressive Mamba network (T-PMambaSR) uses Window Scan Mamba Layers for regional modeling and Global Scan Mamba Layers for global context, connected through a progressive receptive-field expansion that iteratively refines features. A key contribution is the Adaptive High-Frequency Refinement Module (AHFRM), which restores high-frequency details lost during Transformer and Mamba processing, sharpening textures without sacrificing efficiency. The combination yields state-of-the-art results on synthetic and real-world SR benchmarks while using fewer parameters and FLOPs than comparable Transformer- or Mamba-based methods, demonstrating the practicality of a smooth local-to-global design for lightweight SR.
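To make the difference between regional and global scanning concrete, the following is a minimal sketch of how a 2D feature map might be flattened into 1D sequences for a state-space layer. It is an illustration only, not the authors' implementation: the function names (`global_scan_order`, `window_scan_order`, `progressive_orders`), the row-major traversal, and the non-overlapping window layout are all assumptions.

```python
def global_scan_order(height, width):
    """Row-major order over the whole feature map (one long sequence)."""
    return [(r, c) for r in range(height) for c in range(width)]


def window_scan_order(height, width, window):
    """Row-major order inside each non-overlapping window, window by window.

    Assumption: the map divides evenly into square windows.
    """
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    order = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            for r in range(top, top + window):
                for c in range(left, left + window):
                    order.append((r, c))
    return order


def progressive_orders(height, width, windows):
    """Scan orders for a list of window sizes, followed by a global scan.

    Hypothetical helper: passing increasing window sizes mimics a
    receptive field that grows from local to global.
    """
    orders = [window_scan_order(height, width, w) for w in windows]
    orders.append(global_scan_order(height, width))
    return orders


if __name__ == "__main__":
    for order in progressive_orders(4, 4, [2]):
        print(order[:4])
```

In a window scan, neighboring sequence positions stay spatially close, so the layer models a region at a time. In a global scan, the sequence spans the full map. Under these assumptions, a window as large as the map produces the same order as the global scan, which is one way to picture a smooth local-to-global transition.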
Abstract
Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation with linear complexity. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model's receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost. Our code will be released after acceptance.
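The idea of recovering high-frequency detail can be illustrated with a generic residual scheme: subtract a low-pass version of a signal from the signal itself, then add the scaled residual back to the processed features. The sketch below is not the AHFRM described above. The box blur, the fixed `gain`, and all function names are assumptions chosen for illustration, and the adaptive part of the module is not modeled.

```python
def box_blur(image, radius=1):
    """Mean filter with edge clamping; a simple stand-in low-pass filter."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total, count = 0.0, 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr = min(max(r + dr, 0), height - 1)  # clamp row index
                    cc = min(max(c + dc, 0), width - 1)   # clamp column index
                    total += image[rr][cc]
                    count += 1
            blurred[r][c] = total / count
    return blurred


def high_frequency(image, radius=1):
    """High-frequency residual: the image minus its low-pass version."""
    low = box_blur(image, radius)
    return [[v - l for v, l in zip(row, low_row)]
            for row, low_row in zip(image, low)]


def refine(features, reference, gain=1.0):
    """Add scaled high-frequency detail from `reference` back to `features`.

    Assumption: `gain` is a fixed scalar; a learned, content-dependent
    weight would take its place in an adaptive module.
    """
    detail = high_frequency(reference)
    return [[f + gain * d for f, d in zip(f_row, d_row)]
            for f_row, d_row in zip(features, detail)]


if __name__ == "__main__":
    spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    print(high_frequency(spike))
```

A flat region has no high-frequency residual, so `refine` leaves it unchanged, while an isolated spike or edge produces a strong residual that is added back.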
