MambaIRv2: Attentive State Space Restoration

Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li

TL;DR

MambaIRv2 tackles the inherent causality of Mamba-based image restoration by introducing an Attentive State-space Equation (ASE) and Semantic Guided Neighboring (SGN) to enable ViT-like non-causal information flow within a state-space framework. ASE injects semantically similar pixel prompts into the state-space output to query the entire image, while SGN reshapes the 1D sequence so semantically related pixels are proximal, mitigating long-range decay. The resulting Attentive State Space Restoration backbone demonstrates superior performance and efficiency across lightweight and classic SR, JPEG CAR, and denoising tasks, outperforming strong Transformer-based baselines with fewer parameters and lower compute. This approach provides a principled, single-pass alternative to multi-directional scanning, with evidence of broader receptive fields and improved restoration quality. The work suggests a promising direction for integrating ViT-like attention into Mamba for high-quality, efficient low-level vision tasks.
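The reordering idea behind SGN can be illustrated with a toy sketch. This is not the authors' implementation: the integer `labels` standing in for per-pixel semantic categories, and the helper names `semantic_order` and `reorder`, are assumptions made purely for illustration. The sketch only shows how a flattened sequence might be regrouped so that semantically similar pixels sit next to each other before the scan, and how the original order is recovered afterwards.

```python
from typing import List, Sequence, Tuple


def semantic_order(labels: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (order, inverse): order groups equal labels, inverse undoes it."""
    # sorted() is stable, so the original scan order is kept inside each group.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    inverse = [0] * len(order)
    for new_pos, old_pos in enumerate(order):
        inverse[old_pos] = new_pos
    return order, inverse


def reorder(seq: Sequence, index: Sequence[int]) -> list:
    """Gather the elements of seq at the given positions."""
    return [seq[i] for i in index]


# Hypothetical flattened image: six pixels with assumed semantic labels.
labels = [0, 1, 0, 2, 1, 0]
pixels = ["a", "b", "c", "d", "e", "f"]

order, inverse = semantic_order(labels)
grouped = reorder(pixels, order)      # similar pixels are now adjacent
restored = reorder(grouped, inverse)  # back to the original scan order
```

In this toy case the three label-0 pixels, which were spread across the sequence, become consecutive entries, so a single causal scan sees them at short range. The inverse permutation maps the scanned output back to image order.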

Abstract

Mamba-based image restoration backbones have recently demonstrated significant potential in balancing a global receptive field with computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration. In this work, we propose MambaIRv2, which equips Mamba with non-causal modeling ability similar to ViTs, yielding an attentive state space restoration model. Specifically, the proposed attentive state-space equation allows the model to attend beyond the scanned sequence and facilitates image unfolding with just a single scan. Moreover, we introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show that MambaIRv2 outperforms SRFormer by as much as 0.35dB PSNR on lightweight SR with 9.3% fewer parameters, and surpasses HAT on classic SR by up to 0.29dB. Code is available at https://github.com/csguoh/MambaIR.
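To make the causal limitation and the attentive readout more concrete, here is a minimal toy sketch. It assumes a diagonal state-space recurrence and reads the attentive state-space equation as adding a prompt vector to the output matrix before the readout, i.e. y_t = (C_t + P_t) · h_t. That form, the `prompts` lookup keyed by an assumed semantic label, and the function name `attentive_scan` are all illustrative assumptions, not the paper's code.

```python
from typing import Dict, List, Sequence


def attentive_scan(
    x: Sequence[float],
    labels: Sequence[int],
    a: Sequence[float],
    b: Sequence[float],
    C: Sequence[Sequence[float]],
    prompts: Dict[int, Sequence[float]],
) -> List[float]:
    """Toy single-direction scan with a prompt added to the output matrix."""
    n_state = len(b)
    h = [0.0] * n_state
    ys = []
    for x_t, label, c_t in zip(x, labels, C):
        # Causal state update: h_t = a * h_{t-1} + b * x_t (element-wise).
        h = [a[k] * h[k] + b[k] * x_t for k in range(n_state)]
        # Assumed attentive readout: y_t = (C_t + P_t) . h_t, where P_t is
        # the prompt associated with this pixel's (hypothetical) label.
        p_t = prompts[label]
        ys.append(sum((c_t[k] + p_t[k]) * h[k] for k in range(n_state)))
    return ys


# One-dimensional state, two tokens; all values are made up for illustration.
x = [1.0, 2.0]
labels = [0, 1]
a, b = [0.5], [1.0]
C = [[1.0], [1.0]]

plain = attentive_scan(x, labels, a, b, C, {0: [0.0], 1: [0.0]})
attentive = attentive_scan(x, labels, a, b, C, {0: [0.0], 1: [1.0]})
```

With all-zero prompts the function reduces to an ordinary causal scan, where each output depends only on earlier tokens. A non-zero prompt changes only the readout, which is where, under this reading, information shared among semantically similar pixels elsewhere in the image could enter without a second scan.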

Paper Structure

This paper contains 24 sections, 12 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: (a) The existing method [guo2024mambair] suffers from the adverse effects of the causal nature of Mamba (the multi-directional scans are not shown for presentation clarity). (b) The proposed MambaIRv2 can achieve attentive state-space modeling that embeds ViT-like non-causal properties into Mamba.
  • Figure 2: (a) We compute the cosine similarity of scanned features across all 4 directions and all layers in MambaIR [guo2024mambair]. (b) The kernel density estimation of the distribution of the control matrix in MambaIR [guo2024mambair].
  • Figure 3: The overall architecture of our proposed MambaIRv2, as well as the (a) Attentive State Space Module (ASSM), (b) Attentive State-space Equation (ASE), and (c) Semantic Guided Neighboring (SGN).
  • Figure 4: Qualitative comparison of our MambaIRv2 with different methods on $4\times$ classic image SR.
  • Figure 5: The visualization of the attentive state space. We compute the cosine similarity between the prompt corresponding to the query pixel and the matrix $\mathbf{C}$. We filter out low-similarity points for presentation clarity. More examples are provided in the supplementary material.
  • ...and 4 more figures