
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Boyun Li, Haiyu Zhao, Wenxin Wang, Peng Hu, Yuanbiao Gou, Xi Peng

TL;DR

MaIR addresses the loss of locality and continuity in Mamba-based image restoration with two components: Nested S-shaped Scanning (NSS) and Sequence Shuffle Attention (SSA). NSS preserves 2D image structure when an image is flattened, and SSA fuses the resulting 1D sequences, which are unfolded along different paths. Together they yield state-of-the-art results in image super-resolution, denoising, deblurring, and dehazing on 14 benchmarks. The approach is described as cost-free, meaning it avoids extra computational overhead, and it is robust across stripe widths and tasks. The work thereby broadens the applicability of Mamba-based models to high-quality 2D image recovery.

Abstract

Recent advancements in Mamba have shown promising results in image restoration. These methods typically flatten 2D images into multiple distinct 1D sequences along rows and columns, process each sequence independently using the selective scan operation, and recombine them to form the outputs. However, such a paradigm overlooks two vital aspects: i) the local relationships and spatial continuity inherent in natural images, and ii) the discrepancies among sequences unfolded in totally different ways. To overcome these drawbacks, we explore two problems in Mamba-based restoration methods: i) how to design a scanning strategy that preserves both locality and continuity while facilitating restoration, and ii) how to aggregate the distinct sequences unfolded in totally different ways. To address these problems, we propose a novel Mamba-based Image Restoration model (MaIR), which consists of a Nested S-shaped Scanning strategy (NSS) and a Sequence Shuffle Attention block (SSA). Specifically, NSS preserves the locality and continuity of the input images through the stripe-based scanning region and the S-shaped scanning path, respectively. SSA aggregates sequences by calculating attention weights within the corresponding channels of different sequences. Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets, achieving state-of-the-art performance on the tasks of image super-resolution, denoising, deblurring and dehazing. The code is available at https://github.com/XLearning-SCU/2025-CVPR-MaIR.
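
The stripe-based region and S-shaped path described in the abstract can be illustrated with a small index generator. This is a minimal sketch of one plausible reading, not the released implementation: the function names, the horizontal stripe orientation, the top-left starting corner, and the way stripe boundaries are handled are all assumptions made for illustration.

```python
def raster_scan(height, width):
    """Row-by-row (Z-shaped) visiting order, used here only as a reference."""
    return [(r, c) for r in range(height) for c in range(width)]


def nested_s_scan(height, width, stripe):
    """Visit pixels stripe by stripe (hypothetical sketch of the NSS idea).

    Each stripe holds up to `stripe` rows. Inside a stripe the path snakes
    down and up along consecutive columns; across stripes the column
    direction alternates, so the path turns back instead of jumping to the
    opposite image edge.
    """
    order = []
    left_to_right = True
    for top in range(0, height, stripe):
        rows = list(range(top, min(top + stripe, height)))
        cols = range(width) if left_to_right else range(width - 1, -1, -1)
        going_down = True  # assumption: every stripe starts on its top row
        for c in cols:
            column_rows = rows if going_down else rows[::-1]
            order.extend((r, c) for r in column_rows)
            going_down = not going_down
        left_to_right = not left_to_right
    return order


def count_jumps(path):
    """Count consecutive positions that are not 4-neighbours in the image."""
    return sum(
        1
        for (r0, c0), (r1, c1) in zip(path, path[1:])
        if abs(r0 - r1) + abs(c0 - c1) != 1
    )


if __name__ == "__main__":
    print(count_jumps(raster_scan(4, 5)))
    print(count_jumps(nested_s_scan(4, 5, stripe=2)))
```

On a 4×5 grid with a stripe of 2 rows, the raster order has 3 non-adjacent transitions (one per row break), while this nested path has none. With an even number of columns, this simplified version finishes each stripe on the row farthest from the next stripe, so the hop between stripes spans one stripe height; the sketch makes no claim about how the released code treats that case.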

Paper Structure

This paper contains 17 sections, 7 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: The scanning strategies in existing Mamba-based methods and our proposed method. (a) Vmamba/Vim use a Z-shaped scan path to flatten a 2D image into 1D sequences, which disrupts both the locality and the continuity of the 2D image. (b) Zigma uses an S-shaped path to maintain spatial continuity but ignores locality. (c) LocalMamba uses a window-based scanning region to preserve locality; however, the Z-shaped scanning path within and across the windows disrupts spatial continuity. In contrast, (d) MaIR divides images into multiple non-overlapping stripes and adopts an S-shaped scanning path within and across the stripes, thus preserving both locality and continuity.
  • Figure 2: Illustrations of MaIR. (a) The overall architecture of MaIR, highlighting its core component, Residual Mamba Group (RMG). RMG is primarily composed of (b) Residual Mamba Block (RMB), in which (c) Visual Mamba Module (VMM) plays a pivotal role.
  • Figure 3: Illustrations of (a) Nested S-shaped Scanning strategy (NSS) and (b) shift-stripe mechanism.
  • Figure 4: Illustration of the Sequence Shuffle Attention (SSA). The input features $\{X^i\}_{i=1}^K \in \mathcal{R}^{D \times H \times W}$ are first pooled and concatenated to form $\tilde{X} \in \mathcal{R}^{L}$, where $L = K \times D$. This sequence undergoes the sequence shuffle operation, resulting in a shuffled sequence $\hat{X} \in \mathcal{R}^{L}$ whose channels are split into $D$ groups. Group convolution and the sequence unshuffle operation are then applied, producing unshuffled weights $\tilde{W} \in \mathcal{R}^{L}$, which are chunked and reshaped into attention weights $\{W^i\}_{i=1}^K \in \mathcal{R}^{D}$. Finally, the output feature $Y \in \mathcal{R}^{D \times H \times W}$ is computed as a weighted summation of the input features using the attention weights.
  • Figure 5: Visual comparison of $\times4$ image super-resolution results on the Manga109 dataset. MaIR demonstrates superior visual quality, particularly in preserving fine details and textures.
  • ...and 4 more figures
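
The data flow in the Figure 4 caption (pool, concatenate, shuffle, group convolution, unshuffle, chunk, weighted sum) can be traced with plain Python lists. This is a rough sketch under stated assumptions rather than the authors' block: the pooling is taken to be global average pooling, the group convolution is modelled as one K×K matrix per channel group (kernel size 1), no normalisation is applied to the weights, and all function names are hypothetical.

```python
def global_avg_pool(feature):
    """Average a D x H x W nested list over H and W, giving D values."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def shuffle(seq, k, d):
    """Reorder k blocks of d channels into d groups of k entries.

    After shuffling, group c holds channel c of every input sequence.
    """
    return [seq[i * d + c] for c in range(d) for i in range(k)]


def unshuffle(seq, k, d):
    """Inverse of `shuffle`: restore the k-blocks-of-d layout."""
    return [seq[c * k + i] for i in range(k) for c in range(d)]


def group_conv(seq, kernels, k):
    """Apply one k x k matrix to each group of k consecutive entries."""
    out = []
    for g, kernel in enumerate(kernels):
        group = seq[g * k:(g + 1) * k]
        out.extend(sum(w * x for w, x in zip(row, group)) for row in kernel)
    return out


def sequence_shuffle_attention(features, kernels):
    """Fuse K feature maps (each D x H x W) into one D x H x W map."""
    k, d = len(features), len(features[0])
    h, w = len(features[0][0]), len(features[0][0][0])
    pooled = [v for f in features for v in global_avg_pool(f)]   # X~, length K*D
    shuffled = shuffle(pooled, k, d)                             # X^, D groups of K
    weights = unshuffle(group_conv(shuffled, kernels, k), k, d)  # W~, length K*D
    per_seq = [weights[i * d:(i + 1) * d] for i in range(k)]     # {W^i}, each length D
    # Weighted sum: each channel of each input is scaled by its own weight.
    return [
        [[sum(per_seq[i][c] * features[i][c][y][x] for i in range(k))
          for x in range(w)] for y in range(h)]
        for c in range(d)
    ]


if __name__ == "__main__":
    # K = 2 sequences, D = 2 channels, 1 x 1 spatial size, identity kernels.
    feats = [[[[1.0]], [[2.0]]], [[[3.0]], [[4.0]]]]
    identity = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
    print(sequence_shuffle_attention(feats, identity))
```

The point of the shuffle is that each convolution group sees the same channel from all K sequences, so the weight for channel c of sequence i depends only on channel c across sequences, which matches the abstract's "attention weights within the corresponding channels of different sequences".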