MBMamba: When Memory Buffer Meets Mamba for Structure-Aware Image Deblurring

Hu Gao, Xiaoning Lei, Xichen Xu, Depeng Dang, Lizhuang Ma

TL;DR

MBMamba introduces MemVSSM, a memory-buffered, chunk-wise vision state space module, and an Ising-inspired regularization loss to address local detail loss and structural incoherence in Mamba-based deblurring. By dividing features into chunks with a FIFO memory bank and cross-attention fusion, MBMamba preserves both global context and local structure without adding extra scans, maintaining real-time efficiency. The Ising loss further enforces spatial coherence across neighboring pixels, improving texture and edge preservation. Experimental results on GoPro, HIDE, and RealBlur show state-of-the-art PSNR/SSIM gains with substantial reductions in compute and faster inference, validating both performance and practicality of the approach.
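The chunk-wise processing with a FIFO memory bank and cross-attention fusion described above can be illustrated with a minimal sketch. This is not the authors' MemVSSM: the function names, the mean-pooled chunk summary, the residual fusion, and the omission of the state space scan itself are all assumptions made to keep the example short and dependency-free.

```python
import math
from collections import deque


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, memory):
    # Scaled dot-product attention of one token over the memory entries.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, entry)) / scale for entry in memory]
    weights = softmax(scores)
    context = [
        sum(w * entry[d] for w, entry in zip(weights, memory))
        for d in range(len(query))
    ]
    # Residual fusion of token and retrieved context (assumption).
    return [q + c for q, c in zip(query, context)]


def process_in_chunks(tokens, chunk_size, bank_size):
    # Hypothetical stand-in for a memory-buffered module: a real model would
    # also run its state space scan on every chunk, which is skipped here.
    bank = deque(maxlen=bank_size)  # FIFO: the oldest summary is dropped when full
    out = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        if bank:
            chunk = [cross_attend(tok, list(bank)) for tok in chunk]
        out.extend(chunk)
        # Summarize the fused chunk by its per-dimension mean (assumption).
        dim = len(chunk[0])
        bank.append([sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)])
    return out, list(bank)


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
fused, memory = process_in_chunks(tokens, chunk_size=2, bank_size=2)
```

The first chunk passes through unchanged because the bank is still empty; later chunks are fused with at most `bank_size` earlier summaries, so the memory cost stays fixed regardless of sequence length.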

Abstract

The Mamba architecture has emerged as a promising alternative to CNNs and Transformers for image deblurring. However, its flatten-and-scan strategy often results in local pixel forgetting and channel redundancy, limiting its ability to effectively aggregate 2D spatial information. Although existing methods mitigate this by modifying the scan strategy or incorporating local feature modules, they increase computational complexity and hinder real-time performance. In this paper, we propose a structure-aware image deblurring network without changing the original Mamba architecture. Specifically, we design a memory buffer mechanism to preserve historical information for later fusion, enabling reliable modeling of relevance between adjacent features. Additionally, we introduce an Ising-inspired regularization loss that simulates the energy minimization of the physical system's "mutual attraction" between pixels, helping to maintain image structure and coherence. Building on this, we develop MBMamba. Experimental results show that our method outperforms state-of-the-art approaches on widely used benchmarks.
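The idea of an Ising-inspired energy over neighboring pixels can be sketched as follows. The abstract does not give the loss formula, so this uses the textbook Ising energy with a single coupling constant over 4-connected neighbors as an assumption; the function name, the mapping of intensities to spins in [-1, 1], and the normalization by the number of pairs are all illustrative choices, not the paper's definition.

```python
def ising_energy(image, coupling=1.0):
    """Mean Ising-style energy over 4-connected neighbor pairs.

    image: 2D list of intensities in [0, 1] (assumed range).
    Lower energy means neighboring pixels agree more strongly.
    """
    # Map intensities to continuous "spins" in [-1, 1] (assumption).
    spins = [[2.0 * v - 1.0 for v in row] for row in image]
    height, width = len(spins), len(spins[0])
    total, pairs = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # right neighbor
                total += spins[y][x] * spins[y][x + 1]
                pairs += 1
            if y + 1 < height:  # bottom neighbor
                total += spins[y][x] * spins[y + 1][x]
                pairs += 1
    if pairs == 0:  # a 1x1 image has no neighbor pairs
        return 0.0
    return -coupling * total / pairs


flat = [[1.0, 1.0, 1.0] for _ in range(3)]
checker = [[0.0, 1.0], [1.0, 0.0]]
flat_energy = ising_energy(flat)
checker_energy = ising_energy(checker)
```

A spatially coherent patch gives the minimum energy and an alternating pattern gives the maximum, which is the "mutual attraction" behavior the abstract refers to. In training, a term like this would presumably be added to a reconstruction loss with a small weight; that weighting is not specified here.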

Paper Structure

This paper contains 26 sections, 17 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Computational cost vs. PSNR of models on the GoPro dataset. MBMamba achieves SOTA performance while simultaneously reducing computational cost.
  • Figure 2: The overall architecture of the proposed MBMamba: (a) The decoder is composed of vision state space models equipped with a memory buffering mechanism (MemVSSM); (b) The internal structure of MemVSSM.
  • Figure 3: The structure of feature cross-attention mechanism (FCAM).
  • Figure 4: Image deblurring comparisons on the synthetic GoPro dataset.
  • Figure 5: Image deblurring comparisons on the real-world RealBlur dataset.
  • ...and 4 more figures