Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging

Mengjie Qin; Yuchao Feng; Zongliang Wu; Yulun Zhang; Xin Yuan

TL;DR

This work tackles hyperspectral image reconstruction from snapshot CASSI measurements, addressing the ill-posedness with a physics-informed, data-driven approach. MiJUN combines a Mamba-inspired Transformer-based prior extractor with an accelerated HQS (A-HQS) optimization and mode-$k$ tensor unfolding to capture both global context and local texture efficiently. The method introduces a proximal denoiser (MMB) with GLAM attention and a mode-$k$ unfolding strategy that yields 12 directional scans, enabling strong low-rank representations. Comprehensive experiments on simulated and real CASSI data show MiJUN achieving state-of-the-art PSNR/SSIM with fewer parameters and lower FLOPs, delivering sharper detail and better visual fidelity than competing methods.
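For orientation, the plain (non-accelerated) half-quadratic splitting that A-HQS generalizes can be written as below. This is the textbook HQS form, not the accelerated scheme of the paper; the symbols $\boldsymbol{\Phi}$ (sensing matrix), $\boldsymbol{y}$ (measurement), $R$ (prior), $\lambda$ and $\mu$ (weights) are notation assumed here for illustration.

$$
\begin{aligned}
\min_{\boldsymbol{x},\,\boldsymbol{z}}\;& \tfrac{1}{2}\lVert \boldsymbol{y}-\boldsymbol{\Phi}\boldsymbol{x}\rVert_2^2+\tfrac{\mu}{2}\lVert \boldsymbol{x}-\boldsymbol{z}\rVert_2^2+\lambda R(\boldsymbol{z}),\\
\boldsymbol{x}^{k+1}&=\left(\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}+\mu\boldsymbol{I}\right)^{-1}\left(\boldsymbol{\Phi}^{\top}\boldsymbol{y}+\mu\,\boldsymbol{z}^{k}\right),\\
\boldsymbol{z}^{k+1}&=\arg\min_{\boldsymbol{z}}\;\tfrac{\mu}{2}\lVert \boldsymbol{z}-\boldsymbol{x}^{k+1}\rVert_2^2+\lambda R(\boldsymbol{z}).
\end{aligned}
$$

In a deep unfolding network, the $\boldsymbol{z}$-update (a proximal step on $R$) is typically replaced by a learned denoiser, which is the role the TL;DR assigns to the MMB.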

Abstract

In the coded aperture snapshot spectral imaging (CASSI) system, Deep Unfolding Networks (DUNs) have made impressive progress in recovering 3D hyperspectral images (HSIs) from a single 2D measurement. However, the inherent nonlinear and ill-posed characteristics of HSI reconstruction still pose challenges to existing methods in terms of accuracy and stability. To address this issue, we propose a Mamba-inspired Joint Unfolding Network (MiJUN), which integrates physics-embedded DUNs with learning-based HSI imaging. First, leveraging the concept of trapezoid discretization to expand the representation space of unfolding networks, we introduce an accelerated unfolding network scheme. This approach can be interpreted as a generalized accelerated half-quadratic splitting with a second-order differential equation, which reduces the reliance on initial optimization stages and addresses challenges related to long-range interactions. Crucially, within the Mamba framework, we restructure the Mamba-inspired global-to-local attention mechanism by incorporating a selective state space model and an attention mechanism. This effectively reinterprets Mamba as a variant of the Transformer architecture, improving its adaptability and efficiency. Furthermore, we refine the Mamba scanning strategy by integrating tensor mode-$k$ unfolding into the Mamba network. This approach emphasizes the low-rank properties of tensors along various modes, while conveniently facilitating 12 scanning directions. Numerical and visual comparisons on both simulated and real datasets demonstrate the superiority of the proposed MiJUN, particularly in detail representation.
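The mode-$k$ unfolding mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names, the cube size, and the synthetic rank-3 cube are assumptions made for illustration, and the column ordering follows NumPy's row-major reshape, which may differ from other conventions by a column permutation. It does not attempt to reproduce how the paper derives its 12 scanning directions.

```python
import numpy as np


def mode_k_unfold(tensor, k):
    """Mode-k unfolding: axis k becomes the rows, all other axes are flattened into columns."""
    return np.moveaxis(tensor, k, 0).reshape(tensor.shape[k], -1)


def build_low_rank_cube(height, width, bands, rank, seed=0):
    """Hypothetical test cube: a sum of `rank` outer products (CP form)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((height, rank))
    b = rng.standard_normal((width, rank))
    c = rng.standard_normal((bands, rank))
    return np.einsum("ir,jr,kr->ijk", a, b, c)


cube = build_low_rank_cube(16, 12, 8, rank=3)

ranks = []
for k in range(cube.ndim):
    unfolded = mode_k_unfold(cube, k)
    # Singular values reveal how low-rank each unfolding is.
    singular_values = np.linalg.svd(unfolded, compute_uv=False)
    numerical_rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
    ranks.append(numerical_rank)
    print(f"mode {k}: shape {unfolded.shape}, numerical rank {numerical_rank}")
```

Each unfolding turns the cube into a matrix whose rows index one mode (height, width, or spectral band). Reading such a matrix row by row gives one candidate 1D scan order for a sequence model, and the singular values indicate how compressible that view is.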

Paper Structure (14 sections, 12 equations, 7 figures, 3 tables)

Introduction
Related Work
Vision Transformer for CASSI
State Space Model
Methodology
Degradation model of CASSI
Accelerated deep unfolding framework
Prior extractor
Experiments
Experimental settings
Compare with State-of-the-art
Ablation study
Conclusion
Acknowledgments

Figures (7)

Figure 1: Comparison of reconstruction quality vs. parameters (M) and FLOPs (G). Our proposed method outperforms the compared methods at lower computational cost. The images on the right show the feature maps of RDULF and our method; our features exhibit less noise and sharper edges.
Figure 2: A schematic diagram of CASSI.
Figure 3: An overview of the proposed MiJUN for the HSI reconstruction task, comprising the input and initialization, the MiJUN model, and the output. The model includes three iterative operators: $\boldsymbol{x}$, $\boldsymbol{z}$, $\hat{\boldsymbol{z}}$. During the iterative process, the parameters are estimated by the DADN, with $\hat{\boldsymbol{z}}$ learned through the MMB block.
Figure 4: The diagram of the proposed MMIT. Features are first sufficiently modeled with local and global information through the Mamba-iT module, followed by the M-$k$ Mamba to further enhance the low-rank attributes.
Figure 5: Illustration of mode-$k$ unfolding along each direction of a 3D tensor and the linear-overhead SSM with scanning schemes in different directions. The low rank of each unfolded matrix is demonstrated by its singular value decomposition (SVD, log scale).
...and 2 more figures
