Cross-Scan Mamba with Masked Training for Robust Spectral Imaging

Wenzhe Tian, Haijin Zeng, Yin-Ping Zhao, Yongyong Chen, Zhen Wang, Xuelong Li

TL;DR

The paper tackles robust hyperspectral image reconstruction from snapshot compressive imaging by introducing CS-Mamba, a spatial–spectral state-space model that combines a cross-scan mechanism and a local-enhancement block to jointly model spatial and spectral correlations. Built within a deep-unfolding framework, the method employs a Mamba-based denoiser and a mask-guided training regime to address real-world noise and generalization gaps. Key contributions include the LE-SSM and CS-SSM modules, a cross-scan strategy that surpasses traditional spectral scanning, and a masked training approach that improves reconstruction quality on real data, achieving state-of-the-art PSNR/SSIM with reduced computational cost. The experimental results on simulated CAVE/KAIST data and real CASSI measurements demonstrate both quantitative gains and improved visual fidelity, highlighting practical impact for fast, robust HSI capture in SCI systems.

Abstract

Snapshot Compressive Imaging (SCI) enables fast spectral imaging but requires effective decoding algorithms to reconstruct hyperspectral images (HSIs) from compressed measurements. Current CNN-based methods are limited in modeling long-range dependencies, while Transformer-based models face high computational complexity. Although recent Mamba models outperform CNNs and Transformers on RGB tasks in computational efficiency or accuracy, they are not specifically optimized to exploit the local spatial and spectral correlations inherent in HSIs. To address this, we propose the Cross-Scanning Mamba (CS-Mamba), which employs a Spatial-Spectral SSM to encode context with a balance of global and local information and to promote cross-channel interaction. Moreover, while current reconstruction algorithms perform increasingly well in simulation, they perform suboptimally on real data because of limited generalization capability. During training, a model may fail to capture the inherent features of the images and instead learn parameters that mitigate the specific noise and loss it was trained on, which can degrade reconstruction quality on real scenes. To overcome this challenge, we propose a masked training method that enhances the generalization ability of models. Experimental results show that CS-Mamba achieves state-of-the-art performance and that the masked training method reconstructs smooth features better, improving visual quality.

Paper Structure (13 sections, 17 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: PSNR-FLOPs comparisons of CS-Mamba and previous Deep Unfolding SOTA methods.
  • Figure 2: Schematic of the CASSI system: A 3D HSI cube is modulated by a mask, sheared by a disperser, and transformed into a 2D-coded measurement.
  • Figure 3: The flowchart of the proposed CS-Mamba. (a) Parameter Estimation Network. This module is used for capturing parameter $\alpha$ and noise level $\beta$ to guide the reconstruction process. (b) The U-shaped denoiser network and DUN framework. (c) The Spatial-Spectral SSM Block adopted in (b). This module mainly consists of a spatial SSM and a spectral SSM, depicted in the sub-flowchart (d) and (e), respectively. (f) The specific method of LE-SSM in the Spatial SSM Module. Scanning directions are divided into global and local branches. (g) The illustration of our Cross-Scan Mechanism. The scanning process is demonstrated by labeling the order of each data point.
  • Figure 4: A visual comparison of CS-Mamba with and without masked training on real scene 1.
  • Figure 5: A visual comparison of the performance of MST, CST, and DAUHST methods in the real scene.
  • ...and 2 more figures
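
The CASSI measurement process summarized in the Figure 2 caption (mask modulation, dispersive shearing, integration onto a 2D sensor) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `cassi_forward`, the 2-pixel dispersion step, and the toy cube size are assumptions chosen for illustration, and sensor noise is omitted.

```python
import numpy as np

def cassi_forward(cube, mask, step=2):
    """Simulate a noise-free single-disperser CASSI measurement (illustrative sketch).

    cube: (H, W, B) hyperspectral cube
    mask: (H, W) coded aperture shared by all bands
    step: dispersion shift in pixels per band (assumed value)
    """
    H, W, B = cube.shape
    # The disperser widens the sensor footprint by step * (B - 1) columns.
    meas = np.zeros((H, W + step * (B - 1)), dtype=cube.dtype)
    for b in range(B):
        modulated = cube[:, :, b] * mask                  # mask modulation
        meas[:, b * step : b * step + W] += modulated     # shear, then integrate
    return meas

# Toy example with hypothetical sizes: an 8x8 scene with 4 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))
mask = (rng.random((8, 8)) > 0.5).astype(cube.dtype)
y = cassi_forward(cube, mask)
print(y.shape)  # (8, 14)
```

Reconstruction is the inverse problem: recovering `cube` from `y` and `mask`. Because many bands overlap in each sensor column, the problem is ill-posed, which is why learned priors such as the denoiser in a deep-unfolding network are used.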