Cross-Scan Mamba with Masked Training for Robust Spectral Imaging
Wenzhe Tian, Haijin Zeng, Yin-Ping Zhao, Yongyong Chen, Zhen Wang, Xuelong Li
TL;DR
The paper addresses robust hyperspectral image (HSI) reconstruction from snapshot compressive imaging (SCI) by introducing CS-Mamba, a spatial–spectral state-space model that combines a cross-scan mechanism with a local-enhancement block to jointly model spatial and spectral correlations. Built within a deep-unfolding framework, the method uses a Mamba-based denoiser and a masked training scheme to address real-world noise and the generalization gap between simulated and real data. Key contributions are the LE-SSM and CS-SSM modules, a cross-scan strategy that outperforms traditional spectral scanning, and a masked training approach that improves reconstruction quality on real data; the method achieves state-of-the-art PSNR/SSIM at reduced computational cost. Experiments on simulated CAVE/KAIST data and real CASSI measurements show both quantitative gains and improved visual fidelity, indicating practical value for fast, robust HSI capture in SCI systems.
Abstract
Snapshot Compressive Imaging (SCI) enables fast spectral imaging but requires effective decoding algorithms to reconstruct hyperspectral images (HSIs) from compressed measurements. Current CNN-based methods are limited in modeling long-range dependencies, while Transformer-based models incur high computational complexity. Although recent Mamba models outperform CNNs and Transformers on RGB tasks in computational efficiency or accuracy, they are not specifically designed to exploit the local spatial and spectral correlations inherent in HSIs. To address this, we propose the Cross-Scanning Mamba (CS-Mamba), which employs a Spatial-Spectral SSM to encode context with a balance of global and local information and to promote cross-channel interaction. In addition, although current reconstruction algorithms perform increasingly well in simulation, they perform suboptimally on real data because of limited generalization capability. During training, a model may fail to capture the inherent features of the images and instead learn parameters that merely mitigate the specific noise and loss seen in training, which degrades reconstruction quality on real scenes. To overcome this challenge, we propose a masked training method that enhances the generalization ability of models. Experimental results show that CS-Mamba achieves state-of-the-art performance and that the masked training method better reconstructs smooth features, improving visual quality.
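
For readers unfamiliar with the "compressed measurements" that the reconstruction starts from, the following is a minimal NumPy sketch of a commonly used single-disperser CASSI forward model: each spectral band is modulated by a coded aperture, shifted along the width axis, and summed into one 2D snapshot. It is an illustration under stated assumptions, not the authors' code. The function name `cassi_forward`, the 2-pixel dispersion step, and the toy array sizes are all hypothetical choices made for the example.

```python
import numpy as np


def cassi_forward(cube, mask, step=2):
    """Simulate one CASSI-style snapshot (illustrative sketch only).

    cube: (H, W, B) hyperspectral image.
    mask: (H, W) coded aperture applied to every band.
    step: assumed per-band dispersion shift in pixels (hypothetical default).
    """
    H, W, B = cube.shape
    # The dispersed bands spread over a wider detector area.
    y = np.zeros((H, W + step * (B - 1)), dtype=cube.dtype)
    for b in range(B):
        # Modulate band b with the mask, shift it by b * step, and accumulate.
        y[:, b * step : b * step + W] += mask * cube[:, :, b]
    return y


rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))                         # toy 4-band cube
mask = (rng.random((8, 8)) > 0.5).astype(cube.dtype)  # random binary mask
y = cassi_forward(cube, mask)
print(y.shape)  # (8, 14)
```

The snapshot `y` has far fewer entries than `cube`, which is why recovering the HSI is an ill-posed inverse problem that needs a learned prior such as the denoiser described above.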
