In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang, Lizhi Wang, Xiangtian Ma, Maoqing Zhang, Lin Zhu, Hua Huang
TL;DR
The paper tackles the ill-posed problem of reconstructing hyperspectral images (HSIs) from dual-camera compressive hyperspectral imaging (DCCHI) measurements. It introduces In2SET, a Transformer-based denoiser that exploits two kinds of similarity as content priors: intra-similarity, approximated from the panchromatic (PAN) image, and inter-similarity between the HSI and the PAN image. Within a PAN-guided unrolling framework (PGDU), the data-fidelity term is solved with conjugate gradients and In2SET serves as the PAN-guided denoiser, drawing on a guided feature pyramid built from the PAN image to improve spatial-spectral fidelity. Experiments on simulated and real DCCHI data show that In2SET achieves state-of-the-art reconstruction quality at lower computational cost, and ablations validate the contributions of the intra-/inter-similarity attention and the CRW mechanism. Overall, the approach offers a practical, high-fidelity solution for snapshot hyperspectral imaging by leveraging semantic and structural cues from the PAN image.
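As an illustration of the conjugate-gradient step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the data-fidelity subproblem takes the quadratic form commonly used in unrolling methods, min_x ||y - Phi x||^2 + mu ||x - z||^2, whose solution satisfies (Phi^T Phi + mu I) x = Phi^T y + mu z. The names `Phi`, `mu`, `z`, `conjugate_gradient`, and `data_fidelity_step` are hypothetical, and a small dense random matrix stands in for the stacked compressive and PAN sensing operators.

```python
import numpy as np

def conjugate_gradient(apply_A, b, x0, num_iters=50, tol=1e-20):
    """Solve A x = b for a symmetric positive-definite operator A."""
    x = x0.copy()
    r = b - apply_A(x)          # initial residual
    p = r.copy()                # initial search direction
    rs_old = float(r @ r)
    for _ in range(num_iters):
        if rs_old < tol:        # squared residual norm is small enough
            break
        Ap = apply_A(p)
        alpha = rs_old / float(p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def data_fidelity_step(Phi, y, z, mu, num_iters=50):
    """Assumed subproblem: (Phi^T Phi + mu I) x = Phi^T y + mu z."""
    apply_A = lambda v: Phi.T @ (Phi @ v) + mu * v   # never forms Phi^T Phi
    b = Phi.T @ y + mu * z
    return conjugate_gradient(apply_A, b, x0=z, num_iters=num_iters)

# Toy problem: 20 measurements of a 50-dimensional signal (illustrative sizes).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50))
x_true = rng.standard_normal(50)
y = Phi @ x_true
z = x_true + 0.1 * rng.standard_normal(50)   # stand-in for the denoiser output
x_hat = data_fidelity_step(Phi, y, z, mu=0.5)
```

Applying the operator through matrix-vector products, rather than forming and inverting Phi^T Phi + mu I, is what makes conjugate gradients attractive when the sensing operator is large or only available as a function.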
Abstract
Dual-Camera Compressed Hyperspectral Imaging (DCCHI) reconstructs a 3D Hyperspectral Image (HSI) by fusing a compressive measurement with a Panchromatic (PAN) image, and has shown great potential for snapshot hyperspectral imaging in practice. In this paper, we introduce a novel DCCHI reconstruction network, the Intra-Inter Similarity Exploiting Transformer (In2SET). Our key insight is to make full use of the PAN image to assist the reconstruction. To this end, we propose using the intra-similarity within the PAN image as a proxy for the intra-similarity in the original HSI, thereby offering an enhanced content prior for more accurate HSI reconstruction. Furthermore, we align the features of the underlying HSI with those of the PAN image, maintaining semantic consistency and introducing new contextual information into the reconstruction process. By integrating In2SET into a PAN-guided unrolling framework, our method substantially enhances the spatial-spectral fidelity and detail of the reconstructed images, providing a more comprehensive and accurate depiction of the scene. Extensive experiments on both real and simulated datasets demonstrate that our approach consistently outperforms existing state-of-the-art methods in both reconstruction quality and computational complexity. Code will be released.
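The idea of using PAN intra-similarity as a proxy for HSI intra-similarity can be sketched as an attention layer whose similarity map is computed from PAN tokens only and then applied to HSI tokens. This is a simplified illustration under stated assumptions, not the In2SET architecture: single-head attention, no windowing, and the names `pan_guided_attention`, `W_q`, `W_k`, `W_v` as well as the token and channel sizes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pan_guided_attention(hsi_tokens, pan_tokens, W_q, W_k, W_v):
    """Aggregate HSI tokens using a similarity map computed from PAN tokens.

    hsi_tokens: (n, c_hsi) features of n spatial positions of the HSI estimate
    pan_tokens: (n, c_pan) features of the same n positions in the PAN image
    """
    q = pan_tokens @ W_q                      # queries from PAN
    k = pan_tokens @ W_k                      # keys from PAN
    v = hsi_tokens @ W_v                      # values from HSI
    scores = q @ k.T / np.sqrt(q.shape[-1])   # PAN intra-similarity, (n, n)
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

# Toy sizes (assumed): 16 positions, 8 PAN channels, 28 HSI channels.
rng = np.random.default_rng(0)
n, c_pan, c_hsi, d = 16, 8, 28, 8
pan_tokens = rng.standard_normal((n, c_pan))
hsi_tokens = rng.standard_normal((n, c_hsi))
W_q = rng.standard_normal((c_pan, d))
W_k = rng.standard_normal((c_pan, d))
W_v = rng.standard_normal((c_hsi, c_hsi))

out, attn = pan_guided_attention(hsi_tokens, pan_tokens, W_q, W_k, W_v)
```

In this sketch the attention map depends only on the clean, fully sampled PAN image, so it is unaffected by errors in the current HSI estimate; only the values being aggregated come from the HSI features.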
