In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging

Xin Wang, Lizhi Wang, Xiangtian Ma, Maoqing Zhang, Lin Zhu, Hua Huang

TL;DR

The paper tackles the ill-posed problem of reconstructing hyperspectral images (HSIs) in dual-camera compressive hyperspectral imaging (DCCHI). It introduces In2SET, a Transformer-based denoiser that draws on two cues to provide strong content priors: intra-similarity, approximated from the panchromatic (PAN) image, and inter-similarity between the HSI and the PAN image. In2SET is integrated into a PAN-guided unrolling framework (PGDU) in which each stage solves the data-fidelity subproblem with conjugate gradients and then applies the PAN-guided denoiser, which uses a guided feature pyramid from the PAN image. Experiments on simulated and real DCCHI data show that In2SET achieves state-of-the-art reconstruction quality at lower computational cost, and ablations support the contributions of the intra- and inter-similarity attention modules and the cosine similarity reweighting (CRW) mechanism. Overall, the approach offers a practical, high-fidelity solution for snapshot hyperspectral imaging by leveraging semantic and structural cues derived from the PAN image.

Abstract

Dual-Camera Compressed Hyperspectral Imaging (DCCHI) can reconstruct a 3D hyperspectral image (HSI) by fusing compressive and panchromatic (PAN) images, and has shown great potential for snapshot hyperspectral imaging in practice. In this paper, we introduce a novel DCCHI reconstruction network, the Intra-Inter Similarity Exploiting Transformer (In2SET). Our key insight is to make full use of the PAN image to assist the reconstruction. To this end, we propose using the intra-similarity within the PAN image as a proxy for the intra-similarity in the original HSI, thereby offering an enhanced content prior for more accurate HSI reconstruction. Furthermore, we align the features of the underlying HSI with those of the PAN image, maintaining semantic consistency and introducing new contextual information into the reconstruction process. By integrating In2SET into a PAN-guided unrolling framework, our method substantially enhances the spatial-spectral fidelity and detail of the reconstructed images, providing a more comprehensive and accurate depiction of the scene. Extensive experiments on both real and simulated datasets demonstrate that our approach consistently outperforms existing state-of-the-art methods in both reconstruction quality and computational complexity. Code will be released.
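
The intra-similarity idea in the abstract (using similarity within the PAN image as a proxy for similarity within the HSI) can be illustrated with a minimal NumPy sketch. This is not the authors' In2SET module. It assumes, purely for illustration, that an attention map is computed from PAN features alone and then used to aggregate HSI features. The function names, the feature shapes, and the absence of learned projections are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pan_guided_attention(hsi_feat, pan_feat):
    """Aggregate HSI features with an attention map built from PAN features.

    hsi_feat: (N, C) array, one C-dimensional HSI feature per spatial position.
    pan_feat: (N, D) array, one D-dimensional PAN feature per spatial position.
    Hypothetical simplification: queries and keys are the raw PAN features
    (no learned projections), and values are the raw HSI features.
    """
    d = pan_feat.shape[1]
    # Similarity between spatial positions, measured in the PAN image only.
    scores = pan_feat @ pan_feat.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)          # (N, N), each row sums to 1
    # Positions that look alike in the PAN image share HSI information.
    return attn @ hsi_feat, attn

# Toy data: 16 spatial positions, 8 PAN channels, 28 HSI channels (made up).
rng = np.random.default_rng(0)
pan_feat = rng.standard_normal((16, 8))
hsi_feat = rng.standard_normal((16, 28))
out, attn = pan_guided_attention(hsi_feat, pan_feat)
```

The design point the sketch tries to capture is that the attention weights never depend on the (unknown, degraded) HSI estimate, only on the cleanly measured PAN image.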

Paper Structure

This paper contains 17 sections, 17 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Illustration of the proposed In2SET method for hyperspectral image reconstruction. (a) Intra-Similarity: extraction and application of spatial attention maps from the PAN image to enhance the spatial resolution of the reconstructed HSI. (b) Inter-Similarity: utilization of semantic features from the HSI and PAN image, scored by their consistency, to inform and refine the reconstruction of HSI.
  • Figure 2: The dual-camera compressive hyperspectral imaging system.
  • Figure 3: Overview of PGDU. The InitialNet initiates the process with compressive measurements and sensing matrix, followed by a series of stages each containing a conjugate gradient (CG) block and an In2SET denoiser.
  • Figure 4: Diagram of In2SET architecture. (a) The U-shaped In2SET structure. (b) In2AB, consisting of two normalization layers, an intra-similarity attention module, an inter-similarity attention module, and an FFN layer. (c) The components of NPL. (d) The multi-head self-attention in channel (MHA-C) and multi-head cross-attention in spatial (MHA-S). (e) The cosine similarity reweighting (CRW) mechanism.
  • Figure 5: Comparative reconstruction results of different reconstruction methods for Scene 3 from the KAIST dataset at spectral bands 476.5 nm, 536.5 nm, 584.5 nm, and 625.0 nm. The spectral density curves are plotted from the blue region in the colorchecker.
  • ...and 1 more figure
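
The stage structure described in the Figure 3 caption (a conjugate gradient block followed by a denoiser) can be sketched as follows. This is a toy illustration, not the PGDU implementation. It assumes a half-quadratic-splitting form in which each stage solves (Φ^T Φ + μI) x = Φ^T y + μ z with conjugate gradients and then denoises x. The dense matrix `Phi`, the penalty `mu`, the initialization `Phi.T @ y` (standing in for InitialNet), and the identity denoiser (standing in for the learned In2SET denoiser) are all assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(apply_A, b, x0, n_iter=50, tol=1e-12):
    """Solve A x = b for a symmetric positive-definite operator A."""
    x = x0.copy()
    r = b - apply_A(x)        # residual
    p = r.copy()              # search direction
    rs_old = r @ r
    for _ in range(n_iter):
        if rs_old < tol:      # squared residual norm is small enough
            break
        Ap = apply_A(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def unrolled_reconstruction(Phi, y, denoiser, mu=0.1, n_stages=5):
    """Alternate a CG data-fidelity solve with a denoising step."""
    def apply_A(v):
        # (Phi^T Phi + mu I) v, applied without forming the matrix.
        return Phi.T @ (Phi @ v) + mu * v

    z = Phi.T @ y             # crude initialization (assumption)
    for _ in range(n_stages):
        b = Phi.T @ y + mu * z
        x = conjugate_gradient(apply_A, b, z)
        z = denoiser(x)       # a learned PAN-guided denoiser would go here
    return z

# Toy problem: 24 measurements of a 32-dimensional signal (made-up sizes).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((24, 32)) / np.sqrt(24)
x_true = rng.standard_normal(32)
y = Phi @ x_true
x_hat = unrolled_reconstruction(Phi, y, denoiser=lambda x: x)
```

With the identity denoiser the loop reduces to a proximal-point iteration on the data term, so it only demonstrates the control flow. In a learned unrolling network the denoiser is what supplies the image prior.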