Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging

Zongliang Wu, Ruiying Lu, Ying Fu, Xin Yuan

TL;DR

The paper tackles the challenge of reconstructing high-dimensional hyperspectral data from a single snapshot in CASSI, an inherently ill-posed problem. It introduces a latent diffusion prior to guide a physics-based unfolding network, combining two-phase training (prior learning from clean HSIs and diffusion-conditioned prior generation) with a Trident Transformer to fuse degradation-free priors with spatial and spectral information. The approach uses a GC-GAP unfolding framework and a lightweight latent encoder, enabling efficient inference while delivering higher PSNR/SSIM and reduced compute compared to state-of-the-art methods. Experimental results on synthetic and real SD-CASSI data demonstrate superior reconstruction quality and practical efficiency, with ablations validating the efficacy of the LDM priors and the TT design.

Abstract

Snapshot compressive spectral imaging reconstruction aims to reconstruct three-dimensional spatial-spectral images from a single-shot two-dimensional compressed measurement. Existing state-of-the-art methods are mostly based on deep unfolding structures but have intrinsic performance bottlenecks: $i$) the ill-posed problem of dealing with heavily degraded measurements, and $ii$) regression loss-based reconstruction models that tend to recover images with few details. In this paper, we introduce a generative model, namely the latent diffusion model (LDM), to generate a degradation-free prior that enhances the regression-based deep unfolding method. Furthermore, to overcome the large computational cost of LDM, we propose a lightweight model to generate knowledge priors in the deep unfolding denoiser, and integrate these priors to guide the reconstruction process and recover high-quality spectral signal details. Numerical and visual comparisons on synthetic and real-world datasets illustrate the superiority of our proposed method in both reconstruction quality and computational efficiency. Code will be released.
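
To make the "single-shot two-dimensional compressed measurement" concrete, the following is a minimal NumPy sketch of a single-disperser CASSI forward model, not the authors' code. It assumes the commonly used simulation in which each spectral band is modulated by one coded aperture, shifted along the width by a fixed number of pixels per band, and summed on the sensor. The function name `cassi_forward` and the default `step=2` are illustrative assumptions; neither is specified in this summary.

```python
import numpy as np

def cassi_forward(cube, mask, step=2):
    """Simulate a single-disperser CASSI measurement (illustrative sketch).

    cube: (H, W, B) hyperspectral cube.
    mask: (H, W) coded aperture shared by all bands.
    step: dispersion shift in pixels between adjacent bands (assumed value).
    Returns a 2D measurement of shape (H, W + step * (B - 1)).
    """
    H, W, B = cube.shape
    y = np.zeros((H, W + step * (B - 1)), dtype=cube.dtype)
    for b in range(B):
        # Modulate band b with the mask, shift it by step * b pixels, and accumulate.
        y[:, step * b : step * b + W] += cube[:, :, b] * mask
    return y
```

Because all bands collapse onto one 2D sensor plane, many different cubes produce the same measurement, which is why the reconstruction is ill-posed and benefits from a strong prior.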

Paper Structure (14 sections, 16 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: (a) Comparison of PSNR (dB) versus FLOPs (G) with previous HSI reconstruction methods. (b) Ablation study of different numbers of diffusion time steps. Our method achieves the desired results with only a few steps.
  • Figure 2: Top row: error maps of the previous SOTA method and our method. Bottom row: feature maps before and after applying LDM enhancement. The enhanced features show less noise and clearer edges.
  • Figure 3: (a) The single disperser CASSI imaging process. HSI data cube is captured by a monochromatic sensor. (b) GC-GAP projection. (c) Latent encoder. (d) Simplified Denoiser. (e) The measurement $\boldsymbol{y}$ and masks $\boldsymbol{A}$ pass through an N-stage DUN, where each stage is composed of a GC-GAP projection and a denoiser. The denoiser follows a U-shape structure and consists of five Trident Transformers (TT), where each TT is assisted with prior knowledge $\boldsymbol{z}_{GT}$ generated from the diffusion model.
  • Figure 4: (a) The Trident Transformer in Fig. 3(d). (b)-(d) are the detailed sub-modules. $\mathbf{U}_i$ is the input feature. The prior feature $\mathbf{Z}_i$ is sent into the prior flow.
  • Figure 5: Visualization results on synthetic data. Three of the 28 wavelengths are selected for visual comparison. 'Corr' in the top-left curve plot is the correlation coefficient between each method's curve and the ground-truth curve for the chosen (golden box) region.
  • ...and 1 more figure
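
The stage structure described in the Figure 3 caption (a projection step followed by a denoiser, repeated for N stages) can be sketched as below. This is a hedged illustration only: it uses the plain generalized alternating projection (GAP) update, not the paper's GC-GAP variant, and the `denoiser` argument is a placeholder callable standing in for the Trident Transformer denoiser with diffusion priors. The names `forward`, `adjoint`, and `gap_unfold` are assumptions, and the cube is assumed to be stored in the already-shifted (dispersed) domain so that the sensing operator reduces to an element-wise mask followed by a sum over bands.

```python
import numpy as np

def forward(x, Phi):
    """Sensing operator A: mask each band and sum over the spectral axis."""
    return (x * Phi).sum(axis=2)

def adjoint(y, Phi):
    """Adjoint operator A^T: spread the 2D measurement back over all bands."""
    return y[:, :, None] * Phi

def gap_unfold(y, Phi, denoiser, num_stages=5):
    """Plain GAP unfolding: alternate a data-consistency projection and a denoiser.

    y:   (H, W') measurement.
    Phi: (H, W', B) per-band masks in the shifted domain.
    denoiser: callable mapping an (H, W', B) cube to a cube of the same shape
              (placeholder for a learned network).
    """
    # For this operator A A^T is diagonal, so its inverse is an element-wise division.
    phi_sum = (Phi ** 2).sum(axis=2)
    phi_sum = np.where(phi_sum == 0, 1.0, phi_sum)  # avoid division by zero
    v = adjoint(y, Phi)  # simple initialization
    for _ in range(num_stages):
        residual = y - forward(v, Phi)
        x = v + adjoint(residual / phi_sum, Phi)  # Euclidean projection onto {x : A x = y}
        v = denoiser(x)                           # prior step
    return v

# Usage with an identity denoiser, which leaves the projected estimate unchanged.
rng = np.random.default_rng(0)
Phi_demo = (rng.random((6, 10, 4)) > 0.5).astype(float)
y_demo = forward(rng.random((6, 10, 4)), Phi_demo)
x_hat = gap_unfold(y_demo, Phi_demo, denoiser=lambda x: x, num_stages=3)
```

In a learned unfolding network each stage would typically have its own denoiser weights; a single callable is reused here only to keep the sketch short.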