Self-Supervised One-Step Diffusion Refinement for Snapshot Compressive Imaging

Shaoguang Huang, Yunzhen Wang, Haijin Zeng, Hongyu Chen, Hongyan Zhang

TL;DR

The paper tackles the ill-posed task of reconstructing multispectral images from a single snapshot in snapshot compressive imaging. It introduces a self-supervised One-Step Diffusion Refinement (OSD) framework that couples an existing SCI predictor with a single-step diffusion residual (DiFA), enabling fast, high-fidelity refinements without iterative denoising. A spectral compression distillation strategy transfers RGB diffusion priors to MSI space, while an equivariant imaging consistency loss leverages 2-D measurements alone for robust training and generalization. Across simulations and real CASSI data, the method achieves state-of-the-art PSNR/SSIM and substantial speedups, demonstrating practical viability for real-world SCI reconstruction.

Abstract

Snapshot compressive imaging (SCI) captures multispectral images (MSIs) using a single coded two-dimensional (2-D) measurement, but reconstructing high-fidelity MSIs from these compressed inputs remains a fundamentally ill-posed challenge. While diffusion-based reconstruction methods have recently raised the bar for quality, they face critical limitations: a lack of large-scale MSI training data, adverse domain shifts from RGB-pretrained models, and inference inefficiencies due to multi-step sampling. These drawbacks restrict their practicality in real-world applications. In contrast to existing methods, which either follow costly iterative refinement or adapt subspace-based embeddings for diffusion models (e.g. DiffSCI, PSR-SCI), we introduce a fundamentally different paradigm: a self-supervised One-Step Diffusion (OSD) framework specifically designed for SCI. The key novelty lies in using a single-step diffusion refiner to correct an initial reconstruction, eliminating iterative denoising entirely while preserving generative quality. Moreover, we adopt a self-supervised equivariant learning strategy to train both the predictor and refiner directly from raw 2-D measurements, enabling generalization to unseen domains without the need for ground-truth MSI. To further address the challenge of limited MSI data, we design a band-selection-driven distillation strategy that transfers core generative priors from large-scale RGB datasets, effectively bridging the domain gap. Extensive experiments confirm that our approach sets a new benchmark, yielding PSNR gains of 3.44 dB, 1.61 dB, and 0.28 dB on the Harvard, NTIRE, and ICVL datasets, respectively, while reducing reconstruction time by 97.5%. This remarkable improvement in efficiency and adaptability makes our method a significant advancement in SCI reconstruction, combining both accuracy and practicality for real-world deployment.

Paper Structure

This paper contains 21 sections, 12 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison with the latest diffusion-based SCI reconstruction methods on zero-shot datasets Harvard, NTIRE and ICVL in terms of (a) PSNR and (b) inference time.
  • Figure 2: (a) Workflow of our self-supervised training strategy. The measurement $\mathbf{y}$ and mask $\mathcal{H}$ are first fed into the DiFA network $\mathcal{F}_{\theta}$, which produces the recovered MSI $\mathbf{x}^{(1)}$. Next, a series of transformations $T_{g}$ (shift, rotation, reflection, etc.) is applied to $\mathbf{x}^{(1)}$ to produce $\mathbf{x}^{(2)}$. The MSI $\mathbf{x}^{(2)}$ is then modulated again by the mask $\mathcal{H}$ to obtain the compressed measurement $\mathbf{y}^{(2)}$, which is finally fed into $\mathcal{F}_{\theta}$ to obtain the re-reconstructed MSI $\mathbf{x}^{(3)}$. (b) DiFA network. On the one hand, the measurement $\mathbf{y}$ and mask $\mathcal{H}$ are fed into the pre-trained reconstruction network $g_{\theta}$ to obtain the initial prediction $\mathbf{x}_{\mathrm{init}}$, while the MSI residual $\mathbf{r}$ is generated from noise $\mathbf{x}_{T}$ by the one-step diffusion model $f_{\theta}$; the refined image $\mathbf{x}^{(1)}$ is then obtained by adding the residual $\mathbf{r}$ to the initial prediction $\mathbf{x}_{\mathrm{init}}$. On the other hand, both $\mathbf{x}_{\mathrm{init}}$ and $\mathbf{x}^{(1)}$ are converted to RGB images through band selection $R(\cdot)$, and a distillation loss is used to leverage the prior knowledge of pre-trained multistep diffusion models.
  • Figure 3: The refined MSI, obtained with the initial and residual MSIs, exhibits superior visual quality with finer details compared to the baseline DAUHST.
  • Figure 4: Simulated MSI reconstruction comparisons on zero-shot datasets NTIRE, ICVL and Harvard. The right shows the reconstructed spectral curves corresponding to the selected region.
  • Figure 5: Real MSI reconstruction results on Scene 4 with 6 spectral channels. Please zoom in for a better view.
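
The data flow described in the Figure 2 caption can be sketched with NumPy as follows. This is only an illustrative toy under several assumptions, not the authors' implementation: `forward` is a simplified mask-and-sum operator that omits the spectral dispersion shift of a real CASSI system, `predictor` and `refiner` are hypothetical stand-ins for the pre-trained network $g_{\theta}$ and the one-step diffusion model $f_{\theta}$, the band indices in `to_rgb` are arbitrary, and the two loss terms follow the common equivariant-imaging form (measurement consistency plus an $\mathbf{x}^{(3)}$ versus $\mathbf{x}^{(2)}$ term), which may differ from the exact losses used in the paper.

```python
import numpy as np

def forward(x, mask):
    """Toy SCI forward model: modulate each band by the mask, sum to a 2-D image."""
    return (mask * x).sum(axis=0)

def predictor(y, mask, eps=1e-6):
    """Hypothetical stand-in for g_theta: normalized back-projection of y."""
    return mask * y / ((mask ** 2).sum(axis=0) + eps)

def refiner(x_init, x_T, scale=0.01):
    """Hypothetical stand-in for f_theta: one call maps noise x_T to a residual.
    A real refiner would condition on x_init; this toy ignores it."""
    return scale * x_T

def difa(y, mask, rng):
    """Figure 2(b): refined MSI = initial prediction + one-step residual."""
    x_init = predictor(y, mask)
    x_T = rng.standard_normal(x_init.shape)  # noise input to the refiner
    r = refiner(x_init, x_T)
    return x_init + r, x_init, r

def transform(x, k=1, flip=True):
    """T_g: rotate by k*90 degrees in the spatial plane, optionally reflect."""
    out = np.rot90(x, k=k, axes=(1, 2))
    return out[:, :, ::-1] if flip else out

def to_rgb(x, idx=(5, 3, 1)):
    """Band selection R(.): pick three bands (indices are an assumption)."""
    return x[list(idx)]

def ei_losses(y, mask, rng):
    """Figure 2(a): y -> x1 -> T_g -> x2 -> mask -> y2 -> x3."""
    x1, _, _ = difa(y, mask, rng)
    x2 = transform(x1)
    y2 = forward(x2, mask)
    x3, _, _ = difa(y2, mask, rng)
    loss_mc = np.mean((forward(x1, mask) - y) ** 2)  # measurement consistency
    loss_eq = np.mean((x3 - x2) ** 2)                # equivariance term
    return loss_mc, loss_eq

rng = np.random.default_rng(0)
bands, h, w = 6, 16, 16                      # square images keep rot90 shape-safe
x_true = rng.random((bands, h, w))           # used only to simulate y
mask = rng.integers(0, 2, size=(bands, h, w)).astype(float)
y = forward(x_true, mask)
loss_mc, loss_eq = ei_losses(y, mask, rng)
print(f"measurement consistency: {loss_mc:.4f}, equivariance: {loss_eq:.4f}")
```

Note that `x_true` is used only to simulate the measurement; both loss terms are computed from `y` and the mask alone, which is the property that allows training without ground-truth MSIs.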