SlowFast-SCI: Slow-Fast Deep Unfolding Learning for Spectral Compressive Imaging

Haijin Zeng, Xuan Lu, Yurong Zhang, Qiangqiang Shen, Guoqing Chao, Li Jiang, Yongyong Chen

TL;DR

SlowFast-SCI introduces a dual-speed framework for spectral compressive imaging that combines offline slow learning with online fast, self-supervised adaptation. It distills a physics-guided, priors-based unfolding backbone into a compact fast unfolding network via imaging-guided fast unfolding distillation (IGFUD), then equips each fast unfolding block with lightweight adapters for test-time calibration. The approach achieves substantial reductions in parameters and FLOPs (over 70%), and provides up to 5.79 dB PSNR gains on out-of-distribution data with up to 4x faster adaptation, while preserving cross-domain robustness. Its modular design enables integration with any deep unfolding network, paving the way for self-adaptive, field-deployable computational imaging across modalities.

Abstract

Humans learn in two complementary ways: a slow, cumulative process that builds broad, general knowledge, and a fast, on-the-fly process that captures specific experiences. Existing deep-unfolding methods for spectral compressive imaging (SCI) mirror only the slow component: they rely on heavy pre-training with many unfolding stages, yet lack the rapid adaptation needed to handle new optical configurations. As a result, they falter on out-of-distribution cameras, especially in bespoke spectral setups unseen during training. The large number of stages also incurs heavy computation and slow inference. To bridge this gap, we introduce SlowFast-SCI, a dual-speed framework that integrates seamlessly into any deep unfolding network, including those beyond SCI systems. During slow learning, we pre-train or reuse a priors-based backbone and distill it via imaging guidance into a compact fast-unfolding model. In the fast learning stage, lightweight adaptation modules are embedded within each block and trained in a self-supervised manner at test time via a dual-domain loss, without retraining the backbone. To the best of our knowledge, SlowFast-SCI is the first test-time adaptation-driven deep unfolding framework for efficient, self-adaptive spectral reconstruction. Its dual-stage design unites offline robustness with on-the-fly per-sample calibration, yielding over 70% reduction in parameters and FLOPs, up to 5.79 dB PSNR improvement on out-of-distribution data, preserved cross-domain adaptability, and 4x faster adaptation. In addition, its modularity allows integration with any deep-unfolding network, paving the way for self-adaptive, field-deployable imaging and expanded computational imaging modalities. The models, datasets, and code are available at https://github.com/XuanLu11/SlowFast-SCI.
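To make the idea of self-supervised test-time calibration with a frozen backbone concrete, the following is a minimal NumPy sketch, not the paper's adaptation module or its dual-domain loss. It assumes a standard single-disperser coded-aperture forward model (mask, per-band shift, sum) and replaces the lightweight adapters with a hypothetical per-band gain vector `g`; the function names `cassi_forward`, `cassi_adjoint`, and `adapt_gains` are illustrative. Only a measurement-domain consistency term is minimized, using steepest descent with an exact line search.

```python
import numpy as np

def cassi_forward(cube, mask, step=2):
    """Assumed forward model: mask each band, shift band b by b*step pixels, sum."""
    h, w, bands = cube.shape
    meas = np.zeros((h, w + step * (bands - 1)))
    for b in range(bands):
        meas[:, b * step : b * step + w] += mask * cube[:, :, b]
    return meas

def cassi_adjoint(meas, mask, bands, step=2):
    """Adjoint of cassi_forward: undo each band's shift and re-apply the mask."""
    h, w = mask.shape
    cube = np.empty((h, w, bands))
    for b in range(bands):
        cube[:, :, b] = mask * meas[:, b * step : b * step + w]
    return cube

def adapt_gains(x0, y, mask, n_steps=20, step=2):
    """Fit a per-band gain g so that cassi_forward(g * x0) matches y.

    x0 is the frozen reconstruction (never modified); only g is updated.
    No ground-truth cube is used, only the measurement y.
    """
    bands = x0.shape[2]
    g = np.ones(bands)
    losses = []
    for _ in range(n_steps):
        residual = cassi_forward(g * x0, mask, step) - y
        losses.append(float(np.sum(residual ** 2)))
        # Gradient of ||Phi(g * x0) - y||^2 with respect to each band's gain.
        back = cassi_adjoint(residual, mask, bands, step)
        grad = 2.0 * np.sum(back * x0, axis=(0, 1))
        # Exact line search: the loss is quadratic in g.
        a_grad = cassi_forward(grad * x0, mask, step)
        denom = 2.0 * np.sum(a_grad ** 2)
        if denom == 0.0:
            break
        g = g - (np.sum(grad ** 2) / denom) * grad
    residual = cassi_forward(g * x0, mask, step) - y
    losses.append(float(np.sum(residual ** 2)))
    return g, losses

# Toy demo: the "backbone" output is the true cube with a per-band miscalibration.
rng = np.random.default_rng(0)
truth = rng.random((16, 16, 4))
mask = (rng.random((16, 16)) > 0.5).astype(float)
true_gain = np.array([1.3, 0.8, 1.1, 0.6])
x0 = truth / true_gain
y = cassi_forward(truth, mask)
g, losses = adapt_gains(x0, y, mask)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

In this toy setting the measurement alone is enough to pull the gains toward the values that explain it, which is the essence of per-sample calibration without labels; a real adapter would have more parameters and would need additional regularization.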

Paper Structure

This paper contains 15 sections, 10 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: (a) Previous SCI reconstruction methods vs. our SlowFast-SCI. (b) PSNR-FLOPs-Params analysis comparing the proposed SlowFast-SCI with the latest state-of-the-art methods.
  • Figure 2: Illustration of SlowFast-SCI. The left side depicts the slow learning process, pre-training on the source data with a deep unfolding network. IGFUD distills the knowledge of the deep unfolding backbone into a compact fast unfolding network (FastUN). The right side shows the fast learning process, where FAST-TTAM rapidly adapts the FastUN to align the model with target scenes at inference time. The far right illustrates the details of FAST-TTAM, which takes both spatial and spectral information into account during fast learning.
  • Figure 3: Comparison of DUN methods on 10 scenes of the Harvard dataset. For all methods, PSNR (dB) and per-scene inference time (ms) are reported. For the unfolding method DPU, results with different numbers of stages are provided.
  • Figure 4: Log-scaled visualization of 1/p-values for different k-values. The p-values are obtained by performing a permutation test ($n = 1000$) on the kernel matrices derived from the Maximum Mean Discrepancy (MMD; Rabanser et al., 2019) between CAVE and each of the other three datasets (KAIST, Harvard, and ICVL). Each image (originally of size 256 × 256 × 28) is processed using average pooling with different window sizes $k$. Green bold values indicate statistically significant differences ($p < 0.05$).
  • Figure 5: Evaluating the cross-domain generalization of a classic DUN. Images with blue backgrounds represent the average log-spectrum of each dataset. The deep unfolding network is trained on the CAVE dataset during the slow learning stage. To evaluate cross-domain generalization, the model is tested on two out-of-distribution (OOD) datasets (Harvard and ICVL) and on real-world test samples, rather than on in-distribution (IND) datasets such as KAIST used in prior works. The average PSNR (dB) shows that the classic DUN performs well on the IND test set but suffers a noticeable performance drop on OOD test samples.
  • ...and 4 more figures
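
The dataset-shift test described in the Figure 4 caption (average pooling with window size k, then a permutation test on an MMD kernel matrix) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the RBF kernel, the median-heuristic bandwidth, the biased MMD estimate, and the helper names `avg_pool`, `mmd2`, and `mmd_permutation_test` are all choices made here for the example.

```python
import numpy as np

def avg_pool(img, k):
    """Non-overlapping k x k average pooling over the two spatial axes."""
    h, w, c = img.shape
    h2, w2 = h // k, w // k
    return img[: h2 * k, : w2 * k].reshape(h2, k, w2, k, c).mean(axis=(1, 3))

def rbf_kernel_matrix(z):
    """RBF kernel on the rows of z, bandwidth set by the median heuristic (assumed)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    med = np.median(d2[np.triu_indices(len(z), k=1)])
    bandwidth2 = med if med > 0 else 1.0
    return np.exp(-d2 / (2.0 * bandwidth2))

def mmd2(kmat, idx_x, idx_y):
    """Biased MMD^2 estimate read off a precomputed kernel matrix."""
    kxx = kmat[np.ix_(idx_x, idx_x)].mean()
    kyy = kmat[np.ix_(idx_y, idx_y)].mean()
    kxy = kmat[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy

def mmd_permutation_test(x, y, n_perm=1000, seed=0):
    """Return (observed MMD^2, p-value) for samples x and y (rows are items)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    kmat = rbf_kernel_matrix(np.vstack([x, y]))
    total = kmat.shape[0]
    observed = mmd2(kmat, np.arange(n), np.arange(n, total))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)
        if mmd2(kmat, perm[:n], perm[n:]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

# Toy usage with synthetic "images"; real use would load cubes from two datasets.
rng = np.random.default_rng(0)
set_a = [rng.normal(0.0, 1.0, (8, 8, 3)) for _ in range(20)]
set_b = [rng.normal(1.0, 1.0, (8, 8, 3)) for _ in range(20)]
k = 2
feats_a = np.stack([avg_pool(im, k).ravel() for im in set_a])
feats_b = np.stack([avg_pool(im, k).ravel() for im in set_b])
stat, p_value = mmd_permutation_test(feats_a, feats_b, n_perm=200)
print("MMD^2:", stat, "p-value:", p_value)
```

Computing the kernel matrix once and only permuting indices keeps the cost of each permutation low, which matters when the number of permutations is on the order of 1000 as in the caption.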