Exploring Spatiotemporal Feature Propagation for Video-Level Compressive Spectral Reconstruction: Dataset, Model and Benchmark

Lijing Cai, Zhan Shi, Chenglong Huang, Jinyao Wu, Qiping Li, Zikang Huo, Linsen Chen, Chongde Zi, Xun Cao

TL;DR

The paper proposes the Propagation-Guided Spectral Video Reconstruction Transformer (PG-SVRT), which applies spatial-then-temporal attention to reconstruct spectral features from abundant video information and uses a bridged token to reduce computational complexity.

Abstract

Recently, Spectral Compressive Imaging (SCI) has achieved remarkable success, unlocking significant potential for dynamic spectral vision. However, existing reconstruction methods, which are primarily image-based, suffer from two limitations: (i) the encoding process masks spatial-spectral features, leading to uncertainty when reconstructing missing information from a single compressed measurement, and (ii) the frame-by-frame reconstruction paradigm fails to ensure temporal consistency, which is crucial in video perception. To address these challenges, this paper seeks to advance spectral reconstruction from the image level to the video level, leveraging the complementary features and temporal continuity across adjacent frames in dynamic scenes. First, we construct the first high-quality dynamic hyperspectral image dataset (DynaSpec), comprising 30 sequences obtained through frame-scanning acquisition. Second, we propose the Propagation-Guided Spectral Video Reconstruction Transformer (PG-SVRT), which applies spatial-then-temporal attention to reconstruct spectral features from abundant video information and uses a bridged token to reduce computational complexity. Finally, we conduct simulation experiments to assess the performance of four SCI systems, and construct a DD-CASSI prototype for real-world data collection and benchmarking. Extensive experiments demonstrate that PG-SVRT achieves superior performance in reconstruction quality, spectral fidelity, and temporal consistency, while maintaining minimal FLOPs. Project page: https://github.com/nju-cite/DynaSpec

Paper Structure

This paper contains 15 sections, 8 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Spectral compressive imaging and reconstruction. (a) SCI principle. (b) Image-based methods, with issues of uncertain reconstruction and temporal inconsistency (flickering intensity curves). (c) Video-based reconstruction, where information complementarity enhances completeness and temporal consistency (smooth intensity curves).
  • Figure 2: The proposed DynaSpec dataset. (a) Dynamic HSI sequences acquired frame by frame to simulate the diverse motion of real-world scenarios. (b) A display of the 30 scenes.
  • Figure 3: Illustration of PG-SVRT. (a) and (c) The components of MGDP and CDPB. (b) PG-SVRT framework and key components.
  • Figure 4: Details of the CDPB, which consists primarily of CDPA and MDFFN. (a) CDPA is a spatial-then-temporal attention mechanism, where the blue line represents spatial feature processing and the red line indicates temporal feature processing. (b) Illustration of MDFFN.
  • Figure 5: Measurements of different SCI systems.
  • ...and 4 more figures
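
The Figure 4 caption describes CDPA as a spatial-then-temporal attention mechanism. As a rough illustration of that general idea only, the sketch below applies plain self-attention first within each frame and then across frames at each spatial location. It is not the authors' CDPA: the function names, the identity query/key/value projections, the residual connections, and the (T, N, C) tensor layout are all assumptions made for this example, and the bridged token is omitted because this page does not describe how it works. The helper `pairwise_scores` only counts attention-score entries to show why a factorized scheme is cheaper than joint attention over all T*N tokens.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last axis.
    # q, k, v have shape (batch, n, d).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def spatial_then_temporal_attention(x):
    # x has shape (T, N, C): T frames, N spatial tokens per frame, C channels.
    # Assumption: identity projections stand in for learned Q/K/V weights.

    # Spatial step: each frame attends over its own N tokens (batch axis = T).
    x = x + attention(x, x, x)

    # Temporal step: each spatial location attends over T frames (batch axis = N).
    xt = np.swapaxes(x, 0, 1)          # (N, T, C)
    xt = xt + attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)       # back to (T, N, C)


def pairwise_scores(T, N):
    # Number of attention-score entries: joint attention over all T*N tokens
    # versus the factorized spatial-then-temporal scheme above.
    joint = (T * N) ** 2
    factorized = T * N * N + N * T * T
    return joint, factorized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video_tokens = rng.standard_normal((4, 64, 8))   # hypothetical sizes
    out = spatial_then_temporal_attention(video_tokens)
    print(out.shape)
    print(pairwise_scores(4, 64))
```

With the hypothetical sizes above (4 frames, 64 tokens per frame), joint attention would need 65536 score entries while the factorized version needs 17408, which is the kind of saving a spatial-then-temporal design aims for before any further reduction such as the bridged token.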