
3One2: One-step Regression Plus One-step Diffusion for One-hot Modulation in Dual-path Video Snapshot Compressive Imaging

Ge Wang, Xing Liu, Xin Yuan

TL;DR

This paper addresses temporal aliasing in video snapshot compressive imaging by leveraging one-hot modulation to decouple frames. It introduces RegDif, a hybrid reconstruction framework that combines one-step regression with one-step diffusion, guided by a forward SDE aligned with the hardware encoding, and augments it with a dual optical path to recover complementary information. The method outperforms state-of-the-art baselines on simulated grayscale and color datasets and on real scenes. This work provides a diffusion-based solution tailored to one-hot masks in video SCI, enabling faster and more reliable high-speed video recovery.
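The control flow of a "one-step regression plus one-step diffusion" reconstruction can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: `regression_init`, `toy_denoiser`, and `regdif_sketch` are hypothetical names; the "networks" are crude hand-written stand-ins (a mean fill and a 3×3 box filter); the refinement starts from a plain Gaussian perturbation of the regression output rather than from the paper's mask-aligned forward SDE; and the final data-consistency step is our own choice. It only shows the order of operations: regress once, perturb to one intermediate noise level, denoise once.

```python
import numpy as np

def regression_init(y, masks):
    """Hypothetical stand-in for the one-step regression network.

    Places each measurement pixel into the sub-frame its one-hot mask selects,
    then fills every unobserved pixel with the mean of that frame's observed
    pixels (a crude placeholder for a learned regressor).
    """
    frames = masks * y[None]                          # (B, H, W): observed values, zeros elsewhere
    counts = masks.sum(axis=(1, 2), keepdims=True)    # observed pixels per frame
    means = frames.sum(axis=(1, 2), keepdims=True) / np.maximum(counts, 1)
    return np.where(masks == 1, frames, means)

def toy_denoiser(x_noisy):
    """Hypothetical stand-in for the one-step diffusion denoiser:
    a 3x3 spatial box filter applied independently to each frame."""
    _, h, w = x_noisy.shape
    padded = np.pad(x_noisy, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x_noisy)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def regdif_sketch(y, masks, sigma=0.1, seed=0):
    """Regress once, perturb once (Gaussian, an assumption), denoise once."""
    rng = np.random.default_rng(seed)
    x_init = regression_init(y, masks)                              # step 1: one-step regression
    x_noisy = x_init + sigma * rng.standard_normal(x_init.shape)    # jump to one noise level
    x_refined = toy_denoiser(x_noisy)                               # step 2: one-step refinement
    # Data consistency (our assumption): keep the pixels the one-hot mask measured.
    return np.where(masks == 1, y[None], x_refined)

# Toy usage: 8 sub-frames of 16x16 compressed with a random one-hot mask.
B, H, W = 8, 16, 16
rng = np.random.default_rng(1)
video = rng.random((B, H, W))
idx = rng.integers(0, B, size=(H, W))                               # active sub-frame per pixel
masks = (np.arange(B)[:, None, None] == idx[None]).astype(float)
y = (masks * video).sum(axis=0)                                     # one 2D snapshot
x_hat = regdif_sketch(y, masks)
print(x_hat.shape)
```

In a learned system the two stand-ins would be trained networks; the point of the hybrid design is that the regression output already lies close to the solution, so a single denoising step can refine it instead of running a long reverse diffusion chain.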

Abstract

Video snapshot compressive imaging (SCI) captures dynamic scene sequences through a two-dimensional (2D) snapshot, fundamentally relying on optical modulation for hardware compression and the corresponding software reconstruction. While mainstream video SCI using random binary modulation has demonstrated success, it inevitably results in temporal aliasing during compression. One-hot modulation, activating only one sub-frame per pixel, provides a promising solution for achieving perfect temporal decoupling, thereby alleviating issues associated with aliasing. However, no algorithms currently exist to fully exploit this potential. To bridge this gap, we propose an algorithm specifically designed for one-hot masks. First, leveraging the decoupling properties of one-hot modulation, we transform the reconstruction task into a generative video inpainting problem and introduce a forward-process stochastic differential equation (SDE) that aligns with the hardware compression process. Next, we identify limitations of pure diffusion methods for video SCI and propose a novel framework that combines one-step regression initialization with one-step diffusion refinement. Furthermore, to mitigate the spatial degradation caused by one-hot modulation, we implement a dual optical path at the hardware level, utilizing complementary information from the other path to enhance the inpainted video. To our knowledge, this is the first work integrating diffusion into video SCI reconstruction. Experiments conducted on synthetic datasets and real scenes demonstrate the effectiveness of our method.
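Why one-hot modulation turns reconstruction into inpainting follows from the standard video SCI forward model $Y = \sum_{b=1}^{B} M_b \odot X_b$. The NumPy sketch below, on synthetic data, compares random binary masks with one-hot masks (where $\sum_b M_b(i,j) = 1$ at every pixel): binary masks mix about $B/2$ frames into each measured pixel, whereas one-hot masks route each pixel of exactly one frame to the sensor, so the snapshot splits into $B$ partially observed frames. Frame count, resolution, and the uniform random video are arbitrary choices for illustration.

```python
import numpy as np

B, H, W = 8, 64, 64
rng = np.random.default_rng(0)
video = rng.random((B, H, W))  # synthetic frames with values in [0, 1)

# Random binary modulation: each mask entry is independently 0 or 1.
binary_masks = rng.integers(0, 2, size=(B, H, W)).astype(float)

# One-hot modulation: every pixel is active in exactly one of the B sub-frames.
active = rng.integers(0, B, size=(H, W))
onehot_masks = (np.arange(B)[:, None, None] == active[None]).astype(float)

def measure(masks, frames):
    """Video SCI forward model: Y = sum_b M_b * X_b (element-wise)."""
    return (masks * frames).sum(axis=0)

y_binary = measure(binary_masks, video)
y_onehot = measure(onehot_masks, video)

# How many frames overlap in each measurement pixel (temporal aliasing).
frames_per_pixel_binary = binary_masks.sum(axis=0)   # about B/2 on average
frames_per_pixel_onehot = onehot_masks.sum(axis=0)   # exactly 1 everywhere

# With one-hot masks each measured pixel is a pixel of a single frame, so the
# snapshot splits into B partially observed frames: an inpainting problem.
observed = onehot_masks * y_onehot[None]             # known pixels, placed per frame
assert np.allclose(observed, onehot_masks * video)

# Fraction of pixels observed in each frame (roughly 1/B for a uniform mask).
print(onehot_masks.mean(axis=(1, 2)))
# One-hot measurements stay in a single frame's [0, 1) range; binary ones can reach B.
print(y_onehot.max(), y_binary.max())
```

The flip side, visible in the per-frame observed fraction, is that each frame keeps only about $1/B$ of its pixels, which is the spatial degradation the dual optical path is meant to compensate.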

Paper Structure

This paper contains 14 sections, 10 equations, 8 figures, 2 tables, 2 algorithms.

Figures (8)

  • Figure 1: (a) The forward process in our diffusion-based inpainting aligns with the hardware compression process in video SCI. (b) Our method outperforms the compared methods in both single-path and dual-path settings. The 3D heatmap represents the absolute error between the red block and the ground truth.
  • Figure 2: (a) Overall framework of a single-branch system using a random binary mask and its measurement pixel histogram. (b) Overall framework of a dual-branch system using a one-hot mask and its measurement pixel histogram.
  • Figure 3: The normalized $\ell_2$-norm distance between ${\boldsymbol X}(t)$ and ${\boldsymbol X}(0)$ as $t$ varies from 0 to 1 in our forward process.
  • Figure 4: Illustration of the overall RegDif framework (LHS) and key components in RegDif (RHS). In the LHS, the black solid line indicates the RegDif pipeline, and the gray solid line indicates the reverse process of pure diffusion inpainting. The three colored dashed lines indicate the pipelines for different loss derivations. The RHS shows the architectures of the key components used in RegDif.
  • Figure 5: Detailed architectures of the Spatial-temporal ResBlock, the STHB block, and the Timestep Embedding block.
  • ...and 3 more figures
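Figure 5 lists a Timestep Embedding block, but its internals are not described in this summary. A common design in diffusion models, assumed here, encodes the diffusion time with sinusoidal features and passes them through a small MLP. The NumPy sketch below implements that standard pattern with random weights; the function names, the Linear-SiLU-Linear structure, and the 1000× rescaling of a continuous $t \in [0, 1]$ are assumptions, not the paper's specification.

```python
import numpy as np

def sinusoidal_timestep_embedding(t, dim, max_period=10000.0):
    """Map timesteps t (shape (N,)) to (N, dim) sinusoidal features.

    Half of the channels are sines and half are cosines of t at
    geometrically spaced frequencies (Transformer/DDPM-style encoding).
    """
    assert dim % 2 == 0, "dim must be even"
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = np.asarray(t, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

def timestep_embedding_block(t, dim, w1, b1, w2, b2):
    """Hypothetical block: sinusoidal features -> Linear -> SiLU -> Linear."""
    h = sinusoidal_timestep_embedding(t, dim) @ w1 + b1
    h = h * (1.0 / (1.0 + np.exp(-h)))   # SiLU activation: h * sigmoid(h)
    return h @ w2 + b2

# Toy usage with random (untrained) weights.
rng = np.random.default_rng(0)
dim, hidden = 32, 64
w1 = rng.standard_normal((dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, hidden)) * 0.1
b2 = np.zeros(hidden)
t = np.array([0.0, 0.25, 1.0]) * 1000.0   # continuous t in [0, 1], rescaled (assumption)
emb = timestep_embedding_block(t, dim, w1, b1, w2, b2)
print(emb.shape)  # (3, 64)
```

In a one-step refinement setting the block is evaluated at a single fixed timestep, so its output could in principle be precomputed once; it is shown here as a function only to make the encoding explicit.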