Seeing the Unseen in Low-light Spike Streams

Liwen Hu, Yang Li, Mianzhi Liu, Yijia Guo, Shenghao Xie, Ziluo Ding, Tiejun Huang, Lei Ma

TL;DR

Diff-SPK addresses the reconstruction of low-light, high-speed spike streams using a diffusion-based pipeline conditioned on Enhanced Texture from Inter-spike Interval (ETFI). By introducing an ETFI encoding and a fusion module, Diff-SPK integrates temporal spike information into a Latent Diffusion Model with ControlNet, enabling high-fidelity texture synthesis in challenging lighting. The approach is validated on a large SA_SPK dataset and across synthetic and real spike-camera data, showing clear improvements over traditional methods and prior diffusion-based approaches, particularly under very dark conditions. This work also provides a first bona fide benchmark for low-light spike-stream reconstruction, highlighting strong generalization to different spike-camera variants and practical relevance for high-speed vision tasks.

Abstract

The spike camera, a neuromorphic sensor with high temporal resolution, shows great promise for high-speed visual tasks. Unlike traditional cameras, a spike camera continuously accumulates photons and fires asynchronous spike streams. Because of this unique data modality, spike streams must be reconstructed before they are perceptible to the human eye. However, many existing methods struggle with spike streams in low-light, high-speed scenarios because of severe noise and sparse information. In this work, we propose Diff-SPK, a diffusion-based reconstruction method. Diff-SPK leverages generative priors to supplement texture information under diverse low-light conditions. Specifically, it first employs an Enhanced Texture from Inter-spike Interval (ETFI) to aggregate sparse information from low-light spike streams. The ETFI is then encoded by a suitable encoder and serves as the input of ControlNet for high-speed scene generation. To improve the quality of the results, we introduce an ETFI-based feature fusion module during the generation process.
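To make the accumulate-and-fire behaviour and the idea of reading texture from inter-spike intervals concrete, here is a minimal single-pixel sketch. It is an illustration under stated assumptions, not the Diff-SPK implementation: it shows only a plain inter-spike-interval intensity estimate, not the enhanced ETFI variant, whose details are not given in this summary. The function names, the firing threshold of 1.0, and the intensity values are all hypothetical.

```python
# Hypothetical single-pixel sketch of a spike camera and an interval-based
# intensity estimate. Names and the threshold value are assumptions.

def simulate_spikes(intensity, n_steps, threshold=1.0):
    """Accumulate `intensity` each step; fire a spike and subtract the
    threshold whenever the accumulator reaches it."""
    acc = 0.0
    spikes = []
    for _ in range(n_steps):
        acc += intensity
        if acc >= threshold:
            spikes.append(1)
            acc -= threshold
        else:
            spikes.append(0)
    return spikes


def inter_spike_interval(spikes, t):
    """Distance between the nearest spike at or before step t and the
    nearest spike after t. Returns None if either spike is missing."""
    prev = next((i for i in range(t, -1, -1) if spikes[i]), None)
    nxt = next((i for i in range(t + 1, len(spikes)) if spikes[i]), None)
    if prev is None or nxt is None:
        return None
    return nxt - prev


def interval_intensity(spikes, t, threshold=1.0):
    """Estimate intensity at step t as threshold / inter-spike interval."""
    isi = inter_spike_interval(spikes, t)
    return None if isi is None else threshold / isi


bright = simulate_spikes(0.25, 40)    # fires every 4 steps
dim = simulate_spikes(0.0625, 40)     # fires every 16 steps

print(sum(bright), interval_intensity(bright, 20))  # 10 0.25
print(sum(dim), interval_intensity(dim, 20))        # 2 0.0625
print(interval_intensity(dim, 5))                   # None: no earlier spike
```

The dim pixel fires only twice in 40 steps, and the estimate is undefined wherever a neighbouring spike is missing. This is the sparsity problem that the abstract attributes to low-light streams.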

Paper Structure

This paper contains 20 sections, 11 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Analysis of low-light high-speed spike stream datasets and performance comparison of methods. SA$_{\textit{SPK}}^{T}$ is the proposed synthetic dataset; Outdoor$_{\textit{SPK}}$ and Indoor$_{\textit{SPK}}$ are the proposed real datasets. Details are in the dataset section. (a) Average Spike Firing Frequency (ASFF): the darker the scene, the lower the ASFF. Our datasets have darker illumination conditions than those of LLR and Dong et al. (b) Dataset size comparison: ours are $\sim$30$\times$ larger than the others. (c) Performance comparison: our method achieves the best overall performance, with particular advantages in scenes with lower illumination. TFSTP, Hu et al. and Liu et al., and BSF represent the state of the art for traditional, deep learning-based low-light, and deep learning-based normal-light reconstruction, respectively.
  • Figure 2: Illustration of the spike camera model. (A) Input current. (B) Accumulation. Low light intensity leads to long intervals between adjacent spikes, i.e., sparse information.
  • Figure 3: Framework of Diff-SPK. Diff-SPK first uses ETFI to compute and enhance light intensity information from a spike stream. The resulting condition is passed through an encoder (a) before being fed into ControlNet. To improve guidance of the denoising process, we introduce a fusion module (b).
  • Figure 4: The influence of the VAE encoder on ETFI. The difference map shows the Structural Similarity Index Measure (SSIM) of local areas between (a) and (b).
  • Figure 5: Instructions for obtaining illumination in Indoor$_{\textit{SPK}}$.
  • ...and 2 more figures
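
The Average Spike Firing Frequency (ASFF) in the Figure 1 caption can be sketched as follows. The exact definition is not given in this summary, so the code assumes ASFF is the mean number of spikes per pixel per time step over a binary spike stream. The function name and the nested-list layout (time, rows, columns) are hypothetical.

```python
# Hypothetical ASFF sketch: fraction of (time, pixel) entries that hold a spike.
# The definition itself is an assumption, as noted above.

def average_spike_firing_frequency(stream):
    """`stream` is a list of frames; each frame is a list of rows of 0/1."""
    total = 0
    count = 0
    for frame in stream:
        for row in frame:
            total += sum(row)
            count += len(row)
    if count == 0:
        raise ValueError("empty spike stream")
    return total / count


# Two 2x2 frames with two spikes among eight entries.
stream = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
]
print(average_spike_firing_frequency(stream))  # 0.25
```

Under this assumed definition, a darker scene yields fewer spikes in the same window and therefore a lower value, which matches the trend described in the caption.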