SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams

Kang Chen, Shiyan Chen, Jiyuan Zhang, Baoyue Zhang, Yajing Zheng, Tiejun Huang, Zhaofei Yu

TL;DR

A self-supervised cascaded framework alleviates the spike noise and spatial-resolution mismatch encountered in the spike-guided deblurring model, and a lightweight deblur network generates high-quality sequences whose brightness and texture are consistent with the original input.

Abstract

Reconstructing a sequence of sharp images from the blurry input is crucial for enhancing our insights into the captured scene and poses a significant challenge due to the limited temporal features embedded in the image. Spike cameras, sampling at rates up to 40,000 Hz, have proven effective in capturing motion features and beneficial for solving this ill-posed problem. Nonetheless, existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios that diverge from the synthetic training data domain. Moreover, the quality of reconstructed images is capped by the generated images based on motion analysis interpolation, which inherently differs from the actual scene, affecting the generalization ability of these methods in real high-speed scenarios. To address these challenges, we propose the first self-supervised framework for the task of spike-guided motion deblurring. Our approach begins with the formulation of a spike-guided deblurring model that explores the theoretical relationships among spike streams, blurry images, and their corresponding sharp sequences. We subsequently develop a self-supervised cascaded framework to alleviate the issues of spike noise and spatial-resolution mismatching encountered in the deblurring model. With knowledge distillation and re-blurring loss, we further design a lightweight deblur network to generate high-quality sequences with brightness and texture consistency with the original input. Quantitative and qualitative experiments conducted on our real-world and synthetic datasets with spikes validate the superior generalization of the proposed framework. Our code, data and trained models will be available at https://github.com/chenkang455/S-SDM.
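
The re-blurring loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a blurry image equals the uniform temporal average of the sharp frames over the exposure, and it uses an L1 penalty; the function names `reblur` and `reblur_loss` and both of these choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def reblur(sharp_frames: np.ndarray) -> np.ndarray:
    """Synthesize a blurry image from a (T, H, W) stack of sharp frames.

    Assumption: blur is modeled as a uniform average over the T frames.
    """
    return sharp_frames.mean(axis=0)


def reblur_loss(sharp_frames: np.ndarray, blurry: np.ndarray) -> float:
    """Mean absolute difference between the re-blurred frames and the input blur.

    Assumption: an L1 penalty; the paper may use a different distance.
    """
    return float(np.abs(reblur(sharp_frames) - blurry).mean())


# Toy data: 8 random "sharp" frames and the blurry image they would produce.
rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))
blurry = frames.mean(axis=0)

# Frames consistent with the blurry input give (near) zero loss.
loss_consistent = reblur_loss(frames, blurry)

# Frames with a brightness offset are penalized by roughly that offset.
loss_shifted = reblur_loss(frames + 0.1, blurry)
```

Because this loss needs only the blurry input itself and no ground-truth sharp frames, a term of this kind fits a self-supervised setting, and it penalizes reconstructions whose overall brightness drifts from the input.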

Paper Structure (24 sections, 17 equations, 18 figures, 5 tables)

Figures (18)

  • Figure 1: Illustration of the superiority of our self-supervised framework (S-SDM) over supervised methods. Supervised methods, while effective on synthetic datasets, suffer from a significant performance decline when applied to real-world datasets, primarily due to data distribution discrepancies. In contrast, our self-supervised framework, necessitating no Ground Truth (GT) for training, seamlessly bridges this dataset gap through fine-tuning on real-world datasets.
  • Figure 2: The schematic diagram of our proposed distillation self-supervised framework. The "" indicates that certain computations are executed in a non-network manner.
  • Figure 3: Qualitative comparison for the single frame restoration on the RSB dataset.
  • Figure 4: Qualitative comparison for the sequence reconstruction on the RSB dataset.
  • Figure 5: Visual comparison of our S-SDM against other methods on the GOPRO dataset.
  • ...and 13 more figures