
Finding Visual Saliency in Continuous Spike Stream

Lin Zhu, Xianzhang Chen, Xiao Wang, Hua Huang

TL;DR

The proposed framework substantially improves the capture and highlighting of visual saliency in the spike stream compared with other SNN-based methods. It offers a new perspective on spike-based saliency segmentation and a new paradigm for fully SNN-based Transformer models.

Abstract

As a bio-inspired vision sensor, the spike camera emulates the operating principle of the fovea, a compact retinal region, by emitting spikes to encode the accumulated per-pixel luminance intensity. With its high temporal resolution and bio-inspired neuromorphic design, the spike camera holds significant promise for advancing computer vision applications. Saliency detection mimics the way humans attend to a scene by capturing its most salient regions. In this paper, we investigate visual saliency in the continuous spike stream for the first time. To process the binary spike stream effectively, we propose a Recurrent Spiking Transformer (RST) framework built entirely on a spiking neural network (SNN). Our framework extracts spatio-temporal features from the continuous spike stream while maintaining low power consumption. To train and validate the proposed model, we build a comprehensive real-world spike-based visual saliency dataset covering a wide range of lighting conditions. Extensive experiments demonstrate the superior performance of our Recurrent Spiking Transformer framework compared with other SNN-based methods. Our framework improves substantially on these methods in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective on spike-based saliency segmentation but also demonstrates a new paradigm for fully SNN-based Transformer models. The code and dataset are available at https://github.com/BIT-Vision/SVS.
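The abstract describes the spike camera as accumulating per-pixel luminance and discharging a spike once enough has accumulated. The following is a minimal integrate-and-fire sketch of that idea. It is not the sensor's actual circuit. It assumes normalized luminance values, a hypothetical firing threshold of 1.0, and reset by subtraction, so the residual charge carries over. The function name `spike_camera_encode` is illustrative only.

```python
def spike_camera_encode(intensity_frames, threshold=1.0):
    """Integrate-and-fire sketch of spike-camera encoding (illustrative only).

    intensity_frames: list of T frames, each a list of normalized per-pixel
        luminance values received during one sampling interval.
    threshold: hypothetical firing threshold (assumed, not from the paper).
    Returns a list of T binary frames (one 0/1 value per pixel).
    """
    num_pixels = len(intensity_frames[0])
    accumulator = [0.0] * num_pixels  # per-pixel integrated luminance
    spike_frames = []
    for frame in intensity_frames:
        out = []
        for i, lum in enumerate(frame):
            accumulator[i] += lum  # integrate incoming light
            if accumulator[i] >= threshold:
                out.append(1)  # discharge a spike
                accumulator[i] -= threshold  # reset by subtraction, keep residual
            else:
                out.append(0)
        spike_frames.append(out)
    return spike_frames


# Two pixels, one twice as bright as the other, observed for four intervals.
print(spike_camera_encode([[0.5, 0.25]] * 4))
```

Brighter pixels reach the threshold sooner and fire more often. The spike rate therefore carries the luminance information. This is why the output is a binary spatio-temporal stream rather than a sequence of intensity frames.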

Paper Structure

This paper contains 12 sections, 13 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: The motivation of detecting visual saliency in continuous spike stream. In contrast to ANNs, SNNs provide a biologically realistic model where neurons communicate through discrete spikes, making them well-suited for processing spike data with low power consumption.
  • Figure 2: Visual saliency in spatio-temporal spike stream.
  • Figure 3: Samples in our spike-based visual saliency (SVS) dataset.
  • Figure 4: The framework of our Recurrent Spiking Transformer (RST). RST is a fully spiking neural network architecture comprising spike-based spatio-temporal feature extraction, recurrent feature aggregation (RFA), multi-scale refinement, and a multi-step loss.
  • Figure 5: Recurrent mode of our RFA module. RFA uses an attention mechanism to aggregate the features of adjacent steps, $\rm{E}_{t}$ and $\rm{E}_{t+1}$, which enhances the features and yields a better saliency map at step $\rm{t}$.
  • ...and 2 more figures
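Figure 1 notes that SNN neurons communicate through discrete spikes, and Figure 5 describes RFA as attention over the adjacent-step features $\rm{E}_{t}$ and $\rm{E}_{t+1}$. The paper's exact RFA formulation is not reproduced here. The sketch below is therefore a hypothetical stand-in that combines two assumed ingredients:

- a leaky integrate-and-fire (LIF) neuron with `tau=2.0`, `v_threshold=1.0`, and hard reset to 0;
- a Spikformer-style spiking attention with no softmax, where queries come from $\rm{E}_{t}$ and keys and values from $\rm{E}_{t+1}$ through identity projections.

The result is added residually to $\rm{E}_{t}$. All function names and hyperparameters are illustrative.

```python
def lif_step(membrane, inputs, tau=2.0, v_threshold=1.0):
    """One LIF update per element; returns (spikes, new_membrane).

    Leaky integration toward the input, v <- v + (x - v) / tau, then a
    hard reset to 0.0 wherever the membrane reaches the threshold.
    """
    spikes, new_membrane = [], []
    for v, x in zip(membrane, inputs):
        v = v + (x - v) / tau
        if v >= v_threshold:
            spikes.append(1)
            new_membrane.append(0.0)
        else:
            spikes.append(0)
            new_membrane.append(v)
    return spikes, new_membrane


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def spiking_cross_attention(e_t, e_next, membrane=None, scale=1.0,
                            tau=2.0, v_threshold=1.0):
    """Hypothetical RFA-like aggregation of binary N x D features.

    Q = e_t, K = V = e_next (identity projections, an assumption).
    Q K^T V is a non-negative integer count, so no softmax is used; it is
    scaled, passed through LIF neurons, and added residually to e_t.
    Returns (enhanced_features, new_membrane) so state can recur over steps.
    """
    n, d = len(e_t), len(e_t[0])
    if membrane is None:
        membrane = [[0.0] * d for _ in range(n)]
    scores = matmul(e_t, transpose(e_next))  # N x N spike-overlap counts
    attended = matmul(scores, e_next)  # N x D aggregated step-(t+1) features
    spikes, new_membrane = [], []
    for row, mem in zip(attended, membrane):
        s, m = lif_step(mem, [scale * x for x in row], tau, v_threshold)
        spikes.append(s)
        new_membrane.append(m)
    enhanced = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(e_t, spikes)]
    return enhanced, new_membrane


e_t = [[1, 0, 1], [0, 1, 0]]
e_next = [[1, 1, 1], [0, 1, 0]]
print(spiking_cross_attention(e_t, e_next))
```

Because Q, K, and V are binary, the attention products reduce to additions of spike counts. That property is what makes spike-based attention attractive for low-power hardware. Returning the membrane state lets the same neurons integrate evidence across successive steps, in the spirit of the recurrent mode shown in Figure 5.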