Spike-EVPR: Deep Spiking Residual Networks with SNN-Tailored Representations for Event-Based Visual Place Recognition

Zuntao Liu, Yaohui Li, Chenming Hu, Delei Kong, Junjie Jiang, Zheng Fang

TL;DR

Spike-EVPR introduces spike-compatible representations and a deep spiking residual architecture to tackle event-based visual place recognition. By employing MCS-Tensor and TSS-Tensor representations, a BSR-Encoder, SSD-Extractor, and CDA-Module, the method learns robust global descriptors end-to-end with triplet supervision. The approach achieves state-of-the-art results on Brisbane-Event-VPR and DDD20 while delivering substantial energy savings compared with ANN baselines and prior SNN methods. These findings demonstrate the practicality of energy-efficient, end-to-end SNNs for large-scale EVPR tasks. The work also provides valuable insights into representation learning for SNNs in spatio-temporal event data and sets a strong foundation for neuromorphic deployment in place recognition.
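The triplet supervision mentioned above can be illustrated with a minimal sketch. It assumes a standard hinge-style triplet margin loss on L2 distances between global descriptors; the function name `triplet_margin_loss`, the margin value, and the use of plain NumPy arrays are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss over batches of descriptors with shape (B, D).

    Assumption: L2 distance and margin=0.5 are illustrative choices,
    not values reported by the Spike-EVPR authors.
    """
    # Distance from each anchor to a descriptor of the same place.
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    # Distance from each anchor to a descriptor of a different place.
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    # Penalize triplets where the negative is not at least `margin` farther away.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

The loss is zero once every negative is at least `margin` farther from the anchor than the matching positive, which is what pushes descriptors of the same place together and descriptors of different places apart.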

Abstract

Event cameras are ideal for visual place recognition (VPR) in challenging environments due to their high temporal resolution and high dynamic range. However, existing methods convert sparse events into dense frame-like representations for Artificial Neural Networks (ANNs), ignoring event sparsity and incurring high computational cost. Spiking Neural Networks (SNNs) complement event data through discrete spike signals to enable energy-efficient VPR, but their application is hindered by the lack of effective spike-compatible representations and deep architectures capable of learning discriminative global descriptors. To address these limitations, we propose Spike-EVPR, a directly trained, end-to-end SNN framework tailored for event-based VPR. First, we introduce two complementary event representations, MCS-Tensor and TSS-Tensor, designed to reduce temporal redundancy while preserving essential spatio-temporal cues. Furthermore, we propose a deep spiking residual architecture that effectively aggregates these features to generate robust place descriptors. Extensive experiments on the Brisbane-Event-VPR and DDD20 datasets demonstrate that Spike-EVPR achieves state-of-the-art performance, improving Recall@1 by 7.61% and 13.20%, respectively, while significantly reducing energy consumption.
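For readers unfamiliar with the Recall@N metric quoted above, the following is a rough sketch of how it is commonly computed in place recognition: a query counts as correct if any of its N nearest database descriptors lies within a geographic distance threshold of the query's true position. The function name `recall_at_n`, the array layout, and the threshold values are assumptions for illustration; the paper's exact evaluation protocol is not given in this summary.

```python
import numpy as np

def recall_at_n(query_desc, db_desc, query_pos, db_pos, n, dist_thresh):
    """Fraction of queries with at least one correct match in the top-n.

    query_desc: (Q, D) descriptors, db_desc: (M, D) descriptors,
    query_pos:  (Q, 2) positions,   db_pos:  (M, 2) positions.
    Assumption: positions share one metric frame and `dist_thresh`
    uses the same unit; both are hypothetical for this sketch.
    """
    # Pairwise descriptor distances between every query and database entry.
    desc_dist = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
    # Indices of the n closest database descriptors for each query.
    top_n = np.argsort(desc_dist, axis=1)[:, :n]
    # Pairwise geographic distances, used only to judge correctness.
    geo_dist = np.linalg.norm(query_pos[:, None, :] - db_pos[None, :, :], axis=2)
    hits = np.take_along_axis(geo_dist, top_n, axis=1) <= dist_thresh
    return float(np.mean(hits.any(axis=1)))

# Toy usage: two queries, three database entries.
q_desc = np.array([[0.0, 0.0], [10.0, 10.0]])
d_desc = np.array([[0.0, 1.0], [10.0, 9.0], [5.0, 5.0]])
q_pos = np.array([[0.0, 0.0], [100.0, 0.0]])
d_pos = np.array([[1.0, 0.0], [300.0, 0.0], [100.0, 1.0]])
r1 = recall_at_n(q_desc, d_desc, q_pos, d_pos, n=1, dist_thresh=25.0)
r2 = recall_at_n(q_desc, d_desc, q_pos, d_pos, n=2, dist_thresh=25.0)
```

In the toy data the second query's nearest descriptor belongs to a far-away place, so it only counts as correct once the second-ranked candidate is admitted, which is why Recall@N is non-decreasing in N.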

Paper Structure

This paper contains 27 sections, 13 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Overview of our Spike-EVPR pipeline.
  • Figure 2: Architecture of Spike-EVPR. The event volume is first converted into two spike-compatible representations: MCS-Tensor and TSS-Tensor. These are encoded by the Bifurcated Spike Residual Encoder (BSR-Encoder) into feature maps. The Shared & Specific Descriptor Extractor (SSD-Extractor) then generates three complementary sub-descriptors, which are subsequently aggregated by the Cross-Descriptor Aggregation Module (CDA-Module) to produce a robust global descriptor.
  • Figure 3: Detailed illustration of our SNN-Tailored event representations.
  • Figure 4: PR curves and Recall@N comparisons on Brisbane-Event-VPR (first four) and DDD20 (last two) datasets. Our Spike-EVPR (in red) consistently outperforms existing SOTA EVPR methods across most scenes.
  • Figure 5: Impact of different geographic distance thresholds.
  • ...and 1 more figure