Applications of Spiking Neural Networks in Visual Place Recognition

Somayeh Hussaini; Michael Milford; Tobias Fischer

TL;DR

This paper addresses Visual Place Recognition (VPR) using Spiking Neural Networks (SNNs) to achieve energy-efficient, low-latency navigation on neuromorphic hardware. It introduces Modular SNNs to scale place learning, Ensembles of Modular SNNs for robustness, and Sequence Matching to exploit temporal context, demonstrating competitive performance across six datasets and in a CPU-based proof-of-concept robot deployment. The results show substantial gains from ensembling and sequence matching, along with strong scalability and potential for real-time, energy-constrained robotic applications. The work highlights the viability of SNN-based VPR and outlines a path toward deploying such systems on neuromorphic hardware for loop closure and relocalization in robotics.

Abstract

In robotics, Spiking Neural Networks (SNNs) are increasingly recognized for their largely unrealized potential for energy efficiency and low latency, particularly when implemented on neuromorphic hardware. Our paper highlights three advancements for SNNs in Visual Place Recognition (VPR). Firstly, we propose Modular SNNs, where each SNN represents a set of non-overlapping, geographically distinct places, enabling scalable networks for large environments. Secondly, we present Ensembles of Modular SNNs, where multiple networks represent the same place, significantly enhancing accuracy compared to single-network models. Each module of our Modular SNNs is compact, comprising only 1500 neurons and 474k synapses, which makes the modules well suited to ensembling. Lastly, we investigate the role of sequence matching in SNN-based VPR, a technique in which consecutive images are used to refine place recognition. We demonstrate competitive performance of our method on a range of datasets, including greater responsiveness to ensembling than conventional VPR techniques and larger R@1 improvements from sequence matching than VPR techniques with comparable baseline performance. Our contributions highlight the viability of SNNs for VPR, offering scalable and robust solutions and paving the way for their application in various energy-sensitive robotic tasks.
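The modular and ensemble schemes above can be illustrated with a small sketch of how place predictions might be combined. It assumes that each module outputs one score per place it has learned (for example, spike counts of the excitatory neurons assigned to that place), that these scores are already comparable across modules, and that each module covers a contiguous block of reference indices. The helpers `fuse_modules`, `ensemble` and `predict` are hypothetical, and averaging is only one plausible way to combine ensemble members, not necessarily the one used in the paper.

```python
from typing import Sequence


def fuse_modules(module_scores: Sequence[Sequence[float]]) -> list[float]:
    """Concatenate per-module place scores into one vector over all reference places.

    Assumes module m scores a contiguous, non-overlapping block of reference
    places, so concatenating in module order indexes places globally.
    """
    fused: list[float] = []
    for scores in module_scores:
        fused.extend(scores)
    return fused


def ensemble(member_scores: Sequence[Sequence[float]]) -> list[float]:
    """Average the fused place scores of several Modular SNNs (ensemble members)."""
    n = len(member_scores)
    return [sum(place) / n for place in zip(*member_scores)]


def predict(scores: Sequence[float]) -> int:
    """Index of the reference place with the highest score (ties -> lowest index)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Two hypothetical ensemble members, each made of two modules that cover
# reference places 0-1 and 2-3 respectively.
member_a = fuse_modules([[3.0, 1.0], [0.0, 2.0]])
member_b = fuse_modules([[1.0, 0.0], [0.0, 4.0]])
print(predict(member_a))                        # 0
print(predict(ensemble([member_a, member_b])))  # 3
```

Because each module only scores its own disjoint subset of places, covering more places means adding modules rather than enlarging one network; the ensemble step then spends extra parallel modules on the same places to improve accuracy.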

Paper Structure (57 sections, 17 equations, 10 figures, 2 tables)

Introduction
Related works
Neuromorphic Computing and SNNs
SNNs for Robot Localization
Visual Place Recognition (VPR)
Non-spiking Biologically Inspired VPR Approaches
Ensembles of Neural Networks
Sequence Matching Techniques for Neural Networks
Methodology
Preliminaries
Network Structure
Neuronal Dynamics
Network Connections
Weight Updates
Local Regularization of Excitatory Neurons
...and 42 more sections

Figures (10)

Figure 1: Schematic of the proposed algorithm: The basic building blocks of our work are independent Spiking Neural Network (SNN) modules that each learn a small subset of the reference database. At inference time, the place predictions of all these modules are fused in parallel in what we dub a "Standalone Modular SNN", enabling our approach to scale to a large number of places. We further exploit the potential for massively parallel processing on neuromorphic processors by introducing ensembles in which multiple Modular SNNs learn representations of the same place, and demonstrate that SNNs are more responsive to ensembling than conventional techniques. Finally, we demonstrate the high responsiveness of these Ensembles of Modular SNNs to sequence matching.
Figure 2: SNN Module Architecture: Our Modular SNN is composed of SNN modules that all share the three-layer SNN architecture illustrated in this figure. Each module converts an input image to spike trains via rate coding, where the firing rate of each input neuron is determined by the corresponding pixel intensity. The total number of input neurons equals the number of pixels in the input image, $K_{P}=W \times H$. These input neurons (blue dots) are fully connected to a layer of excitatory neurons (blue arrows connecting to green dots). Each excitatory neuron is connected to a single inhibitory neuron (single green arrow connecting to a red dot), which in turn connects to and inhibits all excitatory neurons except its paired excitatory neuron (red arrows connecting back to green dots). The synaptic weights from excitatory to inhibitory neurons, $W_{EI}$, and from inhibitory back to excitatory neurons, $W_{IE}$, are fixed constants. The synaptic weights from input neurons to excitatory neurons, $W_{PE}$, are learned via unsupervised Spike-Timing-Dependent Plasticity (STDP), which enables excitatory neurons to respond to different places. The number of output spikes from these excitatory neurons is used for place predictions.
Figure 3: Sample images from the six VPR datasets employed in our research: These datasets encompass a diverse range of environments including urban locales undergoing seasonal transitions, varying illuminations from day to night, high-glare-induced illumination shifts, scenes with occlusions, railway lines, and forested areas.
Figure 4: Component-wise ablation study: Introducing modularity (Mod), where multiple SNNs each represent a small subset of the reference dataset, enables large-scale place recognition and significantly outperforms the Non-modular baseline SNN by Hussaini et al. [Hussaini2022]. Ensembling (five ensemble members; Mod+Ens) and sequence matching (sequence length four; Mod+Seq) each improve the R@1 of the Modular SNN, by 6.3% and 17.2% respectively. Combining them (Mod+Ens+Seq) improves performance further, beyond either technique alone, for an overall R@1 improvement of 24.9%. Error bars show the standard deviation of performance across the five ensemble members. The experiment was conducted on the Nordland dataset (reference: Spring, Fall; query: Winter).
Figure 5: R@1 performance improvements with sequence matching: The plot shows each method's mean R@1 across all datasets with sequence matching over four, seven, and ten frames, compared with the single-frame approach (SL1). Gray lines show the standard deviation of R@1 across datasets. Red bars show each method's mean R@1 improvement from single-frame matching to sequence matching with a sequence length of ten; our Modular SNN and Ensemble of Modular SNNs obtain the largest of these improvements. Without a sequence matcher (SL1), the mean R@1 of both our Ensemble of Modular SNNs (five ensemble members) and our Modular SNN is competitive with several VPR techniques, and with sequence lengths of four, seven, and ten our SNN-based approaches gain more R@1 than VPR baselines with similar single-frame performance. Notably, with a sequence length of ten, the R@1 of our SNN-based approaches slightly surpasses that of AP-GeM (ResNet101, LM18) and closely approaches that of DenseVLAD, both of which have higher-performing single-frame baselines.
...and 5 more figures
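Figure 2 states that each module turns an input image into spike trains by rate coding, with one input neuron per pixel and firing rates set by pixel intensities. A minimal sketch of that step follows, assuming a Bernoulli (discretized Poisson) spike generator whose rate is proportional to intensity. The maximum rate, presentation time and time step are illustrative values, not parameters given in this section, and `rate_code` is a hypothetical helper.

```python
import random


def rate_code(image, max_rate_hz=64.0, duration_s=0.35, dt_s=0.0005, seed=0):
    """Rate-code a grayscale image (2-D list, intensities 0-255) into spike trains.

    Returns one 0/1 spike train per pixel (K_P = W x H input neurons), with one
    entry per simulation step. Parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)                      # fixed seed for reproducibility
    pixels = [p for row in image for p in row]     # flatten: one input neuron per pixel
    n_steps = int(round(duration_s / dt_s))
    trains = []
    for p in pixels:
        rate_hz = max_rate_hz * p / 255.0          # firing rate proportional to intensity
        p_spike = rate_hz * dt_s                   # spike probability in one time step
        trains.append([1 if rng.random() < p_spike else 0 for _ in range(n_steps)])
    return trains


# A hypothetical 2 x 2 image gives K_P = 4 input neurons.
trains = rate_code([[0, 255], [128, 64]])
print(len(trains), [sum(t) for t in trains])       # spike count per input neuron
```

The per-step approximation is only reasonable while `rate_hz * dt_s` stays well below 1; with these defaults the brightest pixel spikes with probability 0.032 per step, and a black pixel never spikes.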
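The abstract quotes 474k synapses per module. Given the wiring in Figure 2 (inputs fully connected to the excitatory layer, each excitatory neuron connected to one inhibitory neuron, each inhibitory neuron connected to every excitatory neuron except its partner), the synapse count follows from $K_P$ and the number of excitatory neurons $N_E$. The sketch below does that arithmetic; the 28 × 28 input and $N_E = 400$ in the example are assumptions for illustration, not values stated in this section, and `module_synapses` is a hypothetical helper.

```python
def module_synapses(k_p: int, n_e: int) -> dict[str, int]:
    """Count synapses in one SNN module wired as in Figure 2.

    k_p: number of input neurons (one per pixel, K_P = W x H).
    n_e: number of excitatory neurons; one inhibitory neuron is paired with each.
    """
    w_pe = k_p * n_e          # input -> excitatory, fully connected (learned by STDP)
    w_ei = n_e                # each excitatory neuron -> its single inhibitory partner (fixed)
    w_ie = n_e * (n_e - 1)    # each inhibitory neuron -> all excitatory neurons but its partner (fixed)
    return {"W_PE": w_pe, "W_EI": w_ei, "W_IE": w_ie, "total": w_pe + w_ei + w_ie}


# Hypothetical sizes: a 28 x 28 input image and 400 excitatory neurons.
print(module_synapses(28 * 28, 400))
# {'W_PE': 313600, 'W_EI': 400, 'W_IE': 159600, 'total': 473600}
```

Under these assumed sizes the total is 473,600, consistent with the quoted 474k, and about two thirds of the synapses are the learned weights $W_{PE}$; the fixed inhibitory weights $W_{IE}$ grow quadratically with $N_E$.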
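Figures 4 and 5 report performance as R@1 (Recall@1), the fraction of queries whose top-ranked reference is a correct match. The sketch below assumes that query `i` truly matches reference `ground_truth[i]` and accepts a prediction within `tolerance` reference frames of it; the tolerance used in the paper is not given in this section, and `recall_at_1` is a hypothetical helper.

```python
from typing import Sequence


def recall_at_1(predictions: Sequence[int], ground_truth: Sequence[int],
                tolerance: int = 0) -> float:
    """Fraction of queries whose top-1 predicted reference index lies within
    `tolerance` places of the ground-truth reference index."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not predictions:
        raise ValueError("need at least one query")
    correct = sum(abs(p - g) <= tolerance for p, g in zip(predictions, ground_truth))
    return correct / len(predictions)


gt = [0, 1, 2, 3]
single = [0, 3, 2, 0]   # hypothetical single-network predictions
ens = [0, 1, 2, 0]      # hypothetical ensemble predictions
print(recall_at_1(single, gt), recall_at_1(ens, gt))  # 0.5 0.75
```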
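Figure 5 varies the matcher's sequence length (four, seven and ten frames). As an illustration only, a SeqSLAM-style matcher sums single-frame similarities along a diagonal of the query-reference similarity matrix, assuming both traverses advance at the same rate. This is a common form of sequence matching, not necessarily the exact matcher used in the paper, and `sequence_match` is a hypothetical helper.

```python
from typing import Sequence


def sequence_match(similarity: Sequence[Sequence[float]], seq_len: int) -> list[int]:
    """For each query, pick the reference whose diagonal of summed similarities,
    ending at (query, reference) and spanning up to seq_len frames, is highest.

    similarity[q][r] is the single-frame similarity (higher is better) between
    query q and reference r. Assumes both traverses advance one frame per frame.
    """
    n_q, n_r = len(similarity), len(similarity[0])
    best: list[int] = []
    for q in range(n_q):
        # Early queries (and short reference sets) use a shorter sequence.
        length = min(seq_len, q + 1, n_r)
        scores = []
        for r in range(n_r):
            if r < length - 1:  # diagonal would run off the start of the references
                scores.append(float("-inf"))
            else:
                scores.append(sum(similarity[q - k][r - k] for k in range(length)))
        best.append(max(range(n_r), key=lambda i: scores[i]))
    return best


S = [
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.2, 0.7, 0.1],
    [0.6, 0.0, 0.1, 0.5],  # on its own, query 3 looks most like reference 0
]
print(sequence_match(S, 1))  # [0, 1, 2, 0]
print(sequence_match(S, 3))  # [0, 1, 2, 3]
```

Query 3 is mismatched to reference 0 when matched alone, but its two preceding frames support reference 3, so the summed diagonal recovers the correct match. In this sketch the first `seq_len - 1` queries fall back to shorter sequences.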
