VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition

Adam D. Hines, Peter G. Stratton, Michael Milford, Tobias Fischer

TL;DR

An SNN for Visual Place Recognition (VPR) that trains within minutes and answers queries in milliseconds, making it well suited to compute-constrained robotic systems and to integration as a loop closure component for online SLAM on resource-constrained platforms such as space and underwater robots.

Abstract

Spiking Neural Networks (SNNs) are at the forefront of neuromorphic computing thanks to their potential energy efficiency, low latencies, and capacity for continual learning. While these capabilities are well suited for robotics tasks, SNNs have seen limited adoption in this field thus far. This work introduces an SNN for Visual Place Recognition (VPR) that is both trainable within minutes and queryable in milliseconds, making it well suited for deployment on compute-constrained robotic systems. Our proposed system, VPRTempo, overcomes slow training and inference times using an abstracted SNN that trades biological realism for efficiency. VPRTempo employs a temporal code that determines the timing of a single spike based on a pixel's intensity, as opposed to prior SNNs relying on rate coding, which determines the number of spikes, improving spike efficiency by over 100%. VPRTempo is trained using Spike-Timing Dependent Plasticity and a supervised delta learning rule enforcing that each output spiking neuron responds to just a single place. We evaluate our system on the Nordland and Oxford RobotCar benchmark localization datasets, which include up to 27k places. We found that VPRTempo's accuracy is comparable to prior SNNs and the popular NetVLAD place recognition algorithm, while being several orders of magnitude faster and suitable for real-time deployment, with inference speeds over 50 Hz on CPU. VPRTempo could be integrated as a loop closure component for online SLAM on resource-constrained systems such as space and underwater robots.
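
The contrast between temporal and rate coding described in the abstract can be illustrated with a minimal sketch. This is not the VPRTempo implementation: the function names, the 8-bit intensity range, the "brighter pixels fire earlier" convention, and the cap of 10 spikes per window are all assumptions made for illustration.

```python
from typing import List, Optional

MAX_INTENSITY = 255  # assumption: 8-bit grayscale pixels


def temporal_code(pixels: List[int], window: float = 1.0) -> List[Optional[float]]:
    """Emit at most one spike per pixel; its time inside the window encodes intensity.

    Assumed convention: brighter pixels fire earlier, zero-intensity pixels stay silent.
    """
    times: List[Optional[float]] = []
    for p in pixels:
        if p <= 0:
            times.append(None)  # no spike for a black pixel
        else:
            times.append(window * (1.0 - p / MAX_INTENSITY))
    return times


def rate_code(pixels: List[int], max_spikes: int = 10) -> List[int]:
    """Emit a number of spikes per pixel proportional to its intensity."""
    return [(p * max_spikes) // MAX_INTENSITY for p in pixels]


def total_spikes_temporal(times: List[Optional[float]]) -> int:
    """Count the spikes actually emitted by the temporal code."""
    return sum(1 for t in times if t is not None)


pixels = [0, 51, 255]
spike_times = temporal_code(pixels)   # one spike time (or None) per pixel
spike_counts = rate_code(pixels)      # one spike count per pixel
print(total_spikes_temporal(spike_times), sum(spike_counts))
```

Under these assumptions the temporal code never emits more than one spike per pixel, whereas the rate code's spike total grows with both intensity and the chosen `max_spikes`.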

Paper Structure (20 sections, 9 equations, 3 figures, 2 tables)


Figures (3)

  • Figure 1: (A-i) Image sequences from standard VPR datasets (Nordland, Oxford RobotCar) are filtered and processed, then converted into (A-ii) spikes whose amplitude is determined by pixel intensity. (A-iii) To temporally encode spikes relative to an abstracted theta oscillation, amplitudes determine the spike timing within a timestep. (B-i) The generated spikes are passed into an SNN with 3 layers: an input layer, a feature layer, and a one-hot encoded output layer in which each output neuron represents one place. (B-ii) To scale the system to large datasets, we train individual expert module SNNs, each covering up to 1000 places, on subsets of the full reference dataset.
  • Figure 2: (A) Comparison of training times for 3300 places from the Nordland dataset between our system and the state of the art (VPRSNN [hussaini2022ensembles]). VPRSNN trained in 360 mins and VPRTempo on a CPU in 60 mins, with our best time of $\approx$1 min when VPRTempo runs on a GPU. (B) Query speed for 2700 places: VPRSNN at 2 Hz, VPRTempo on CPU at 353 Hz, and VPRTempo on GPU at 1634 Hz. (C) As the number of places increases, training time scales as $\mathcal{O}(n)$ and inference time as $\mathcal{O}(\log n)$.
  • Figure 3: (A) Example database and query images from the Nordland dataset (left), patch-normalized for network training (right). (B) Ground truth (GT, left) and descriptor similarity matrix from testing 3300 query images (right). (C) Precision-recall curves comparing our network with sum of absolute differences (SAD [6224623]), NetVLAD [Arandjelovic16], Generalized Contrastive Loss (GCL [leyvavallina2021gcl]), and VPRSNN [hussaini2022ensembles]. (D) Recall at N curves comparing the same methods as in C.
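
The Recall at N metric plotted in Figure 3 can be computed from a similarity matrix and a ground truth, roughly as sketched below. The matrix orientation (one row per query, one column per reference place), the "higher score means more similar" convention, and the function name `recall_at_n` are assumptions for illustration. The paper's evaluation code may differ.

```python
from typing import List, Set


def recall_at_n(similarity: List[List[float]],
                ground_truth: List[Set[int]],
                n: int) -> float:
    """Fraction of queries with a correct reference among their top-n matches.

    similarity[q][r]: score between query q and reference r (assumed: higher is better).
    ground_truth[q]: reference indices that count as correct for query q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Rank reference indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
        if any(r in ground_truth[q] for r in ranked[:n]):
            hits += 1
    return hits / len(similarity)


# Toy example: 3 queries against 3 reference places, one correct place per query.
similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.1, 0.8, 0.6],
]
ground_truth = [{0}, {1}, {2}]

r1 = recall_at_n(similarity, ground_truth, 1)
r2 = recall_at_n(similarity, ground_truth, 2)
print(r1, r2)
```

In the toy example only the first query ranks its correct place first, while all three have it within their top two, so the curve rises from one third at N = 1 to 1.0 at N = 2.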