Structured Pruning for Efficient Visual Place Recognition

Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan

TL;DR

This work introduces a novel structured pruning method that not only streamlines common VPR architectures but also removes redundancies within the feature embedding space. This improves the efficiency of the system by reducing both map and model memory requirements and by decreasing feature extraction and retrieval latencies.

Abstract

Visual Place Recognition (VPR) is fundamental for the global re-localization of robots and devices, enabling them to recognize previously visited locations based on visual inputs. This capability is crucial for maintaining accurate mapping and localization over large areas. Given that VPR methods need to operate in real time on embedded systems, it is critical to optimize these systems for minimal resource consumption. While the most efficient VPR approaches employ standard convolutional backbones with fixed descriptor dimensions, these often lead to redundancy in the embedding space as well as in the network architecture. Our work introduces a novel structured pruning method that not only streamlines common VPR architectures but also strategically removes redundancies within the feature embedding space. This dual focus enhances the efficiency of the system, reducing both map and model memory requirements and decreasing feature extraction and retrieval latencies. Across models, our approach reduces memory usage by 21% and latency by 16%, while lowering recall@1 accuracy by less than 1%. This improvement benefits real-time applications on edge devices with negligible accuracy loss.
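The link between pruning backbone filters and shrinking the map can be illustrated with a minimal sketch. This is not the authors' implementation: the L1-norm ranking is one common magnitude criterion, and the layer shape, the 2x2 pooling grid, the 32-bit descriptor storage, and the names `prune_filters_by_l1` and `map_memory_bytes` are all assumptions made for illustration.

```python
import numpy as np


def prune_filters_by_l1(weights, sparsity):
    """Keep the filters with the largest L1 norm (assumed criterion).

    weights has shape (out_channels, in_channels, kh, kw).
    Returns the pruned weights and the sorted indices of the kept filters.
    """
    n_filters = weights.shape[0]
    n_keep = max(1, int(round(n_filters * (1.0 - sparsity))))
    norms = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep], keep


def map_memory_bytes(n_references, descriptor_dim, bytes_per_value=4):
    """Memory needed to store one descriptor per reference image."""
    return n_references * descriptor_dim * bytes_per_value


# Hypothetical final convolution layer of a backbone.
rng = np.random.default_rng(0)
last_conv = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)
pruned_conv, kept = prune_filters_by_l1(last_conv, sparsity=0.25)

# Assumed aggregation: each channel is pooled to a 2x2 grid and flattened,
# so the descriptor dimension scales with the number of surviving filters.
grid_cells = 2 * 2
dense_dim = last_conv.shape[0] * grid_cells
pruned_dim = pruned_conv.shape[0] * grid_cells

n_refs = 10_000  # hypothetical map size
print("descriptor dim:", dense_dim, "->", pruned_dim)
print("map bytes:", map_memory_bytes(n_refs, dense_dim),
      "->", map_memory_bytes(n_refs, pruned_dim))
```

Because whole filters are removed rather than individual weights, the layer becomes genuinely smaller and every stored descriptor shrinks with it. That is how a single pruning step can lower both model memory and map memory under these assumptions.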

Paper Structure (18 sections, 2 equations, 7 figures, 1 table, 3 algorithms)

Figures (7)

  • Figure 1: Structured Pruning of Convolutional Visual Place Recognition Networks. The pruned backbone filters are shown in grey; removing them simultaneously reduces the backbone size and the descriptor dimension.
  • Figure 2: Linear Pruning Schedule Overview. The backbone pruning schedule is shown ending at a final sparsity of 0.9. The aggregation pruning hyper-parameter $\gamma$ represents the final aggregation and descriptor sparsity, regulating the sparsity ratio between the network's backbone and the descriptor at each step of the pruning process.
  • Figure 3: Total memory of the VPR system (the sum of the model and map embedding memory) against the recall@1 score. The curves are created by iterative magnitude pruning of the feature extraction network.
  • Figure 4: Total latency of the VPR system (the feature extraction and matching latencies for a single image) against the recall@1 score. The curves are created by iterative magnitude pruning of the feature extraction network.
  • Figure 5: Efficiency-recall@1 trade-off curves for the pruned ConvAP VPR method gsvcities on the Pitts30k validation dataset. The hyper-parameter $\gamma$ shows the effect of altering the pruning ratio between the backbone and feature aggregation.
  • ...and 2 more figures
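
The linear schedule described in the Figure 2 caption can be sketched as follows. This is one reading of the caption rather than the paper's algorithm: it assumes both sparsities start at zero and rise linearly, with the backbone ending at 0.9 and the aggregation/descriptor ending at $\gamma$. The function name, the step count, and the value of $\gamma$ are hypothetical.

```python
def linear_sparsity_schedule(num_steps, final_backbone_sparsity=0.9, gamma=0.45):
    """Return a list of (backbone_sparsity, aggregation_sparsity) per step.

    Assumption: both targets grow linearly from 0, so their ratio stays
    fixed at gamma / final_backbone_sparsity at every pruning step.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    schedule = []
    for step in range(1, num_steps + 1):
        fraction = step / num_steps
        schedule.append((final_backbone_sparsity * fraction, gamma * fraction))
    return schedule


schedule = linear_sparsity_schedule(num_steps=9)
for step, (backbone, aggregation) in enumerate(schedule, start=1):
    print(f"step {step}: backbone={backbone:.2f}, aggregation={aggregation:.3f}")
```

Under this reading, $\gamma$ sets how aggressively the descriptor is pruned relative to the backbone. A smaller $\gamma$ preserves more of the embedding, while a larger one trades descriptor capacity for lower map memory and retrieval latency.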