
Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics

Dennis Haitz, Athradi Shritish Shetty, Michael Weinmann, Markus Ulrich

Abstract

Visual Place Recognition (VPR) is a core component in computer vision, typically formulated as an image retrieval task for localization, mapping, and navigation. In this work, we instead study VPR as an image pair retrieval front-end for registration pipelines, where the goal is to find top-matching image pairs between two disjoint image sets for downstream tasks such as scene registration, SLAM, and Structure-from-Motion. We comparatively evaluate state-of-the-art VPR families - NetVLAD-style baselines, classification-based global descriptors (CosPlace, EigenPlaces), feature-mixing (MixVPR), and foundation-model-driven methods (AnyLoc, SALAD, MegaLoc) - on three challenging datasets: object-centric outdoor scenes (Tanks and Temples), indoor RGB-D scans (ScanNet-GS), and autonomous-driving sequences (KITTI). We show that modern global descriptor approaches are increasingly suitable as off-the-shelf image pair retrieval modules in challenging scenarios including perceptual aliasing and incomplete sequences, while exhibiting clear, domain-dependent strengths and weaknesses that are critical when choosing VPR components for robust mapping and registration.
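The pair retrieval setting described in the abstract can be illustrated with a minimal sketch. It assumes each image in sets A and B has already been encoded into a global descriptor by some VPR model, that pairs are scored by cosine similarity, and that the top-k pairs are selected by ranking all cross-set pairs globally. The function names and toy descriptors are hypothetical, and the paper's exact retrieval protocol may differ.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_pairs(desc_a, desc_b, k):
    """Return the k highest-scoring (i, j, similarity) pairs across sets A and B.

    desc_a, desc_b: lists of global descriptors (lists of floats), one per image.
    Similarity is the cosine similarity of the two descriptors.
    """
    a = [l2_normalize(v) for v in desc_a]
    b = [l2_normalize(v) for v in desc_b]
    scored = [
        (i, j, sum(x * y for x, y in zip(va, vb)))
        for i, va in enumerate(a)
        for j, vb in enumerate(b)
    ]
    # Rank all cross-set pairs by descending similarity.
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


# Toy 2-D descriptors standing in for real VPR outputs (illustrative only).
desc_a = [[1.0, 0.0], [0.0, 1.0]]
desc_b = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
pairs = top_k_pairs(desc_a, desc_b, k=2)
```

In this toy example the two best pairs are image 0 of A with image 0 of B and image 1 of A with image 1 of B. For large image sets, the exhaustive pairwise loop would typically be replaced by a matrix product or a nearest-neighbor index.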

Paper Structure (24 sections, 7 equations, 4 figures, 2 tables)

This paper contains 24 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Example images from sets A (left) and B (right), taken from the Caterpillar scene of the Tanks and Temples dataset Knapitsch2017. Images with overlapping scene area are framed in red; our objective is to test whether VPR methods can retrieve these pairs. A key challenge for VPR models in such scenes is perceptual aliasing, illustrated here by the wheels, which are visible on both sides and even appear in similar relative positions.
  • Figure 2: Qualitative top-5 retrieval results for different VPR methods on the T&T scene Barn. This scene is especially prone to perceptual aliasing, as indicated by the brown door, which appears at different positions on the two sides, and by the color of the floor in front of the house. The overlap area is the back of the house, which SALAD8192 and MixVPR4096 retrieve entirely correctly. Columns correspond to methods (a--f) and, within each column, images are shown from rank $k=1$ (top) to $k=5$ (bottom).
  • Figure 3: Qualitative top-5 retrieval results from CosPlace512 (a) and MegaLoc (b) for KITTI scene 07_02. The challenge here is that some images were left out between $A$ and $B$. This is especially relevant for tasks such as SLAM relocalization caused by missing frames. CosPlace Berton_CVPR_2022_CosPlace fails completely here, likely because of perceptual aliasing, indicated by two different black cars in $A$ and $B$ and other similarities, whereas MegaLoc Berton_2025_MegaLoc yields only true positives. The houses in the center of the $A$ images, which appear towards the left border of the $B$ images, provide the cue for correct results. Images are shown from $k=1$ (top) to $k=5$ (bottom) per method.
  • Figure 4: Overview of P@10 (left column) and R@10 (right column) for all VPR methods on T&T (top row), SN-GS (middle row) and KITTI (bottom row).
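The P@10 and R@10 values in Figure 4 are precision and recall at rank 10. The sketch below shows one common way to compute them for ranked image pairs. It assumes the usual definitions (hits among the top k divided by k, and hits divided by the number of ground-truth pairs); the paper's own equations are not reproduced here and may differ in detail. The function name and toy data are hypothetical.

```python
def precision_recall_at_k(ranked_pairs, true_pairs, k):
    """Compute (P@k, R@k) for a ranked list of (i, j) image pairs.

    ranked_pairs: list of (i, j) tuples, best match first.
    true_pairs:   set of (i, j) tuples counted as correct (ground truth).
    k:            positive cut-off rank.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Count how many of the top-k retrieved pairs are ground-truth pairs.
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_pairs)
    precision = hits / k
    recall = hits / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Toy example: 2 of the top 3 retrieved pairs are correct, out of 4 true pairs.
ranked = [(0, 0), (1, 1), (0, 2), (1, 0)]
truth = {(0, 0), (1, 1), (1, 2), (0, 1)}
p_at_3, r_at_3 = precision_recall_at_k(ranked, truth, k=3)
```

Here `p_at_3` is 2/3 and `r_at_3` is 0.5. Dividing by k even when fewer than k pairs are returned is a design choice that penalizes short result lists; dividing by the number of returned pairs is an alternative.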