
Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Dror Aiger, André Araujo, Simon Lynen

TL;DR

This work introduces Constrained Approximate Nearest Neighbors (CANN), a method for visual localization that jointly searches for local-feature matches consistent in both appearance and geometry, without relying on global image embeddings. CANN defines a camera-ranking framework that constrains nearest neighbors by image ID. It provides two efficient implementations: CANN-RS, based on colored range searching, and CANN-RG, based on Random Grids. The authors establish a theoretical foundation and show, through extensive experiments on four large-scale datasets, that local-feature-based retrieval via CANN outperforms state-of-the-art global approaches. It is also an order of magnitude faster in index and query time than local feature aggregation schemes. This approach offers a practical, scalable alternative for local-feature-driven localization and could reshape retrieval pipelines built on large-scale 3D models.
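The camera-ranking idea can be made concrete with a minimal brute-force sketch. This is not the authors' implementation. It assumes each database descriptor carries a "color" equal to the ID of the image it was observed in. For each query descriptor, it answers a colored range query: which image IDs have at least one descriptor within radius R. The vote-count ranking in `rank_images` is a hypothetical stand-in for the paper's actual score, and both function names are illustrative.

```python
import math
from collections import defaultdict


def colored_range_search(query, db_points, db_colors, radius):
    """Return {color: min_distance} for every color (image ID) that has at
    least one database point within `radius` of `query`. Brute force, O(n)."""
    best = {}
    for p, c in zip(db_points, db_colors):
        d = math.dist(query, p)
        # Keep only in-range points, and only the closest one per color.
        if d <= radius and d < best.get(c, math.inf):
            best[c] = d
    return best


def rank_images(query_features, db_points, db_colors, radius):
    """Hypothetical ranking: score each image ID by how many query features
    have at least one of that image's descriptors within `radius`."""
    votes = defaultdict(int)
    for q in query_features:
        for color in colored_range_search(q, db_points, db_colors, radius):
            votes[color] += 1
    # Highest vote count first; ties broken by image ID for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.1, 5.0)]
cols = [1, 1, 2, 3, 2]
print(rank_images([(0.0, 0.05), (5.0, 5.05), (0.05, 0.0)], pts, cols, 0.12))
```

A linear scan like this costs O(n) per query descriptor. Avoiding that scan at the scale of large 3D models is what the CANN-RS and CANN-RG implementations are for.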

Abstract

Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have therefore adopted a hybrid method: a global (per-image) embedding is first used to retrieve a small subset of database images, and the query's local features are then matched only against those. It has become a common belief that global embeddings are critical for this image-retrieval step in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint k-nearest-neighbor search across both the geometry and appearance spaces using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then show how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes on these datasets. Code: https://github.com/google-research/google-research/tree/master/cann
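For contrast, the hybrid pipeline that the abstract questions can be sketched in two stages. This is a toy illustration under stated assumptions. Retrieval uses cosine similarity over global embeddings, and local matching uses plain Euclidean nearest neighbor. The names `hybrid_match`, `db_globals`, and `db_locals` are hypothetical and do not come from the paper or its code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def hybrid_match(query_global, query_locals, db_globals, db_locals, k):
    """Two-stage baseline:
    (1) retrieve the top-k images by global-embedding cosine similarity;
    (2) match each query local feature to its nearest local feature among
        the retrieved images only.
    db_globals: {image_id: global vector}; db_locals: {image_id: [local vectors]}.
    Returns (shortlist, [(query_index, image_id, feature_index), ...])."""
    # Stage 1: rank database images by global similarity (descending).
    ranked = sorted(db_globals, key=lambda i: -cosine(query_global, db_globals[i]))
    shortlist = ranked[:k]
    # Stage 2: nearest-neighbor search restricted to the shortlisted images.
    matches = []
    for qi, q in enumerate(query_locals):
        best = min(
            ((math.dist(q, f), img, fi)
             for img in shortlist
             for fi, f in enumerate(db_locals[img])),
            default=None,
        )
        if best is not None:
            matches.append((qi, best[1], best[2]))
    return shortlist, matches
```

Any correct match whose image falls outside the top-k shortlist is lost in the second stage. CANN sets out to remove this dependence on a separate global embedding.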

Paper Structure

This paper contains 25 sections, 1 equation, 9 figures, 3 tables, and 2 algorithms.

Figures (9)

  • Figure 1: The proposed Constrained Approximate Nearest Neighbor algorithm finds the best subset of 3D points that are both close to the query features in appearance space and consistently seen by the same camera. These points have high overlap with the initially unknown query camera pose (shaded area). Jointly solving for these two metrics in a single search algorithm has long been an open question in the community, and to the best of our knowledge CANN provides the first practical solution. Red points show neighbors retrieved by an unconstrained search using the features from the query image (bottom right). With CANN, points that are inliers to geometric verification (green) are more likely to be retrieved, and unrelated outlier points (yellow) are less likely to be fetched.
  • Figure 2: A visual depiction of CANN. The image on the left shows 3D points colored by the camera from which they were reconstructed. CANN leverages this information to retrieve feature matches that are consistently seen in the same camera. This contrasts with prior art (right), where unconstrained feature matching returns many unrelated outlier matches (red). Geometric verification must then filter these out to obtain inlier matches (green).
  • Figure 3: Our score for $R=1$ and various values of $p$ in Equation \ref{alg:query-r}. $p$ is a parameter of our metric that we tune in advance; it is used to compute $s_i$ for all $d_{i,j}$.
  • Figure 4: RobotCar
  • Figure 5: Gangnam
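Figures 1 and 2 illustrate the goal: retrieve neighbors grouped by the camera that observed them. The TL;DR names Random Grids as the basis of CANN-RG. The index below is a rough, hypothetical sketch of that idea, not the paper's algorithm. It hashes descriptors into several randomly shifted grids and answers an approximate colored range query by checking only the query's cell in each grid. The cell size, the number of grids, and the use of shifts only (no other randomization) are illustrative assumptions. `RandomGridIndex` is a made-up name.

```python
import math
import random
from collections import defaultdict


class RandomGridIndex:
    """Approximate colored range search over randomly shifted grids.
    Simplified sketch: each grid is a uniform grid with a random offset."""

    def __init__(self, points, colors, cell, num_grids, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points, self.colors, self.cell = points, colors, cell
        # One random offset vector per grid.
        self.shifts = [[rng.uniform(0, cell) for _ in range(dim)]
                       for _ in range(num_grids)]
        # One hash table per grid: cell key -> indices of points in that cell.
        self.tables = []
        for shift in self.shifts:
            table = defaultdict(list)
            for i, p in enumerate(points):
                table[self._key(p, shift)].append(i)
            self.tables.append(table)

    def _key(self, p, shift):
        """Integer cell coordinates of point p in the grid with this shift."""
        return tuple(math.floor((x + s) / self.cell) for x, s in zip(p, shift))

    def query(self, q, radius):
        """Return {color: min_distance} over points that share q's cell in at
        least one grid and lie within `radius`. May miss true neighbors."""
        best = {}
        for shift, table in zip(self.shifts, self.tables):
            for i in table.get(self._key(q, shift), ()):
                d = math.dist(q, self.points[i])
                c = self.colors[i]
                if d <= radius and d < best.get(c, math.inf):
                    best[c] = d
        return best
```

Every candidate is checked against the true distance, so the sketch never reports an out-of-range color. It can miss a true neighbor that lands in a different cell in every grid. Adding grids lowers that probability at the cost of memory.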