$A^2$GC: $A$symmetric $A$ggregation with Geometric Constraints for Locally Aggregated Descriptors

Zhenyu Li, Tianyi Shang

TL;DR

This work addresses Visual Place Recognition (VPR) under distributional and geometric variability by replacing symmetric optimal transport with an asymmetric aggregation framework and by incorporating geometric constraints. The $A^2$GC-VPR method combines row-column normalization and independent marginal calibration to adapt to imbalanced feature and cluster distributions, while learnable coordinate embeddings promote spatially coherent assignments via a geometric compatibility score. Empirical results on Pitts30k, Pitts250k, MSLS, Nordland, and SPED show state-of-the-art or competitive performance, with ablations confirming the complementary benefits of asymmetric aggregation and geometric constraints. The approach also demonstrates strong cross-domain generalization and practical efficiency, making it well-suited for real-world VPR deployments. Overall, $A^2$GC advances VPR by integrating distribution-aware transport with spatially aware feature aggregation, yielding robust and scalable place recognition across diverse conditions.

Abstract

Visual Place Recognition (VPR) aims to match query images against a database using visual cues. State-of-the-art methods aggregate features from deep backbones to form global descriptors. Optimal transport-based aggregation methods reformulate feature-to-cluster assignment as a transport problem, but the standard Sinkhorn algorithm treats source and target marginals symmetrically, which limits its effectiveness when image features and cluster centers exhibit substantially different distributions. We propose an asymmetric aggregation VPR method with geometric constraints for locally aggregated descriptors, called $A^2$GC-VPR. Our method employs row-column normalization averaging with separate marginal calibration, enabling asymmetric matching that adapts to distributional discrepancies in visual place recognition. Geometric constraints are incorporated through learnable coordinate embeddings: compatibility scores computed from these embeddings are fused with feature similarities, which encourages spatially proximal features to be assigned to the same cluster and enhances spatial awareness. Experimental results on the MSLS, Nordland, and Pittsburgh datasets demonstrate superior performance, validating the effectiveness of our approach in improving matching accuracy and robustness.
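To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It compares standard Sinkhorn normalization with uniform marginals against one possible reading of "row-column normalization averaging": a row-wise softmax and a column-wise softmax of the same score matrix are computed independently and then averaged. The function names and the separate temperatures `row_temp` and `col_temp` (standing in for "separate marginal calibration") are assumptions made for illustration only.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis.
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def softmax(x, axis):
    return np.exp(x - logsumexp(x, axis))

def sinkhorn(scores, n_iters=50):
    # Symmetric baseline: alternate row and column normalization in the
    # log domain so that both marginals are pushed toward uniform.
    n, m = scores.shape
    log_p = scores.copy()
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1) - np.log(n)  # rows -> 1/n
        log_p = log_p - logsumexp(log_p, axis=0) - np.log(m)  # cols -> 1/m
    return np.exp(log_p)

def asymmetric_assign(scores, row_temp=1.0, col_temp=1.0):
    # Hypothetical reading of row-column normalization averaging:
    # normalize rows and columns independently, each with its own
    # temperature (an assumed form of marginal calibration), then average.
    row_norm = softmax(scores / row_temp, axis=1)  # each feature sums to 1
    col_norm = softmax(scores / col_temp, axis=0)  # each cluster sums to 1
    return 0.5 * (row_norm + col_norm)

rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 3))  # 6 local features, 3 clusters
P_sym = sinkhorn(scores)
P_asym = asymmetric_assign(scores, row_temp=0.5, col_temp=2.0)
```

In this sketch the Sinkhorn result is forced toward equal mass per cluster, whereas the averaged assignment imposes no joint constraint on the two marginals, so features and clusters can carry unequal mass.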

Paper Structure

This paper contains 21 sections, 12 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Performance evaluation against state-of-the-art (SOTA) methods on the Pitts30k dataset.
  • Figure 2: Illustration of the $A^2$GC pipeline. $A^2$GC uses DINOv2 to extract local features and a global token. Geometric constraints are integrated by encoding spatial coordinates as learnable embeddings and computing geometric compatibility scores, which are fused with feature similarities. $A^2$GC then applies row-column normalization averaging and separate marginal calibration to determine the transport matrix that assigns features to clusters. The aggregated cluster descriptors are concatenated with the projected global token and normalized to form the final global descriptor.
  • Figure 3: Visualization of feature activation and image matching. Activated regions are masked using DINOv2 encoding and asymmetric aggregation, and cosine similarity is used to compute the confusion (similarity) matrix. Yellow boxes denote correctly matched query–reference pairs and red boxes indicate failures.
  • Figure 4: Qualitative results on challenging datasets. The left column shows several queries, and the right columns show the top-1 candidate retrieved by existing SOTA methods.
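
The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative NumPy mock-up under stated assumptions, not the paper's code: random arrays stand in for DINOv2 outputs, the `tanh` coordinate embedding, the additive fusion weight `geo_weight`, and all parameter names in `params` are hypothetical, and a plain row-wise softmax is used as a stand-in for the transport matrix.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_descriptor(local_feats, global_token, coords, params, geo_weight=0.5):
    # Feature similarity between local features and cluster centers: (N, M).
    feat_sim = l2_normalize(local_feats) @ l2_normalize(params["centers"]).T
    # Coordinate embeddings (assumed form: linear map followed by tanh): (N, E).
    coord_emb = np.tanh(coords @ params["W_coord"])
    # Geometric compatibility score against per-cluster embeddings: (N, M).
    geo_score = coord_emb @ params["cluster_geo"].T
    # Fuse the two scores; additive fusion is an assumption.
    scores = feat_sim + geo_weight * geo_score
    # Stand-in assignment (row softmax) in place of the transport matrix.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    assign = e / e.sum(axis=1, keepdims=True)
    # Aggregate projected local features into one descriptor per cluster: (M, d).
    cluster_desc = assign.T @ (local_feats @ params["W_local"])
    # Concatenate with the projected global token and normalize.
    g = global_token @ params["W_global"]
    return l2_normalize(np.concatenate([cluster_desc.ravel(), g]))

rng = np.random.default_rng(0)
N, D, M, E, d, G = 16, 32, 4, 8, 6, 10  # 4x4 patch grid, toy dimensions
ys, xs = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)  # (N, 2) patch centers
params = {
    "centers": rng.normal(size=(M, D)),
    "W_coord": rng.normal(size=(2, E)),
    "cluster_geo": rng.normal(size=(M, E)),
    "W_local": rng.normal(size=(D, d)),
    "W_global": rng.normal(size=(D, G)),
}
desc = build_descriptor(rng.normal(size=(N, D)), rng.normal(size=D), coords, params)
```

With these toy sizes the descriptor has `M * d + G` entries and unit L2 norm, so two descriptors can be compared by a dot product, which equals their cosine similarity.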