A Deeper Look into Second-Order Feature Aggregation for LiDAR Place Recognition

Saimunur Rahman, Peyman Moghadam

TL;DR

The paper targets a limitation of first-order pooling in LiDAR place recognition (LPR): it ignores correlations between feature channels. It introduces Channel Partition-based Second-order Local Feature Aggregation (CPS), which partitions the channels into groups, computes a covariance matrix per group, applies Newton-Schulz normalization, and fuses the upper-triangular entries into a compact descriptor via a learnable weighted sum. CPS achieves state-of-the-art results on four large-scale LPR benchmarks while reducing descriptor dimensionality by 4–16x relative to full covariance, and it remains effective when integrated with either the MinkLoc3D or the MinkLoc3Dv2 backbone. The results position CPS as a practical, memory-efficient alternative to full covariance that preserves discriminative second-order information, with integration into transformer backbones and evaluation on broader LPR datasets as possible future directions.

Abstract

Efficient LiDAR Place Recognition (LPR) compresses dense pointwise features into compact global descriptors. While first-order aggregators such as GeM and NetVLAD are widely used, they overlook inter-feature correlations that second-order aggregation naturally captures. Full covariance, a common second-order aggregator, is high in dimensionality; as a result, practitioners often insert a learned projection or employ random sketches -- both of which either sacrifice information or increase parameter count. However, no prior work has systematically investigated how first- and second-order aggregation perform under constrained feature and compute budgets. In this paper, we first demonstrate that second-order aggregation retains its superiority for LPR even when channels are pruned and backbone parameters are reduced. Building on this insight, we propose Channel Partition-based Second-order Local Feature Aggregation (CPS): a drop-in, partition-based second-order aggregation module that preserves all channels while producing an order-of-magnitude smaller descriptor. CPS matches or exceeds the performance of full covariance and outperforms random projection variants, delivering new state-of-the-art results with only four additional learnable parameters across four large-scale benchmarks: Oxford RobotCar, In-house, MulRan, and WildPlaces.
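To make the "order-of-magnitude smaller descriptor" claim concrete, the following is a minimal sketch of the size arithmetic. It assumes that the descriptor keeps only the upper-triangular entries of a symmetric matrix, that the per-group vectors are fused by a weighted sum (so the final size equals that of one group), and that $d = 256$ channels are used; the value of $d$ and the helper names `upper_tri_dim` and `cps_descriptor_dim` are illustrative assumptions, not taken from the paper.

```python
def upper_tri_dim(m: int) -> int:
    """Entries on and above the diagonal of an m x m symmetric matrix."""
    return m * (m + 1) // 2


def cps_descriptor_dim(d: int, k: int) -> int:
    """Descriptor size when d channels are split into k equal groups and the
    per-group upper-triangular vectors are summed (assumed fusion rule)."""
    if d % k != 0:
        raise ValueError("d must be divisible by k")
    return upper_tri_dim(d // k)


d = 256  # assumed channel count, for illustration only
full_dim = upper_tri_dim(d)  # size of the full-covariance descriptor
sizes = {k: cps_descriptor_dim(d, k) for k in (2, 4, 8, 16)}

for k, dim in sizes.items():
    print(f"k={k:2d}: {dim:5d}-d, {full_dim / dim:6.1f}x smaller than full ({full_dim}-d)")
```

Under these assumptions, $k=16$ yields a 136-dimensional descriptor, and $k=2$ and $k=4$ give reductions of roughly 4x and 16x relative to full covariance.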

Paper Structure (22 sections, 1 theorem, 7 equations, 6 figures, 5 tables)

Key Result

Lemma 2.1

Covariance pooling is permutation invariant: it returns the same mean and covariance no matter how the input descriptors are ordered.
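The lemma can be checked numerically. The sketch below is an illustration, not the paper's proof: it pools a random descriptor matrix before and after shuffling its columns and compares the results. The function name `covariance_pool` and the $(d, N)$ layout are assumptions chosen for this example.

```python
import numpy as np


def covariance_pool(X: np.ndarray):
    """Return the mean (d,) and covariance (d, d) of descriptors X of shape (d, N)."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                      # center each channel
    cov = Xc @ Xc.T / X.shape[1]     # biased (1/N) covariance estimate
    return mu.ravel(), cov


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100))    # 8 channels, 100 local descriptors
perm = rng.permutation(X.shape[1])   # random reordering of the descriptors

mu_a, cov_a = covariance_pool(X)
mu_b, cov_b = covariance_pool(X[:, perm])

same_mean = np.allclose(mu_a, mu_b)
same_cov = np.allclose(cov_a, cov_b)
print(same_mean, same_cov)
```

Both statistics are sums over the descriptor index, so reordering the columns changes only the order of summation, up to floating-point rounding.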

Figures (6)

  • Figure 1: R@1 comparison of the proposed CPS with common feature aggregation methods, using MinkLoc3D (Komorowski, 2021) on the WildPlaces (Venman environment) dataset. CPS achieves higher R@1 than the other methods while using less memory.
  • Figure 2: Overview of CPS. Given descriptors $\mathbf{X} \in \mathbb{R}^{d\times N}$ from a 3D backbone $\phi(\cdot)$, we partition the channels into $k$ disjoint groups, compute a normalized second-order (covariance) matrix per group, and aggregate the upper-triangular entries to form the lower-dimensional global representation $\mathbf{z}$.
  • Figure 3: Performance of GeM and iSQRT-COV when the local feature channels are reduced with a $1\times 1$ convolution to $d \in \{16, 32, 64, 128, 256\}$.
  • Figure 4: Compact visualization of how CPS trades off descriptor size and memory (bubble area) across partition sizes $k$ (marked inside the bubbles) while maintaining high Oxford R@1. Even at $k=16$ (136-d, 1.2 MB), CPS achieves 93.4% recall with over 60x dimensionality reduction and $>$10x lower memory vs. $k=2$.
  • Figure 5: Performance of GeM, iSQRT-COV, and Kernel Matrix when the local feature channels are reduced with a $1\times 1$ convolution to $d \in \{16, 32, 64, 128, 256\}$.
  • ...and 1 more figure
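The pipeline in the Figure 2 caption can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' implementation: channels are split into contiguous groups, a small `eps` is added to each covariance for numerical stability, the coupled Newton-Schulz iteration is used to approximate the matrix square root, and the fusion weights (learned in the paper) are fixed to uniform values here. The function names, `iters`, and `eps` are hypothetical.

```python
import numpy as np


def newton_schulz_sqrt(S: np.ndarray, iters: int = 5) -> np.ndarray:
    """Approximate the square root of an SPD matrix S via coupled Newton-Schulz."""
    m = S.shape[0]
    tr = np.trace(S)
    Y, Z = S / tr, np.eye(m)         # trace-normalize so the iteration converges
    I3 = 3.0 * np.eye(m)
    for _ in range(iters):
        T = 0.5 * (I3 - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return Y * np.sqrt(tr)           # undo the trace normalization


def cps_descriptor(X, k, weights=None, iters=5, eps=1e-5):
    """X: (d, N) local descriptors. Returns a vector of length m(m+1)/2, m = d/k."""
    d, N = X.shape
    if d % k != 0:
        raise ValueError("d must be divisible by k")
    m = d // k
    if weights is None:              # assumption: uniform stand-in for learned weights
        weights = np.full(k, 1.0 / k)
    iu = np.triu_indices(m)          # indices of the upper triangle incl. diagonal
    z = np.zeros(iu[0].size)
    for g, w in enumerate(weights):
        Xg = X[g * m:(g + 1) * m]                    # contiguous channel group (assumed)
        Xc = Xg - Xg.mean(axis=1, keepdims=True)
        cov = Xc @ Xc.T / N + eps * np.eye(m)        # per-group covariance
        z += w * newton_schulz_sqrt(cov, iters)[iu]  # weighted sum of upper triangles
    return z


rng = np.random.default_rng(0)
X = rng.standard_normal((64, 500))   # d=64 channels, N=500 points (toy sizes)
z = cps_descriptor(X, k=4)
print(z.shape)
```

Because the group vectors are summed rather than concatenated, the descriptor length in this sketch depends only on the group size $d/k$, not on the number of groups.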

Theorems & Definitions (3)

  • Remark 3.1: Second-order Statistics Coverage of CPS
  • Lemma 2.1: Symmetry of covariance pooling
  • proof