A Deeper Look into Second-Order Feature Aggregation for LiDAR Place Recognition
Saimunur Rahman, Peyman Moghadam
TL;DR
The paper addresses a limitation of first-order pooling in LiDAR place recognition (LPR): it ignores the inter-feature correlations that second-order aggregation captures, while full covariance captures them only at a high dimensional cost. It introduces Channel Partition-based Second-order Local Feature Aggregation (CPS), which partitions the feature channels into groups, computes a covariance matrix per group, applies Newton-Schulz normalization, and fuses the upper-triangular statistics into a compact descriptor via a learnable weighted sum. CPS achieves state-of-the-art results on four large-scale LPR benchmarks while reducing descriptor dimensionality by 4–16x compared to full covariance, and it remains effective across backbones when integrated with MinkLoc3D and MinkLoc3Dv2. The findings position CPS as a practical, memory-efficient alternative to full covariance that preserves discriminative second-order information, with potential for future integration with transformer backbones and evaluation on broader LPR datasets.
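The four CPS steps named above (partition, per-group covariance, Newton-Schulz normalization, weighted fusion of upper-triangular entries) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions not stated in the summary: channels are split into contiguous equal-sized groups, the Newton-Schulz iteration is used to approximate the matrix square root of each covariance, and the fusion weights are passed in as fixed values, whereas in the paper they are learned. The function names `newton_schulz_sqrt` and `cps_descriptor` are hypothetical.

```python
import numpy as np


def newton_schulz_sqrt(cov, num_iters=5, eps=1e-5):
    """Approximate the square root of a symmetric PSD matrix.

    Uses the coupled Newton-Schulz iteration with trace pre-normalisation.
    Assumption: the "Newton-Schulz normalization" in the summary targets
    the matrix square root.
    """
    d = cov.shape[0]
    eye = np.eye(d)
    cov = cov + eps * eye              # small ridge for numerical stability
    norm = np.trace(cov)               # scale eigenvalues into (0, 1]
    y, z = cov / norm, eye.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y, z = y @ t, t @ z
    return y * np.sqrt(norm)           # undo the pre-normalisation


def cps_descriptor(features, weights, num_iters=5):
    """Fuse per-group second-order statistics into one compact vector.

    features: (num_points, channels) array of local features.
    weights:  one fusion weight per channel group (learned in the paper,
              fixed here for illustration).
    """
    n, c = features.shape
    g = len(weights)
    if c % g != 0:
        raise ValueError("channels must be divisible by the number of groups")
    groups = np.split(features, g, axis=1)   # assumed contiguous partition
    iu = np.triu_indices(c // g)             # upper triangle incl. diagonal
    fused = np.zeros(len(iu[0]))
    for w, x in zip(weights, groups):
        xc = x - x.mean(axis=0, keepdims=True)
        cov = xc.T @ xc / max(n - 1, 1)      # sample covariance of the group
        fused += w * newton_schulz_sqrt(cov, num_iters)[iu]
    return fused
```

With 16 channels and four groups, for example, each group covariance is 4x4, so the fused descriptor has 10 entries, against 136 for the upper triangle of the full 16x16 covariance.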
Abstract
Efficient LiDAR Place Recognition (LPR) compresses dense pointwise features into compact global descriptors. While first-order aggregators such as GeM and NetVLAD are widely used, they overlook inter-feature correlations that second-order aggregation naturally captures. Full covariance, a common second-order aggregator, is high in dimensionality; as a result, practitioners often insert a learned projection or employ random sketches -- approaches that either sacrifice information or increase parameter count. However, no prior work has systematically investigated how first- and second-order aggregation perform under constrained feature and compute budgets. In this paper, we first demonstrate that second-order aggregation retains its superiority for LPR even when channels are pruned and backbone parameters are reduced. Building on this insight, we propose Channel Partition-based Second-order Local Feature Aggregation (CPS): a drop-in, partition-based second-order aggregation module that preserves all channels while producing an order-of-magnitude smaller descriptor. CPS matches or exceeds the performance of full covariance and outperforms random projection variants, delivering new state-of-the-art results with only four additional learnable parameters across four large-scale benchmarks: Oxford RobotCar, In-house, MulRan, and WildPlaces.
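The descriptor-size claim can be checked with simple counting. The sketch below assumes that a descriptor stores only the upper triangle (including the diagonal) of a symmetric matrix and that the group descriptors are fused by a weighted sum, so the output has the size of a single group's statistics. The channel count of 256 and the group counts are illustrative values, not figures taken from the abstract, and the helper names are hypothetical.

```python
def triu_dim(channels: int) -> int:
    """Number of upper-triangular entries (with diagonal) of a C x C matrix."""
    return channels * (channels + 1) // 2


def cps_dim(channels: int, groups: int) -> int:
    """Descriptor size when equal-sized groups are fused by a weighted sum."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    return triu_dim(channels // groups)


channels = 256                       # illustrative feature width (assumption)
full = triu_dim(channels)            # full-covariance descriptor size
for groups in (2, 4):
    compact = cps_dim(channels, groups)
    print(groups, compact, round(full / compact, 2))
```

Under these assumptions the reduction grows roughly with the square of the number of groups: two groups give about a 4x smaller descriptor and four groups about 16x.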
