SelFLoc: Selective Feature Fusion for Large-scale Point Cloud-based Place Recognition

Qibo Qiu, Wenxiao Wang, Haochao Ying, Dingkun Liang, Haiming Gao, Xiaofei He

TL;DR

SelFLoc tackles GPS-denied LiDAR-based place recognition by decomposing 3D convolutions into axis-focused 1D operations (SACB) and by selectively reweighting multi-scale features with point- and channel-wise gating (SFFB). The encoder–decoder architecture, sparse 3D convolutions, GeM pooling, and a Smooth-AP–based objective collectively yield strong global descriptors and robust matching across large-scale urban scenes. Empirical results on Oxford and three in-house datasets show state-of-the-art performance and good generalization, with notable AR@1 gains over prior methods. The work highlights the practical value of axis-oriented feature extraction and semantic-alignment-driven fusion for reliable, scalable LiDAR-based place recognition in autonomous systems.
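As a rough illustration of the GeM (generalized-mean) pooling step mentioned above, the sketch below collapses a set of per-point local features into one global descriptor. It uses the standard GeM formula; the function name `gem_pool`, the exponent `p=3.0`, and the toy features are assumptions made for illustration and are not taken from the paper.

```python
def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean pooling over points, applied per channel.

    features: list of per-point feature vectors (all the same length).
    p: pooling exponent (assumed value; p=1 gives average pooling,
       large p approaches max pooling).
    """
    if not features:
        raise ValueError("need at least one point")
    num_points = len(features)
    num_channels = len(features[0])
    pooled = []
    for c in range(num_channels):
        # Clamp to eps so that fractional powers stay well defined.
        mean_pow = sum(max(f[c], eps) ** p for f in features) / num_points
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


# Toy input: three points with two feature channels each (made-up values).
local_features = [[1.0, 0.5], [2.0, 0.5], [3.0, 0.5]]
descriptor = gem_pool(local_features, p=3.0)
print(descriptor)
```

With `p=1.0` the same function reduces to a plain average over points, which is one way to sanity-check an implementation.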

Abstract

Point cloud-based place recognition is crucial for mobile robots and autonomous vehicles, especially when a global positioning sensor is not accessible. LiDAR points are scattered on the surfaces of objects and buildings, which have strong shape priors along different axes. To enhance message passing along particular axes, the Stacked Asymmetric Convolution Block (SACB) is designed, which is one of the main contributions of this paper. Comprehensive experiments demonstrate that asymmetric convolution and the corresponding strategies employed by SACB contribute to a more effective representation of point cloud features. On this basis, the Selective Feature Fusion Block (SFFB), formed by stacking point- and channel-wise gating layers in a predefined sequence, is proposed to selectively boost salient local features in certain key regions and to align the features before the fusion phase. SACBs and SFFBs are combined to construct a robust and accurate architecture for point cloud-based place recognition, termed SelFLoc. Comparative experimental results show that SelFLoc achieves state-of-the-art (SOTA) performance on the Oxford benchmark and three other in-house benchmarks, with an improvement of 1.6 percentage points in mean average recall@1.
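To make the idea of axis-oriented (asymmetric) convolution more concrete, here is a minimal dense-grid sketch. It is not the SACB itself: the abstract does not specify kernel sizes, layer ordering, or the sparse implementation, so the kernel length of 3, the z-y-x ordering, the all-ones weights, and the helper name `conv1d_along_axis` are all assumptions chosen for illustration.

```python
def conv1d_along_axis(grid, kernel, axis, dilation=1):
    """Cross-correlate a dense 3D grid with a 1D kernel along one axis.

    grid: nested lists indexed as grid[x][y][z].
    kernel: odd-length list of weights (assumed, e.g. length 3).
    axis: 0, 1 or 2 for the x-, y- or z-axis.
    Out-of-range neighbours are treated as zero (zero padding).
    """
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    half = len(kernel) // 2
    out = [[[0.0] * sizes[2] for _ in range(sizes[1])] for _ in range(sizes[0])]
    for x in range(sizes[0]):
        for y in range(sizes[1]):
            for z in range(sizes[2]):
                total = 0.0
                for k, weight in enumerate(kernel):
                    pos = [x, y, z]
                    pos[axis] += (k - half) * dilation
                    if 0 <= pos[axis] < sizes[axis]:
                        total += weight * grid[pos[0]][pos[1]][pos[2]]
                out[x][y][z] = total
    return out


# Toy input: a single occupied voxel in the centre of a 5x5x5 grid.
grid = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
grid[2][2][2] = 1.0
kernel = [1.0, 1.0, 1.0]

# One asymmetric convolution spreads information along the z-axis only.
along_z = conv1d_along_axis(grid, kernel, axis=2)

# Stacking z, then y, then x covers the same 3x3x3 neighbourhood as a full kernel.
stacked = conv1d_along_axis(conv1d_along_axis(along_z, kernel, axis=1), kernel, axis=0)

# Dilation widens the reach along the axis without adding weights.
dilated_z = conv1d_along_axis(grid, kernel, axis=2, dilation=2)

# Weights per input/output channel pair for kernel length 3 (assumed).
params_full_3d = 3 * 3 * 3   # one full 3x3x3 kernel
params_stacked = 3 + 3 + 3   # three stacked 1D kernels
print(params_full_3d, params_stacked)
```

In this toy setting the three stacked 1D kernels reach the same 27-voxel neighbourhood as a full 3x3x3 kernel while using 9 weights instead of 27 per channel pair, and each layer passes messages along exactly one axis.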

Paper Structure (16 sections, 13 equations, 6 figures, 5 tables, 1 algorithm)

Figures (6)

  • Figure 1: Point cloud-based place recognition in large-scale urban environments. The place recognition network extracts global descriptors from point clouds captured at different locations, which are subsequently stored in a database. Once the vehicle reaches a new location, the closest match (green) is retrieved as the database entry whose global descriptor has the shortest distance to the query descriptor.
  • Figure 2: (a): An SACB is composed of two sub-blocks, each of which is formed by stacking a specified number of asymmetric convolutions in a predefined sequence. (b): Asymmetric convolutions equipped with different strategies, e.g., typical (pink), dilation (orange) and deformation (blue).
  • Figure 3: The architecture of SelFLoc, implemented in an encoder-decoder style. In the encoder stage, downsampling layers reduce the resolution of the feature maps, and each of them is followed by an SACB. Low-level (horizontal) and high-level (vertical) features are fused by addition during the decoder stage. Note that an SFFB is placed before the local feature fusion phase to refine the features through point- and channel-wise selective fusion.
  • Figure 4: Query (gray) and top $3$ retrieved frames (green: successful, red: failed). Moreover, one of the true (blue) matches is displayed for comparison. SelFLoc successfully finds the closest match even when the perspective changes (row 3).
  • Figure 5: The horizontal axis indicates the depth of the SACB equipped with the dilation strategy. Four SACBs are employed in our experiments, and $Depth=0$ indicates that no SACB is equipped with the dilation strategy. SelFLoc_X, SelFLoc_Y and SelFLoc_Z denote the models with one additional layer along the $x$-, $y$- and $z$-axis, respectively, while the model without an additional asymmetric convolution layer is regarded as the baseline. Note that the dilation strategy is applied only to the additional layer of each sub-block in the SACB.
  • ...and 1 more figure
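
The retrieval step described in the captions of Figures 1 and 4 (store global descriptors, then return the closest database entries for a query) can be sketched as follows. This is a minimal brute-force version that assumes Euclidean distance. The descriptors, place identifiers, ground-truth sets, and the helper names `retrieve` and `recall_at_1` are hypothetical; a real system would use the network's descriptors and the benchmark's own evaluation protocol.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, top_k=1):
    """Return the ids of the top_k database descriptors closest to the query."""
    ranked = sorted(database, key=lambda place_id: euclidean(query, database[place_id]))
    return ranked[:top_k]


def recall_at_1(queries, database, ground_truth):
    """Fraction of queries whose closest match is a true match."""
    hits = 0
    for query_id, descriptor in queries.items():
        best = retrieve(descriptor, database, top_k=1)[0]
        if best in ground_truth[query_id]:
            hits += 1
    return hits / len(queries)


# Made-up 2D descriptors; real global descriptors are much higher dimensional.
database = {"place_a": [0.0, 0.0], "place_b": [1.0, 0.0], "place_c": [0.0, 2.0]}
queries = {"q1": [0.9, 0.1], "q2": [0.1, 1.2]}
ground_truth = {"q1": {"place_b"}, "q2": {"place_a"}}

top3 = retrieve(queries["q1"], database, top_k=3)
score = recall_at_1(queries, database, ground_truth)
print(top3, score)
```

Here `q1` is matched correctly while `q2` is closest to a wrong place, so recall@1 on this toy set is 0.5. Averaging such a score over the queries of each benchmark, and then over benchmarks, gives a figure of the kind the abstract reports as mean average recall@1.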