ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images

Yanqing Shen, Turcan Tuna, Marco Hutter, Cesar Cadena, Nanning Zheng

TL;DR

ForestLPR addresses LiDAR-based place recognition in forest environments, where a scarcity of salient features and high self-similarity hinder localization. It introduces a pipeline that builds cross-sectional BEV density images from horizontal slices of the point cloud at multiple heights and processes them with a visual transformer backbone and a multi-BEV interaction module to yield a rotation-invariant global descriptor. A two-stage training regime, including overlap-based positive mining with $o>0.9$, achieves strong improvements over SOTA on diverse forest datasets, with a compact $D=1024$-dimensional descriptor and real-time-friendly latency. The approach generalizes across forest conditions and scan patterns, indicating practical applicability for onboard robotic localization and loop-closure tasks in natural environments.

Abstract

Place recognition is essential to maintain global consistency in large-scale localization systems. While research in urban environments has progressed significantly using LiDARs or cameras, applications in natural forest-like environments remain largely under-explored. Furthermore, forests present particular challenges due to high self-similarity and substantial variations in vegetation growth over time. In this work, we propose a robust LiDAR-based place recognition method for natural forests, ForestLPR. We hypothesize that a set of cross-sectional images of the forest's geometry at different heights contains the information needed to recognize revisiting a place. The cross-sectional images are represented by bird's-eye view (BEV) density images of horizontal slices of the point cloud at different heights. Our approach utilizes a visual transformer as the shared backbone to produce sets of local descriptors and introduces a multi-BEV interaction module to attend to information at different heights adaptively. It is followed by an aggregation layer that produces a rotation-invariant place descriptor. We evaluated the efficacy of our method extensively on real-world data from public benchmarks as well as robotic datasets and compared it against the state-of-the-art (SOTA) methods. The results indicate that ForestLPR has consistently good performance on all evaluations and achieves an average increase of 7.38% and 9.11% on Recall@1 over the closest competitor on intra-sequence loop closure detection and inter-sequence re-localization, respectively, validating our hypothesis.
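
To make the idea of BEV density images of horizontal slices concrete, the following is a minimal sketch, not the authors' implementation. It assumes the point cloud has already been height-normalized so that the ground sits at z = 0. The function name `bev_density_images`, the cell size, the slice boundaries, and the per-slice peak normalization are all hypothetical choices made for illustration; only the 30 m submap radius is taken from the figure captions.

```python
import numpy as np

def bev_density_images(points, slices, radius=30.0, cell=0.5):
    """Build one BEV density image per horizontal slice (illustrative sketch).

    points : (N, 3) array of x, y, z; z is assumed to be height above ground.
    slices : list of (z_min, z_max) height intervals in metres (hypothetical values).
    radius : half-width of the square BEV window in metres.
    cell   : grid cell size in metres (assumed, not from the paper).
    Returns an array of shape (len(slices), n_bins, n_bins).
    """
    n_bins = int(round(2 * radius / cell))
    edges = np.linspace(-radius, radius, n_bins + 1)
    images = []
    for z_min, z_max in slices:
        # Keep only the points whose height falls inside this slice.
        in_slice = (points[:, 2] >= z_min) & (points[:, 2] < z_max)
        xy = points[in_slice, :2]
        # Count points per grid cell: this is the raw density image.
        counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=(edges, edges))
        # Assumed normalization: scale each slice by its own peak count.
        peak = counts.max()
        images.append(counts / peak if peak > 0 else counts)
    return np.stack(images)
```

Each returned channel can then be treated as a single-channel image and passed to an image backbone. How the slices are chosen and normalized in ForestLPR itself is specified in the paper, not here.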

Paper Structure

This paper contains 19 sections, 11 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) shows a part of the point cloud submap that is colored by projected attention maps (the zoomed-in part is shown in (b)). It can be seen that after removing the ground and tree top, some canopy and bushes are still included in the cropped point cloud. Our ForestLPR framework is able to utilize multiple BEV density images and a multi-BEV interaction module to achieve adaptive attention to different heights at each patch location.
  • Figure 2: Results of ground segmentation and height offset removal. In (a), the ground points are marked in red. The ground segmentation result is used to remove the height offset of non-ground points, as shown in (b). Point clouds are colored by height.
  • Figure 3: An overview of the proposed framework. After pre-processing, multiple BEV density images are generated from a point cloud. Each BEV image is processed separately by the shared descriptor extraction block, which combines features from multiple transformer layers. The local features are then fed into the patch-level multi-BEV interaction module to highlight the discriminative features. Global descriptors are obtained through an aggregation layer. The query and database share the same pipeline to extract features during training and testing.
  • Figure 4: Environment and equipment visualization of different datasets, showing diversity and richness. We use red planes to illustrate locations where the height is 1m and yellow planes to indicate locations at a height of 6m. It can be seen that although most of the leaves and bushes are removed, some are still included in the horizontal slices, so the slices do not have a perfectly clear appearance. This poses a considerable challenge for place recognition methods and motivates us to use multiple BEV density images in this work.
  • Figure 5: Recall@1 performance of methods for intra-sequence evaluations under different distance thresholds. When calculating R@1, the denominator is the number of queries with corresponding positive samples, and the numerator is the number of queries that can retrieve positives. As the threshold increases, a query that previously had no positives may gain one, which increases the denominator by one and can break monotonicity. The submap radius is 30m.
  • ...and 1 more figure