PointNetPGAP-SLC: A 3D LiDAR-based Place Recognition Approach with Segment-level Consistency Training for Mobile Robots in Horticulture

T. Barros, L. Garrote, P. Conde, M. J. Coombes, C. Liu, C. Premebida, U. J. Nunes

TL;DR

Experimental evaluations conducted on the HORTO-3DLM and KITTI Odometry datasets demonstrate that PointNetPGAP outperforms state-of-the-art models, including OverlapTransformer and PointNetVLAD, particularly when the SLC model is applied.

Abstract

3D LiDAR-based place recognition remains largely underexplored in horticultural environments, which present unique challenges because they are semi-permeable to laser beams. This characteristic often results in highly similar LiDAR scans from adjacent rows, leading to descriptor ambiguity and, consequently, degraded retrieval performance. In this work, we address the challenges of 3D LiDAR place recognition in horticultural environments, focusing in particular on inter-row ambiguity, through three key contributions: (i) a novel model, PointNetPGAP, which combines the outputs of two statistically-inspired aggregators into a single descriptor; (ii) a Segment-Level Consistency (SLC) model, used exclusively during training to enhance descriptor robustness; and (iii) the HORTO-3DLM dataset, comprising LiDAR sequences from orchards and strawberry fields. Experimental evaluations conducted on the HORTO-3DLM and KITTI Odometry datasets demonstrate that PointNetPGAP outperforms state-of-the-art models, including OverlapTransformer and PointNetVLAD, particularly when the SLC model is applied. These results underscore the model's advantage, especially in horticultural environments, where it significantly improves retrieval performance in segments with higher ambiguity.

Paper Structure (27 sections, 6 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Geolocation, 3D maps and recording setup of the HORTO-3DLM dataset. Four data sequences were recorded over two years at different locations in the UK, using a Husky mobile platform equipped with a 32-beam Velodyne sensor and ZED-F9P RTK GPS/GNSS system.
  • Figure 2: Top: The retrieval framework uses PointNetPGAP to generate global descriptors through the following stages: T1) PointNet processes an input scan $P$ to extract local features $F$; T2) these features are aggregated into a global descriptor $D$ by combining a global average representation with pairwise feature interactions; T3) the global descriptor is then used to query the database and retrieve the top-$k$ most similar places. Bottom: The training scheme involves: B1) inputting a training tuple with an anchor-positive pair and $m$ negatives to generate descriptors; B2) feeding these descriptors into an MLP in the SLC model to predict segment classes, with the SLC loss computed as the negative log-likelihood of the segment labels under these predictions; B3) calculating the LazyTriplet loss based on anchor-positive and anchor-negative distances; B4) combining both losses with a weight $\alpha$.
  • Figure 3: 2D and 3D visualizations of the sequences' paths. The 2D representation shows the individual segments, while the 3D representation outlines the overlapping paths.
  • Figure 4: Retrieval performance for the top-25 candidates on the four sequences.
  • Figure 5: Recall@1 performance at the segment-level for the four sequences. For more details on the segments, please see Fig. \ref{fig:rows}.
  • ...and 1 more figure
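
The training and retrieval steps named in the Figure 2 caption (B2 to B4 and T3) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes Euclidean distances, a LazyTriplet loss that keeps only the hardest negative (in the style of the lazy triplet loss introduced with PointNetVLAD), and a convex combination of the two losses; the caption only states that a weight $\alpha$ is used, so the exact form is a guess. All function names, the margin value, and the default alpha are hypothetical.

```python
import numpy as np

def lazy_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Hinge loss on the hardest negative only (assumed form of LazyTriplet)."""
    d_pos = np.linalg.norm(anchor - positive)            # anchor-positive distance
    d_neg = np.linalg.norm(negatives - anchor, axis=1)   # m anchor-negative distances
    hinge = np.maximum(margin + d_pos - d_neg, 0.0)
    return float(hinge.max())

def slc_nll_loss(logits, segment_labels):
    """Mean negative log-likelihood of the true segment labels.

    `logits` stands in for the output of the SLC model's MLP
    (one row per descriptor, one column per segment class).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(len(segment_labels))
    return float(-log_probs[rows, segment_labels].mean())

def combined_loss(triplet, slc, alpha=0.5):
    """Assumed convex combination; the source only says 'a weight alpha'."""
    return alpha * triplet + (1.0 - alpha) * slc

def retrieve_top_k(query_descriptor, database, k=1):
    """Indices of the k database descriptors closest to the query."""
    dists = np.linalg.norm(database - query_descriptor, axis=1)
    return np.argsort(dists)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor, positive = rng.normal(size=8), rng.normal(size=8)
    negatives = rng.normal(size=(4, 8))       # m = 4 negatives
    logits = rng.normal(size=(6, 3))          # 6 descriptors, 3 segment classes
    labels = np.array([0, 0, 1, 2, 1, 0])     # made-up segment labels
    loss = combined_loss(lazy_triplet_loss(anchor, positive, negatives),
                         slc_nll_loss(logits, labels), alpha=0.5)
    print("combined loss:", loss)
    print("top-2 matches:", retrieve_top_k(anchor, negatives, k=2))
```

Because the SLC model is used only during training, only `retrieve_top_k` (applied to descriptors produced by the trained network) would be needed at inference time in this sketch.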