SPVSoAP3D: A Second-order Average Pooling Approach to enhance 3D Place Recognition in Horticultural Environments

T. Barros, C. Premebida, S. Aravecchia, C. Pradalier, U. J. Nunes

TL;DR

This work introduces SPVSoAP3D, a novel modeling approach that combines a voxel-based feature extraction network with an aggregation technique based on a second-order average pooling operator, complemented by a descriptor enhancement stage, and augments the existing HORTO-3DLM dataset.

Abstract

3D LiDAR-based place recognition has been extensively researched in urban environments, yet it remains underexplored in agricultural settings. Unlike urban scenes, horticultural environments are permeable to laser beams, which results in sparse and overlapping LiDAR scans with suboptimal geometries. This phenomenon leads to intra- and inter-row descriptor ambiguity. In this work, we address this challenge by introducing SPVSoAP3D, a novel modeling approach that combines a voxel-based feature extraction network with an aggregation technique based on a second-order average pooling operator, complemented by a descriptor enhancement stage. Furthermore, we augment the existing HORTO-3DLM dataset with two new sequences recorded in horticultural environments. We evaluate SPVSoAP3D against state-of-the-art (SOTA) models, including OverlapTransformer, PointNetVLAD, and LOGG3D-Net, using a cross-validation protocol on both the newly introduced sequences and the existing HORTO-3DLM dataset. The findings indicate that the average operator is better suited to horticultural environments than the max operator and other first-order pooling techniques. Additionally, the results highlight the improvements brought by the descriptor enhancement stage.

Paper Structure

This paper contains 21 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: 3D maps of the two new horticultural sequences introduced in this work: (a) GTJ23, recorded in a greenhouse tomato production facility in Coimbra, Portugal, and (b) ON23, recorded in an orchard in Metz, France. Panel (c) shows a 3D map of an HORTO-3DLM sequence (SJ23) with two scans, showcasing the permeable nature of these environments to LiDAR scans.
  • Figure 2: SPVSoAP3D in a retrieval-based framework. SPVSoAP3D has 5 main stages. In stage 1, the input 3D LiDAR scan $P$ is voxelized and fed to the backbone SPVCNN, which returns local features $F$. In stage 2, these local features are aggregated using second-order average pooling, resulting in $\bar{F}$. In stage 3, the aggregated features $\bar{F}$ are projected to a tangent space using Log-Euclidean projection, yielding $\bar{F}^e$. In stage 4, the $\bar{F}^e$ features are rescaled using power normalization. In stage 5, the rescaled features $\bar{F}^r$ are flattened and fed to a fully connected layer followed by $L2$-norm. Finally, the model outputs a descriptor $D$, which is used to query the database for the top-k most similar descriptors.
  • Figure 3: Mobile platforms and sensors used for recording the sequences. The Jackal platform was used to record the GTJ23 sequence, while the Husky platform was used to record the ON23 sequence.
  • Figure 4: Paths divided into color-coded segments S1, ..., S7. GTJ23, SJ23, OJ23 and OJ22 have 5 segments, ON22 has 6 segments, and ON23 has 7 segments.
  • Figure 5: Performance results along the segments, reported using Recall@$k$ with $k \in [1,10]$ for $r_{th} \in [1, \dots, l]$, where $l$ is the segment length.
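
Stages 2 to 5 of the Figure 2 pipeline can be illustrated with a small NumPy sketch. This is not the authors' implementation: the SPVCNN backbone is replaced by random local features, the fully connected layer uses random untrained weights, and the function names, the power exponent `p=0.75`, the `eps` regularizer, and the feature and descriptor sizes are all assumptions made for illustration. It only shows how second-order average pooling, the Log-Euclidean projection, power normalization, and the final $L2$-normalized projection fit together.

```python
import numpy as np

def second_order_avg_pool(F):
    """Stage 2: average of the outer products of the N local features (N x C -> C x C)."""
    n = F.shape[0]
    return F.T @ F / n

def log_euclidean(M, eps=1e-6):
    """Stage 3: matrix logarithm of a symmetric positive (semi-)definite matrix.
    eps is an assumed regularizer that keeps eigenvalues strictly positive."""
    M = 0.5 * (M + M.T) + eps * np.eye(M.shape[0])
    w, V = np.linalg.eigh(M)          # eigenvalues w, eigenvectors in the columns of V
    w = np.maximum(w, eps)
    return (V * np.log(w)) @ V.T      # V diag(log w) V^T

def power_norm(M, p=0.75):
    """Stage 4: signed element-wise power normalization (p is an assumed value)."""
    return np.sign(M) * np.abs(M) ** p

def describe(F, W, b):
    """Stages 2-5: pool, project, rescale, flatten, linear layer, L2-normalize."""
    F_bar = second_order_avg_pool(F)
    F_e = log_euclidean(F_bar)
    F_r = power_norm(F_e)
    d = W @ F_r.reshape(-1) + b
    return d / (np.linalg.norm(d) + 1e-12)

# Hypothetical sizes: 500 local features of dimension 16, a 32-d output descriptor.
rng = np.random.default_rng(0)
C, out_dim = 16, 32
W = rng.standard_normal((out_dim, C * C)) * 0.01  # stand-in for a trained FC layer
b = np.zeros(out_dim)
F = rng.standard_normal((500, C))                 # stand-in for backbone features
D = describe(F, W, b)
print(D.shape)
```

Because the pooling step averages over all local features, the resulting descriptor does not depend on the order of the points, which is one reason this kind of aggregation suits unordered LiDAR features.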