MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions

J. J. Cabrera; A. Santo; A. Gil; C. Viegas; L. Payá

Abstract

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds. It is built entirely on the new 3D MinkNeXt Block, a residual block composed of 3D sparse convolutions that follows the design philosophy established by recent Transformers while using only simple 3D convolutions. Features are extracted at different scales by a U-Net encoder-decoder network and aggregated into a single descriptor by Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state of the art by relying only on conventional 3D sparse convolutions, without resorting to more complex proposals such as Transformers, attention layers or deformable convolutions. The proposal has been thoroughly assessed on the Oxford RobotCar and In-house datasets, where MinkUNeXt outperforms other state-of-the-art methods.
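To make the aggregation step concrete, the following is a minimal sketch of Generalized Mean pooling in plain Python. It is not the authors' implementation: the function name `gem_pool`, the default exponent `p = 3.0`, and the clamping constant `eps` are assumptions chosen for illustration, and in practice the exponent is typically a learnable parameter inside the network. The sketch only shows how N per-point feature vectors collapse into one descriptor via (1/N Σ x^p)^(1/p) per channel.

```python
from typing import List, Sequence


def gem_pool(features: Sequence[Sequence[float]],
             p: float = 3.0,
             eps: float = 1e-6) -> List[float]:
    """Generalized Mean pooling over N per-point feature vectors of size C.

    `p` and `eps` are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    if not features:
        raise ValueError("features must contain at least one point")
    n = len(features)
    channels = len(features[0])
    pooled = []
    for c in range(channels):
        # Clamp to eps so the fractional power stays defined for
        # zero or negative activations.
        total = sum(max(row[c], eps) ** p for row in features)
        pooled.append((total / n) ** (1.0 / p))
    return pooled


# Three points with two feature channels each.
feats = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
descriptor = gem_pool(feats, p=1.0)  # p = 1 reduces to average pooling
```

With `p = 1` the result is the per-channel mean, and as `p` grows the result approaches the per-channel maximum, which is why GeM is often described as interpolating between average and max pooling.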

Paper Structure (17 sections, 2 equations, 4 figures, 5 tables)

Introduction
State of the art
MinkUNeXt: global point cloud descriptor for place recognition
Global Architecture
Residual Block Architecture
Experiments
Datasets
Labelling and similarity
Training and evaluation
Implementation details
Ablation study: From MinkUNet to MinkUNeXt
Global Design
Residual Block Design
Comparison with the state of the art
Results with the Baseline Protocol
...and 2 more sections

Figures (4)

Figure 1: Point cloud-based place recognition. Each query point cloud (red) is embedded into a global descriptor which is compared with the descriptors from the database point clouds (blue) by means of a Nearest Neighbour Search.
Figure 2: This diagram shows the architecture of the proposed MinkUNeXt, which is based on a semantic segmentation network (U-Net) modified and enhanced to perform place recognition from point clouds.
Figure 3: This diagram shows the proposed MinkNeXt Block. This residual block is an essential part of the global network, since it increases the number of feature maps through an inverted bottleneck.
Figure 4: This diagram illustrates the design progress of the proposed architecture from MinkUNet up to MinkUNeXt. All the proposed modifications are summarized in Table \ref{tab:comentarios_nodo}.
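The retrieval step described in the caption of Figure 1 can be sketched as a brute-force nearest-neighbour search over global descriptors. This is only an illustration under stated assumptions: the Euclidean metric, the function names `euclidean` and `retrieve`, and the toy two-dimensional descriptors are hypothetical and not taken from the paper, which may use a different distance or an indexed search structure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             k: int = 1) -> List[Tuple[str, float]]:
    """Return the k database entries closest to the query descriptor."""
    ranked = sorted(
        ((name, euclidean(query, desc)) for name, desc in database.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


# Toy database of global descriptors (hypothetical values).
database = {
    "place_a": [0.0, 0.0],
    "place_b": [1.0, 1.0],
    "place_c": [5.0, 5.0],
}
matches = retrieve([0.9, 1.2], database, k=2)
for name, dist in matches:
    print(f"{name}: {dist:.3f}")
```

Here the query descriptor is closest to `place_b`, followed by `place_a`. A linear scan like this is adequate for a small database; larger ones would normally rely on a spatial index such as a k-d tree.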
