
MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions

J. J. Cabrera, A. Santo, A. Gil, C. Viegas, L. Payá

Abstract

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds, built entirely on the new 3D MinkNeXt Block: a residual block composed of 3D sparse convolutions that follows the design philosophy established by recent Transformers while using only simple 3D convolutions. Features are extracted at different scales by a U-Net encoder-decoder network and aggregated into a single descriptor by Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state of the art by relying only on conventional 3D sparse convolutions, without resorting to more complex proposals such as Transformers, attention layers or deformable convolutions. The proposal has been thoroughly assessed using the Oxford RobotCar and the In-house datasets, where MinkUNeXt outperforms other state-of-the-art methods.
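As a rough illustration of the aggregation step, the sketch below applies the standard GeM formula, the p-th root of the mean of p-th powers, to a set of per-point feature vectors. The function name `gem_pool`, the default exponent `p = 3.0` and the clamping constant `eps` are assumptions made for this example; the abstract does not state the values or the implementation the authors use.

```python
from typing import List, Sequence


def gem_pool(
    features: Sequence[Sequence[float]],
    p: float = 3.0,      # assumed exponent; not taken from the paper
    eps: float = 1e-6,   # assumed clamp so that powers stay well defined
) -> List[float]:
    """Aggregate N local feature vectors into one global descriptor.

    Each output channel d is (mean_i max(x_i[d], eps) ** p) ** (1 / p).
    """
    if not features:
        raise ValueError("at least one feature vector is required")

    n_points = len(features)
    n_channels = len(features[0])
    descriptor = []
    for d in range(n_channels):
        # Clamp to eps, raise to the power p, and average over all points.
        mean_pow = sum(max(row[d], eps) ** p for row in features) / n_points
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


if __name__ == "__main__":
    local_features = [[1.0, 2.0], [3.0, 4.0]]
    print(gem_pool(local_features, p=1.0))  # p = 1 reduces to average pooling
    print(gem_pool(local_features, p=3.0))  # larger p weights large activations more
```

With `p = 1` the result is plain average pooling, and as `p` grows the result approaches max pooling, which is why GeM is often described as interpolating between the two.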

Paper Structure (17 sections, 2 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Point cloud-based place recognition. Each query point cloud (red) is embedded into a global descriptor which is compared with the descriptors from the database point clouds (blue) by means of a Nearest Neighbour Search.
  • Figure 2: This diagram shows the architecture of the proposed MinkUNeXt, which is based on a semantic segmentation network (U-Net) modified and enhanced to perform place-recognition from point clouds.
  • Figure 3: This diagram shows the proposed MinkNeXt Block. This residual block is an essential part of the global network, since it increases the number of feature maps through an inverted bottleneck.
  • Figure 4: This diagram illustrates the design progress of the proposed architecture from MinkUNet up to MinkUNeXt. All the proposed modifications are summarized in Table \ref{tab:comentarios_nodo}.
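
The retrieval step described in the Figure 1 caption can be sketched as a brute-force nearest-neighbour search over global descriptors. This is a minimal sketch, not the authors' implementation: the function name `nearest_neighbours`, the choice of Euclidean distance and the toy two-dimensional descriptors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def nearest_neighbours(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 1,
) -> List[int]:
    """Return the indices of the k database descriptors closest to the query.

    Euclidean distance is assumed here; the source only specifies a
    nearest-neighbour search.
    """
    # Pair every distance with its database index, then sort by distance.
    ranked = sorted(
        (math.dist(query, descriptor), index)
        for index, descriptor in enumerate(database)
    )
    return [index for _, index in ranked[:k]]


if __name__ == "__main__":
    # Toy descriptors standing in for embedded database point clouds.
    database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    query = [0.9, 1.2]
    print(nearest_neighbours(query, database, k=2))
```

In a large-scale setting the linear scan would normally be replaced by an indexed search structure such as a k-d tree, but the ranking logic stays the same.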