Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

Daniel Fusaro, Simone Mosco, Emanuele Menegatti, Alberto Pretto

TL;DR

A reduced version of the proposed model is competitive with full-scale state-of-the-art models while running in real time, making it a viable choice for real-world applications.

Abstract

Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but suffer from high computational complexity. Moreover, highly complex deep learning models are often not suited to learning efficiently from small datasets: their generalization capabilities can easily be driven by the abundance of data rather than by the architecture design. In this paper, we harness the information from the three-dimensional representation to capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree allows for rapid building and querying, and enhances projection with straightforward operations. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate the benefits of our modification both in a "small data" setup, in which only one sequence of the dataset is used to train the models, and in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model is not only strongly competitive with full-scale state-of-the-art models but also operates in real time, making it a viable choice for real-world applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange.
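The range image representation mentioned in the abstract can be illustrated with a standard spherical projection. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the 64 x 2048 image size, and the +3 / -25 degree vertical field of view are assumptions chosen to resemble a 64-beam rotating LiDAR, and the nearest-point-wins rule for pixel collisions is likewise an illustrative choice.

```python
import numpy as np

def range_image_indices(points, height=64, width=2048,
                        fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map each 3D point to a (row, col) pixel via spherical projection.

    All default values are assumptions for illustration only.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.maximum(np.linalg.norm(points[:, :3], axis=1), 1e-8)

    yaw = np.arctan2(y, x)        # horizontal angle in [-pi, pi]
    pitch = np.arcsin(z / depth)  # vertical angle

    fov_up = np.radians(fov_up_deg)
    fov = fov_up - np.radians(fov_down_deg)

    # Normalize the angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / fov * height

    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)
    return rows, cols, depth

def build_range_image(points, height=64, width=2048):
    """Build a depth image; empty pixels are -1, collisions keep the nearest point."""
    rows, cols, depth = range_image_indices(points, height, width)
    image = np.full((height, width), np.inf, dtype=np.float64)
    # Unbuffered minimum so that repeated pixels keep the smallest depth.
    np.minimum.at(image, (rows, cols), depth)
    image[np.isinf(image)] = -1.0
    return image

# Two points along the same ray: only the nearer one survives in the image.
demo_points = np.array([[10.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
demo_rows, demo_cols, demo_depth = range_image_indices(demo_points)
demo_image = build_range_image(demo_points)
```

Because the row and column of every point are kept, per-pixel features computed on the image can later be read back at the same indices to return to the point domain.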

Paper Structure

This paper contains 19 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: The four 2D projections utilized by our system for semantically segmenting the 3D point cloud are as follows: XY, XZ, YZ, and range image projection.
  • Figure 2: (Above) An architectural overview of the proposed method, featuring Point Cloud Embedding for point feature computation, Point Cloud processing layers as the backbone with integrated spatial mix and channel mix modules, and a Segmentation Head for generating the final predictions. (Below) A detailed representation of the backbone, showcasing the spatial mix and channel mix modules. The spatial mix includes batch normalization, projection onto a 2D plane, 2D depth-wise convolution, re-projection onto 3D points, 1D depth-wise convolution, and a residual connection. The channel mix employs batch normalization, 1D convolution, 1D depth-wise convolution, and a residual connection.
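
The "projection onto a 2D plane" and "re-projection onto 3D points" steps named in the Figure 2 caption can be sketched for the planar projections of Figure 1 (XY, XZ, YZ). This is a hedged NumPy illustration, not the paper's code: the helper names, the 0.5 m cell size, the 50 m extent, and the use of mean pooling within each cell are all assumptions, and the 2D depth-wise convolution that would sit between the two steps is omitted.

```python
import numpy as np

def plane_cell_indices(points, axes=(0, 1), cell_size=0.5, extent=50.0):
    """Flattened grid-cell index of each point on the plane spanned by `axes`.

    axes=(0, 1) gives XY, (0, 2) gives XZ, (1, 2) gives YZ.
    cell_size and extent are hypothetical values for illustration.
    """
    grid = int(round(2 * extent / cell_size))
    coords = points[:, list(axes)]
    ij = np.floor((coords + extent) / cell_size).astype(np.int64)
    ij = np.clip(ij, 0, grid - 1)
    return ij[:, 0] * grid + ij[:, 1], grid

def project_to_plane(features, cell_idx, grid):
    """Average the features of all points that fall into the same cell."""
    channels = features.shape[1]
    sums = np.zeros((grid * grid, channels), dtype=np.float64)
    counts = np.zeros(grid * grid, dtype=np.float64)
    np.add.at(sums, cell_idx, features)  # unbuffered scatter-add
    np.add.at(counts, cell_idx, 1.0)
    image = sums / np.maximum(counts, 1.0)[:, None]
    return image.reshape(grid, grid, channels)

def reproject_to_points(image, cell_idx):
    """Read back, for every point, the feature of the cell it fell into."""
    grid, _, channels = image.shape
    return image.reshape(grid * grid, channels)[cell_idx]

# Points 0 and 1 share an XY cell but differ in height; point 2 is far away.
pts = np.array([[0.1, 0.1, 0.0], [0.2, 0.2, 5.0], [10.0, 10.0, 0.0]])
feats = np.array([[1.0], [3.0], [7.0]])

idx_xy, grid_xy = plane_cell_indices(pts, axes=(0, 1))
back_xy = reproject_to_points(project_to_plane(feats, idx_xy, grid_xy), idx_xy)

idx_xz, grid_xz = plane_cell_indices(pts, axes=(0, 2))
back_xz = reproject_to_points(project_to_plane(feats, idx_xz, grid_xz), idx_xz)
```

In this toy example the first two points are merged on the XY plane but stay separate on the XZ plane, which hints at why using several projections can preserve local structure that any single plane would blur.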