An Experimental Study of SOTA LiDAR Segmentation Models

Bike Chen, Antti Tikanmäki, Juha Röning

TL;DR

This study addresses the need for apples-to-apples evaluation of state-of-the-art point-, voxel-, and range-image LiDAR PCS approaches in real-world settings by re-training and fairly evaluating five architectures under a unified augmentation regime with motion compensation. It conducts comprehensive benchmarks on SemanticKITTI and nuScenes, reporting model size, memory, latency, FPS, IoU, and mIoU to guide deployment in robotics and autonomous driving. Key findings reveal distinct speed-accuracy trade-offs: range-image methods deliver real-time performance with competitive accuracy, voxel-based backbones achieve strong mIoU at higher compute, and point-based approaches struggle with real-time constraints but excel on irregular shapes; deskewing and augmentation choices significantly influence results. The work provides practical guidelines for selecting and designing PCS systems and offers a reproducible benchmark to drive future research and real-world integration with SLAM for semantic mapping.

Abstract

Point cloud segmentation (PCS) classifies each point in a point cloud. The task enables robots to parse their 3D surroundings and operate autonomously. Based on the point cloud representation they use, existing PCS models can be roughly divided into point-, voxel-, and range image-based models. However, we have found no work that reports comprehensive comparisons among the state-of-the-art point-, voxel-, and range image-based models from an application perspective, which makes it difficult to use these models in real-world scenarios. In this paper, we provide thorough comparisons among the models by considering LiDAR data motion compensation and the following metrics: number of model parameters, maximum GPU memory allocated during testing, inference latency, frames per second, intersection-over-union (IoU), and mean IoU (mIoU) scores. The experimental results help engineers choose a suitable PCS model for an application and may inspire researchers in the PCS field to design more practical models for real-world scenarios.
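
As an illustration of the accuracy and speed metrics named in the abstract, the following minimal sketch computes per-class IoU and mIoU from a confusion matrix and converts a per-frame latency into frames per second. It is not the authors' evaluation code: the function names (`confusion_matrix`, `iou_per_class`, `mean_iou`, `fps_from_latency`), the `ignore_index` parameter, and the choice to skip classes that never appear are assumptions made for this example.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count (target, pred) label pairs; rows are ground truth, columns are predictions."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    if ignore_index is not None:
        # Assumption: points labeled ignore_index are excluded from scoring.
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    flat = target * num_classes + pred
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def iou_per_class(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes absent from both pred and target."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as c, but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled c, but predicted otherwise
    denom = tp + fp + fn
    iou = np.full(cm.shape[0], np.nan)
    valid = denom > 0
    iou[valid] = tp[valid] / denom[valid]
    return iou


def mean_iou(cm):
    """mIoU: the mean of per-class IoU over classes that actually occur."""
    return float(np.nanmean(iou_per_class(cm)))


def fps_from_latency(latency_ms):
    """Frames per second implied by an average per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Toy example with 3 classes and 5 points (made-up labels).
    cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
    print(iou_per_class(cm), mean_iou(cm), fps_from_latency(50.0))
```

In practice the confusion matrix would be accumulated over every scan in the validation set before the IoU scores are computed, so that rare classes are scored on all of their points rather than averaged scan by scan.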

Paper Structure

This paper contains 20 sections, 1 equation, 7 figures, 9 tables.

Figures (7)

  • Figure 1: (a) is an ideal scan (top view only). When the LiDAR sensor does not move, the collected raw points form a circle. (b) is a skewed scan. When the LiDAR sensor moves quickly from A to B, the points are distorted. (c) is a deskewed scan. With motion compensation, all points are aligned to the end of the scan. (e), (f), and (g) are the voxels of (a), (b), and (c), respectively (top view only). (d) is the range image of the skewed scan. (h) is the range image of the deskewed scan. There are some black holes around the objects in (h) (see the white masks).
  • Figure 2: The pipeline of the point cloud segmentation (PCS) models.
  • Figure 3: The data augmentation combination used in the paper. The red fonts indicate the probabilities during training.
  • Figure 4: Comparison results among point-, voxel-, and range image-based models [waffleiron23, minkowski2019, spvnas_2020, pdm2024] on the SemanticKITTI [semantickitti_2019_behley] validation dataset in terms of model parameters (M), frames per second (FPS), and mIoU scores (%). The sizes of the circles indicate the number of model parameters.
  • Figure 5: Comparison results among point-, voxel-, and range image-based models [waffleiron23, minkowski2019, spvnas_2020, pdm2024] on the nuScenes [nuscenes_panoptic] validation dataset in terms of model parameters (M), frames per second (FPS), and mIoU scores (%). The sizes of the circles indicate the number of model parameters.
  • ...and 2 more figures