An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

TL;DR

This paper introduces MMDetection3D-lidarseg, a unified toolbox for training and benchmarking state-of-the-art LiDAR semantic segmentation models across multiple sparse convolution backends and data augmentations, addressing the fragmentation of existing codebases. It offers broad dataset and model support, including five sparse convolution backends and advanced 3D augmentation techniques, and is validated on SemanticKITTI, nuScenes, and ScribbleKITTI with extensive fully-, semi-, and weakly-supervised experiments. Key findings show that data augmentation (LaserMix, PolarMix, FrustumMix) and test-time augmentation substantially boost accuracy, while the choice of backend and the use of automatic mixed precision (AMP) affect training and inference speed. Larger MinkUNet variants improve accuracy at a higher compute cost, whereas range-view methods offer faster inference. The toolbox aims to standardize evaluation, accelerate experimentation, and promote open research by releasing code and trained models publicly, with future plans to broaden model coverage and domain applications.

Abstract

In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient training and evaluation of state-of-the-art LiDAR segmentation models. We support a wide range of segmentation models and integrate advanced data augmentation techniques to enhance robustness and generalization. Additionally, the toolbox supports multiple leading sparse convolution backends, optimizing computational efficiency and performance. By fostering a unified framework, MMDetection3D-lidarseg streamlines development and benchmarking, setting new standards for research and application. Our extensive benchmark experiments on widely used datasets demonstrate the effectiveness of the toolbox. The codebase and trained models are publicly available, promoting further research and innovation in the field of LiDAR segmentation for autonomous driving.

Paper Structure (17 sections, 2 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Performance comparisons of state-of-the-art LiDAR segmentation models [choy2019minkowski, zhu2021cylindrical] from different codebases on the validation sets of the SemanticKITTI [behley2019semanticKITTI] and nuScenes [fong2022panoptic-nuScenes] datasets.
  • Figure 2: Overview of voxel-based and projection-based LiDAR segmentors illustrated with abstractions in the MMDetection3D-lidarseg codebase. Modules marked with $*$ are optional.
  • Figure 3: Performance comparisons of state-of-the-art LiDAR segmentation models [choy2019minkowski, zhu2021cylindrical, tang2020searching, xu2023frnet, cheng2022cenet] from different LiDAR representation groups (voxel, bird's eye view, range view, fusion) on the validation set of SemanticKITTI [behley2019semanticKITTI]. We report the segmentation accuracy (mIoU), inference speed (FPS), and number of model parameters. A larger area indicates a larger model capacity.
  • Figure 4: Performance comparisons of different sparse convolution backends [choy2019minkowski, spconv2022, yan2018second, tang2022torchsparse, tang2023torchsparse++] on the validation sets of the SemanticKITTI [behley2019semanticKITTI] and nuScenes [fong2022panoptic-nuScenes] datasets. All experiments are conducted using the MinkUNet-34-w32 backbone [choy2019minkowski]. Both the training speed (Iter/s, iterations per second) and the inference speed (FPS, frames per second) are measured on a single NVIDIA A100 GPU.
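
The figure captions report segmentation accuracy as mIoU (mean intersection over union). As a reminder of what that number measures, here is a minimal pure-Python sketch of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. It is not taken from the toolbox. The function names `confusion_matrix` and `mean_iou`, the `ignore_index` parameter, and the choice to skip classes that are absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int,
                     ignore_index: Optional[int] = None) -> List[List[int]]:
    """Count points per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Points whose label equals `ignore_index` (hypothetical parameter for
    unlabeled points) are skipped.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over all classes."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                                  # missed points of class c
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp   # points wrongly labeled c
        denom = tp + fp + fn
        if denom == 0:
            continue  # assumption: skip classes absent from both pred and target
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy per-point labels for a three-class problem.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(confusion_matrix(pred, target, num_classes=3)))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark protocols differ in how they treat ignored labels and empty classes, so the conventions here should be checked against the dataset's official evaluation script before comparing numbers.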