Benchmarking the Robustness of LiDAR Semantic Segmentation Models

Xu Yan; Chaoda Zheng; Ying Xue; Zhen Li; Shuguang Cui; Dengxin Dai

TL;DR

This work studies the robustness of LiDAR semantic segmentation under real-world corruptions by introducing SemanticKITTI-C and SemanticPOSS-C, two benchmarks covering 16 corruption types in three groups. It systematically evaluates 11 models spanning projection-, point-, voxel-, and hybrid-based representations, and finds that input representation and single-representation voxel methods most strongly influence robustness. The authors distill 12 practical observations, showing that cylindrical voxelization and Mix3D augmentation improve cross-condition robustness, while hybrid representations can hurt resilience. Based on these insights, they propose RLSeg, a robust LiDAR segmentation model that achieves state-of-the-art robustness across the benchmarks and offers a path toward safer real-world autonomous driving deployments.

Abstract

When LiDAR semantic segmentation models are used in safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness to a wide range of LiDAR corruptions. In this paper, we comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups: adverse weather, measurement noise, and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness; under a given corruption, different representations perform differently. 2) Although state-of-the-art LiDAR semantic segmentation methods achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on these observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.

Paper Structure (24 sections, 17 equations, 16 figures, 8 tables)

Introduction
Related Work
  LiDAR Semantic Segmentation
  Robustness Benchmarks for Images
  3D Robustness Benchmarks
Corruptions Taxonomy
  Adverse Weather
  Measurement Noise
  Cross-Device Discrepancy
Candidate Methods
  Projection-based Methods
  Point-based Methods
  Voxel-based Methods
Benchmarking and Analysis
  Experiment Setting
...and 9 more sections

Figures (16)

Figure 1: Examples from our proposed SemanticKITTI-C. We corrupt the clean validation set of SemanticKITTI using six types of corruptions with 16 levels of intensity to build a comprehensive robustness benchmark for LiDAR semantic segmentation. The listed examples are point clouds from 16-beam LiDAR sensors, with global and local distortion, and in snowfall and fog simulations.
Figure 2: Corruption by fog simulation. The raw LiDAR point cloud is shown in the first row. The foggy point clouds with $\beta=0.06$ and $\beta=0.2$ are shown in the last two rows. The point cloud is color-coded by height (z value). Best viewed on a screen and zoomed in.
Figure 3: Corruption by snowfall simulation. The raw LiDAR point cloud is shown in the first row. The snowfall point clouds with snowfall rates of 1 mm/h and 2.5 mm/h are shown in the last two rows. The point cloud is color-coded by height (z value). Best viewed on a screen and zoomed in.
Figure 4: Noisy LiDAR point clouds. The raw LiDAR point cloud is shown in the first row. The noisy point clouds with global outliers and local distortion are shown in the last two rows. The point cloud is color-coded by height (z value). Best viewed on a screen and zoomed in.
Figure 5: Cross-device LiDAR point clouds. The 64-beam LiDAR point cloud is shown in the first row. The second and third rows show the 32-beam and 16-beam LiDAR data. The point cloud is color-coded by height (z value). Best viewed on a screen and zoomed in.
...and 11 more figures
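
The measurement-noise and cross-device corruptions described in the captions of Figures 4 and 5 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `ratio` and `sigma` parameters, the uniform sampling of outliers inside the bounding box, and the assignment of beams by binning the elevation angle are all assumptions made for illustration. The fog and snowfall corruptions rely on physical simulation and are not sketched here.

```python
import numpy as np


def add_global_outliers(points, ratio, rng):
    """Append outliers sampled uniformly inside the cloud's bounding box.

    `ratio` (hypothetical parameter) is the number of outliers relative
    to the number of original points.
    """
    n_out = int(round(ratio * len(points)))
    lo, hi = points.min(axis=0), points.max(axis=0)
    outliers = rng.uniform(lo, hi, size=(n_out, points.shape[1]))
    return np.concatenate([points, outliers], axis=0)


def add_local_distortion(points, sigma, rng):
    """Jitter every coordinate with zero-mean Gaussian noise.

    `sigma` (hypothetical parameter) is the noise standard deviation,
    in the same unit as the coordinates.
    """
    return points + rng.normal(0.0, sigma, size=points.shape)


def reduce_beams(points, n_beams_in, n_beams_out):
    """Return a boolean mask that keeps every k-th beam.

    Beams are approximated (assumption) by binning the elevation angle
    of each point into `n_beams_in` equal-width bins.
    """
    if n_beams_in % n_beams_out != 0:
        raise ValueError("n_beams_in must be a multiple of n_beams_out")
    horizontal_range = np.linalg.norm(points[:, :2], axis=1)
    elevation = np.arctan2(points[:, 2], horizontal_range)
    edges = np.linspace(elevation.min(), elevation.max(), n_beams_in + 1)
    # digitize is 1-based; clip so the maximum angle falls in the last bin
    beam = np.clip(np.digitize(elevation, edges) - 1, 0, n_beams_in - 1)
    step = n_beams_in // n_beams_out
    return beam % step == 0
```

In this sketch `reduce_beams` returns a mask rather than the filtered points so that per-point labels can be filtered with the same mask, for example `points[mask], labels[mask]`. Points added by `add_global_outliers` have no ground-truth class, so a segmentation benchmark would need to decide how to label or ignore them; that choice is left open here.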
