FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation

Bin Yang, Alexandru Paul Condurache

TL;DR

FLARES addresses the inefficiency and information loss of single-range range-view LiDAR segmentation by partitioning the point cloud into multiple spherical sub-clouds and training on low-resolution range images. It introduces two specialized data augmentations and a novel NNRI post-processing to mitigate class imbalance, projection artifacts, and the many-to-one problem, achieving consistent accuracy gains across architectures and substantial inference speed-ups on SemanticKITTI and nuScenes. The approach is validated through extensive experiments and ablations, demonstrating both generalizability and practical efficiency gains. The work advances real-time, accurate LiDAR semantic segmentation with a flexible, multi-range training paradigm.

Abstract

3D scene understanding is a critical yet challenging task in autonomous driving due to the irregularity and sparsity of LiDAR data, as well as the computational demands of processing large-scale point clouds. Recent methods leverage range-view representations to enhance efficiency, but they often adopt higher azimuth resolutions to mitigate information loss during spherical projection, where only the closest point is retained for each 2D grid cell. However, processing wide panoramic range-view images remains inefficient and may introduce additional distortions. Our empirical analysis shows that training with multiple range images, obtained by splitting the full point cloud, improves both segmentation accuracy and computational efficiency. However, this approach also poses new challenges: exacerbated class imbalance and an increase in projection artifacts. To address these, we introduce FLARES, a novel training paradigm that incorporates two tailored data augmentation techniques and a specialized post-processing method designed for multi-range settings. Extensive experiments demonstrate that FLARES generalizes across different architectures, yielding mIoU improvements of 2.1% to 7.9% on SemanticKITTI and 1.8% to 3.9% on nuScenes, while delivering an inference speed-up of over 40%.
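The information loss mentioned in the abstract can be made concrete with a minimal sketch of spherical projection that keeps only the closest point per grid cell, together with the fraction of points that survive (called 3D validity in the figure list). This is an illustration under stated assumptions, not the authors' implementation: the function names `range_projection` and `validity`, the row/column convention, and the default vertical field of view (+3° to -25°, a common setting for 64-beam sensors) are all hypothetical choices made for this example.

```python
import numpy as np


def range_projection(points, H, W, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (N, 3) points onto an H x W range image.

    Returns an (H, W) integer image holding, for each cell, the index of the
    closest point that falls into it, or -1 if the cell is empty.
    Assumes every point has non-zero distance from the sensor origin.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)

    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])   # azimuth angle
    pitch = np.arcsin(points[:, 2] / depth)        # elevation angle

    # Map angles to integer pixel coordinates (assumed convention).
    u = np.floor(0.5 * (1.0 - yaw / np.pi) * W)
    v = np.floor((fov_up - pitch) / (fov_up - fov_down) * H)
    u = np.clip(u, 0, W - 1).astype(np.int64)
    v = np.clip(v, 0, H - 1).astype(np.int64)

    # Sort by cell, then by depth, so the first entry of each cell is the closest.
    cell = v * W + u
    order = np.lexsort((depth, cell))
    cell_sorted = cell[order]
    first = np.ones(len(order), dtype=bool)
    first[1:] = cell_sorted[1:] != cell_sorted[:-1]
    winners = order[first]

    index_image = np.full(H * W, -1, dtype=np.int64)
    index_image[cell[winners]] = winners
    return index_image.reshape(H, W)


def validity(index_image, n_points):
    """Fraction of the original points that are retained in the range image."""
    return np.count_nonzero(index_image >= 0) / n_points


if __name__ == "__main__":
    # Synthetic points only; real scans would come from a dataset loader.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(50_000, 3)) * np.array([20.0, 20.0, 1.0])
    for W in (512, 1024, 2048):
        img = range_projection(pts, H=64, W=W)
        print(W, round(validity(img, len(pts)), 3))
```

Every point that shares a cell with a closer point is dropped, which is why wider images retain more points at a higher compute cost. Splitting the cloud into several sub-clouds and projecting each one separately is the alternative the paper pursues; the splitting rule itself is not described in this section and is therefore not sketched here.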

Paper Structure

This paper contains 27 sections, 3 equations, 17 figures, 12 tables, 2 algorithms.

Figures (17)

  • Figure 1: Visual comparison among different training procedures for range-view LiDAR semantic segmentation: ① Splitting, ② Range-view projection, ③ Network prediction, ④ Post-processing, ⑤ Image concatenation.
  • Figure 2: Statistics on SemanticKITTI [behley2019semantickitti]: 3D validity (proportion of projected points) for different azimuth (W) and elevation (H) resolutions. Doubling the azimuth resolution and doubling the elevation resolution yield comparable increases ($\Delta V_{azi}$, $\Delta V_{ele}$).
  • Figure 3: An example from the SemanticKITTI [behley2019semantickitti] dataset, visualized in top-down view and range view. Three crops are magnified to illustrate three limitations introduced by splitting the point cloud. 1) Exacerbated class imbalance: small objects, which also occur rarely in the annotations, tend to fade away in the range image after downsampling (e.g. the pole in the left crop). 2) Intensified noise: the reduction in points results in more clutter (middle crop), which may disrupt training stability. 3) Worsened distortion: the decrease in point density can introduce more projection artifacts, degrading the sharpness of local geometry (e.g. the blurry boundaries of the car in the right crop).
  • Figure 4: Qualitative results on SemanticKITTI [behley2019semantickitti]. Points in red and gray represent incorrect and correct predictions, respectively. $^\star$More examples are provided in the supplementary material.
  • Figure 5: a) Ablation on the number of sampled frames for WPD+. b) Comparative plot of IoU scores for the rarest classes with and without the synthetic dataset, alongside the class frequency distribution in the validation set. For all inferences, KNN [2019rangenet++] is used as the post-processing approach.
  • ...and 12 more figures