SynthmanticLiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging

Javier Montalvo, Pablo Carballeira, Álvaro García-Martín

TL;DR

The paper presents SynthmanticLiDAR, a synthetic LiDAR semantic segmentation dataset generated with a modified CARLA simulator designed to closely match SemanticKITTI in class definitions and distribution. By pre-training segmentation models on SynthmanticLiDAR and fine-tuning them on SemanticKITTI, the authors demonstrate improved performance for the state-of-the-art methods SPVCNN and SqueezeSegV3, illustrating the value of synthetic data for reducing labeling costs and enhancing generalization. The LT subset further reveals a trade-off between underrepresented and well-represented classes, highlighting the need for balanced transfer learning. The dataset and accompanying tools are released publicly to enable further exploration of synthetic-to-real transfer and distribution-aware data generation in LiDAR perception.
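
The TL;DR states that the dataset is designed to match SemanticKITTI in class distribution. As a hedged illustration of how such a match could be checked, the sketch below computes the proportion of labeled points per class (the quantity plotted in Figure 2) and summarizes the gap between two distributions with the total variation distance. The distance metric, the function names, and the toy labels are assumptions for illustration, not the authors' evaluation procedure.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of labeled points assigned to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two class distributions.

    0.0 means identical distributions; 1.0 means no overlap at all.
    Classes missing from one distribution count as proportion 0.
    """
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Toy per-point labels: class names and counts are invented for illustration.
synthetic = ["road"] * 70 + ["car"] * 25 + ["bicycle"] * 5
real = ["road"] * 60 + ["car"] * 35 + ["bicycle"] * 5

p_syn = class_proportions(synthetic)
p_real = class_proportions(real)
for cls in sorted(p_real):
    # log10 of the proportion mirrors a logarithmic axis, where rare classes stay visible
    print(f"{cls:8s} synthetic={math.log10(p_syn[cls]):+.2f} real={math.log10(p_real[cls]):+.2f}")
print(f"total variation distance: {total_variation(p_syn, p_real):.2f}")
```

Total variation is only one possible summary; a per-class ratio of proportions would instead highlight which individual classes are over- or underrepresented.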

Abstract

Semantic segmentation on LiDAR imaging is gaining increasing attention, as it can provide useful information for perception systems, with potential applications in autonomous driving. However, collecting and labeling real LiDAR data is an expensive and time-consuming task. While datasets such as SemanticKITTI have been manually collected and labeled, the introduction of simulation tools such as CARLA has enabled the creation of synthetic datasets on demand. In this work, we present a modified CARLA simulator designed with LiDAR semantic segmentation in mind, with new classes, object labeling that is more consistent with its counterparts in real datasets such as SemanticKITTI, and the ability to adjust the object class distribution. Using this tool, we have generated SynthmanticLiDAR, a synthetic dataset for semantic segmentation on LiDAR imaging designed to be similar to SemanticKITTI, and we evaluate its contribution to the training of different semantic segmentation algorithms using a naive transfer learning approach. Our results show that incorporating SynthmanticLiDAR into the training process improves the overall performance of the tested algorithms, demonstrating the usefulness of our dataset and, therefore, of our adapted CARLA simulator. The dataset and simulator are available at https://github.com/vpulab/SynthmanticLiDAR.
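
The abstract mentions object labeling that is more consistent with SemanticKITTI. A common way to align label spaces across datasets is a lookup table that remaps raw per-point class IDs to a shared label set. The minimal sketch below assumes that approach; the table `RAW_TO_SHARED`, the ignore ID, and `remap_labels` are hypothetical and do not reproduce the released tools or the real class IDs.

```python
# Hypothetical lookup table from raw simulator class IDs to a shared label set.
# These IDs are placeholders, NOT the actual CARLA, SemanticKITTI, or
# SynthmanticLiDAR IDs; take the real values from the released tools.
RAW_TO_SHARED = {
    0: 0,  # unlabeled -> ignore
    1: 1,  # road      -> road
    2: 2,  # sidewalk  -> sidewalk
    3: 3,  # car       -> car
    4: 3,  # van       -> car (two raw classes merged into one shared class)
    5: 4,  # bicycle   -> bicycle
}
IGNORE_ID = 0


def remap_labels(labels, mapping, ignore_id=IGNORE_ID):
    """Map each raw per-point label to the shared label set.

    Raw IDs absent from the table fall back to ignore_id, so they can be
    treated as unlabeled instead of raising an error.
    """
    return [mapping.get(label, ignore_id) for label in labels]


raw = [1, 4, 5, 3, 99]  # 99 is an ID the table does not cover
print(remap_labels(raw, RAW_TO_SHARED))  # [1, 3, 4, 3, 0]
```

Falling back to an ignore ID keeps unexpected classes out of the loss and the IoU computation only if the training pipeline is configured to skip that ID.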

Paper Structure

This paper contains 9 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Examples of synthetic semantic LiDAR scans from SynthmanticLiDAR and an RGB image of the captured scene.
  • Figure 2: Proportion of labeled points for the classes shared between SemanticKITTI and our dataset, SynthmanticLiDAR. A logarithmic scale is used so that underrepresented classes remain visible.
  • Figure 3: Scheme followed when training the point cloud semantic segmentation models. First, we pre-train the models on one of the two versions of our synthetic dataset, and then we fine-tune them on real data from SemanticKITTI.
  • Figure 4: IoU scores for different versions of the SPVCNN algorithm, represented as a percentage increment over the baseline model.
  • Figure 5: IoU scores for different versions of the SqueezeSegV3 algorithm, represented as a percentage increment over the baseline model. Small absolute increments translate into large percentage increases for low-performance classes.
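
Figures 4 and 5 express per-class IoU as a percentage increment over the baseline, and the Figure 5 caption notes that small gains inflate the percentage for low-performance classes. A minimal sketch of both quantities follows, assuming the standard IoU definition TP / (TP + FP + FN) and a relative increment of (new - base) / base * 100; the exact normalization behind the figures is an assumption here, and the function names are hypothetical.

```python
def per_class_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN) from a confusion matrix conf[true][pred]."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class c points predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp  # other points predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))  # nan: class absent and never predicted
    return ious


def pct_increment(new, base):
    """Change of a score relative to the baseline, in percent."""
    return (new - base) / base * 100.0


conf = [[8, 2],
        [1, 9]]
print(per_class_iou(conf))

# The same 1-point IoU gain looks very different in relative terms:
print(f"{pct_increment(81.0, 80.0):.2f}%")  # well-represented class
print(f"{pct_increment(3.0, 2.0):.2f}%")    # low-performance class
```

With these numbers, a 1-point gain on a class at 80 IoU is a 1.25% increment, while the same gain on a class at 2 IoU is a 50% increment, which is why a relative plot exaggerates changes on low-performance classes. A baseline IoU of 0 makes the relative increment undefined (the sketch raises ZeroDivisionError), so such classes need separate handling.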