RareBoost3D: A Synthetic LiDAR Dataset with Enhanced Rare Classes

Shutong Lin, Zhengkang Xiang, Jianzhong Qi, Kourosh Khoshelham

TL;DR

The paper introduces RareBoost3D, a large-scale synthetic LiDAR dataset that boosts rare-class instances to alleviate long-tail imbalance in real-world datasets like SemanticKITTI. It then presents a prototype-based cross-domain semantic consistency (CSC) loss that aligns synthetic and real domain features via contrastive learning using memory-bank prototypes. Empirical results show that RareBoost3D augmentations, especially with CSC, yield improvements in rare-class IoU and modest overall mIoU gains (about 1–3%), validating the approach's effectiveness for domain-aligned data augmentation. The work demonstrates that controlled synthetic data distribution, when coupled with cross-domain representation learning, can enhance LiDAR point-cloud segmentation in imbalanced real-world settings.

Abstract

Real-world point cloud datasets have made significant contributions to the development of LiDAR-based perception technologies, such as object segmentation for autonomous driving. However, due to the limited number of instances in some rare classes, the long-tail problem remains a major challenge in existing datasets. To address this issue, we introduce a novel, synthetic point cloud dataset named RareBoost3D, which complements existing real-world datasets by providing significantly more instances for object classes that are rare in real-world datasets. To effectively leverage both synthetic and real-world data, we further propose a cross-domain semantic alignment method named CSC loss that aligns feature representations of the same class across different domains. Experimental results demonstrate that this alignment significantly enhances the performance of LiDAR point cloud segmentation models over real-world data.
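The summary above describes the CSC loss only at a high level: class prototypes kept in a memory bank, and a contrastive objective that pulls same-class features from the two domains together. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the momentum-based prototype update, the InfoNCE-style form, and the `temperature` and `momentum` values are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def update_prototype(prototype, features, momentum=0.9):
    """Hypothetical memory-bank update: moving average of one class prototype
    toward the mean feature of that class in the current batch."""
    dim = len(prototype)
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    mixed = [momentum * p + (1.0 - momentum) * m for p, m in zip(prototype, mean)]
    return l2_normalize(mixed)


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss (an assumed form): each feature from one domain is
    scored against the unit-length class prototypes of the other domain, and
    the prototype of its own class is treated as the positive."""
    total = 0.0
    for feat, label in zip(features, labels):
        f = l2_normalize(feat)
        # Cosine similarity to every class prototype, scaled by temperature.
        logits = {
            cls: sum(a * b for a, b in zip(f, proto)) / temperature
            for cls, proto in prototypes.items()
        }
        # Numerically stable log-sum-exp over all classes.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += log_denom - logits[label]
    return total / len(features)


# Toy usage: two classes with 2-D features. Prototypes stand in for the
# real-domain memory bank; the features stand in for synthetic-domain points.
real_prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
synthetic_features = [[2.0, 0.0], [0.0, 3.0]]
aligned = prototype_contrastive_loss(synthetic_features, [0, 1], real_prototypes)
misaligned = prototype_contrastive_loss(synthetic_features, [1, 0], real_prototypes)
```

In this toy setup, `aligned` is smaller than `misaligned`, since the loss is low when a feature lies close to the other domain's prototype for its own class. In practice the features would come from a segmentation backbone, and the loss would be added to the usual segmentation objective.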

Paper Structure

This paper contains 10 sections, 3 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: The number of instances for the non-rare class (car) and the rare classes (person, bicycle, motorcycle, rider, and truck) in SemanticKITTI (Behley et al., 2019) and in our RareBoost3D. SemanticKITTI exhibits significant class imbalance: it has roughly 5 times as many car instances as instances of all the rare classes combined. In contrast, RareBoost3D has many more instances of the rare classes.