JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo

TL;DR

JiSAM tackles the labeling bottleneck and corner-case gaps in LiDAR-based autonomous driving perception by introducing three plug-and-play components: jittering augmentation to boost simulation data diversity, a domain-aware backbone to exploit domain-specific input channels, and a memory-based sectorized alignment loss to bridge sim-to-real gaps. By jointly training with a large amount of synthetic data from CARLA and only a small fraction of real labeled data, JiSAM achieves performance comparable to models trained on the full real dataset and substantially improves detection of corner cases that are unlabeled in the real training set (e.g., motorcycles). The approach reduces labeling cost, enhances sample efficiency, and narrows the sim-to-real gap, bringing DL-based AD perception closer to real-world deployment. The work demonstrates practical potential for integrating simulation data into real-world 3D LiDAR perception and provides a foundation for broader adoption in the autonomous driving community, with code and models to be released.

Abstract

Deep-learning-based autonomous driving (AD) perception introduces a promising picture for safe and environment-friendly transportation. However, the over-reliance on real labeled data in LiDAR perception limits the scale of on-road attempts. Real-world 3D data is notoriously time- and energy-consuming to annotate and lacks corner cases such as rare traffic participants. In contrast, in simulators like CARLA, generating labeled LiDAR point clouds with corner cases is straightforward. However, introducing synthetic point clouds to improve real perception is non-trivial. This stems from two challenges: 1) the sample efficiency of simulation datasets and 2) simulation-to-real gaps. To overcome both challenges, we propose a plug-and-play method called JiSAM, shorthand for Jittering augmentation, domain-aware backbone and memory-based Sectorized AlignMent. In extensive experiments conducted on the widely used AD dataset NuScenes, we demonstrate that, with a SOTA 3D object detector, JiSAM is able to utilize the simulation data and labels on only 2.5% of the available real data to achieve performance comparable to models trained on all real data. Additionally, JiSAM achieves more than 15 mAP on the objects not labeled in the real training set. We will release models and code.

Paper Structure

This paper contains 12 sections, 11 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The pipeline of the proposed method. Current SOTA LiDAR detectors mainly consist of three parts: a 3D sparse backbone for embedding 3D voxels, a BEV backbone for embedding bird's-eye-view features, and a detection head that predicts 3D bounding boxes from BEV features. JiSAM jointly trains on a simulation dataset from CARLA and a few labeled samples from a real dataset. To increase sample efficiency and bridge the sim-to-real domain gap, we propose (a) jittering augmentation on noiseless simulation data, which largely increases the sample efficiency of simulation data and saves training time and disk space; (b) a separate input embedding layer, which fully utilizes all useful information from both domains; and (c) memory-based sectorized alignment on BEV features to bridge the sim-to-real gap. The alignment is inspired by the observation that, within the same sector of the autonomous vehicle's neighborhood, two objects of the same category with similar headings have similar point distributions in the LiDAR scan.
  • Figure 2: Overall results on the NuScenes dataset. The SOTA 3D detector Transfusion [transfusion] is used for all experiments. In the figure, 'SOTA' means Transfusion trained on all available labels in the NuScenes dataset. 'SOTA with fewer labels' means Transfusion trained on 2.5% of the LiDAR point clouds in the training set; drops of 4 mAP and 3 NDS are observed. 'Ours' means JiSAM, which uses only 7,000 labeled real LiDAR point clouds plus simulation point clouds to train Transfusion. JiSAM improves on 'SOTA with fewer labels' by a significant margin and achieves performance comparable to Transfusion trained on all available labels in the NuScenes dataset.
  • Figure 3: Results of the corner-case study. We manually remove the motorcycle labels from the real training set to simulate the setting where a corner-case category (motorcycle) appears only in the evaluation set. The SOTA 3D detector Transfusion [transfusion] is used for all experiments. In the figure, 'SOTA' means Transfusion trained on all available labels in the NuScenes dataset, and 'Ours' indicates Transfusion trained with JiSAM on simulation data and 2.5% of the LiDAR point clouds in the training set. We report mAP and APs for some main categories. Even without motorcycle labels in the real dataset, JiSAM achieves approximately 16% mAP on this category on the evaluation set and a slightly better overall mAP. Meanwhile, for other categories such as car and pedestrian, performance is comparable (differences below 0.5% AP).
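
The Figure 1 caption names two operations that are easy to picture in code: jittering noiseless simulated points, and grouping objects by the sector of the ego vehicle's neighborhood they fall in. The sketch below is illustrative only and is not the authors' implementation. The function names, the Gaussian noise model, the `sigma` value, and the choice of 8 equal angular sectors are all assumptions, since none of these details appear on this page.

```python
import numpy as np


def jitter_points(points, sigma=0.02, rng=None):
    """Return a copy of an (N, C) point array with noise added to x, y, z.

    Assumption: zero-mean Gaussian noise with standard deviation `sigma`
    (in meters). Extra channels beyond the first three are left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    jittered = points.copy()
    jittered[:, :3] += rng.normal(0.0, sigma, size=(points.shape[0], 3))
    return jittered


def sector_index(x, y, num_sectors=8):
    """Map a BEV position relative to the ego vehicle to an angular sector.

    Assumption: sectors are equal angular slices around the ego vehicle.
    The result is an integer in [0, num_sectors).
    """
    angle = np.arctan2(y, x)                    # in [-pi, pi]
    fraction = (angle + np.pi) / (2.0 * np.pi)  # in [0, 1]
    return np.floor(fraction * num_sectors).astype(int) % num_sectors


if __name__ == "__main__":
    sim_points = np.zeros((5, 4))
    sim_points[:, 3] = 1.0  # hypothetical extra channel
    noisy = jitter_points(sim_points, rng=np.random.default_rng(0))
    print(noisy.shape)
    print(sector_index(np.array([1.0, -1.0]), np.array([0.1, 0.1])))
```

Because the jitter is drawn fresh on every call, one stored simulated scan can yield many distinct training samples, which is consistent with the caption's point about saving disk space. A sector index of this kind could serve as a lookup key when comparing features of same-category objects across domains, though how the memory and alignment loss are actually built is not specified here.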