JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo
TL;DR
JiSAM tackles the labeling bottleneck and corner-case gaps in LiDAR-based autonomous driving perception by introducing three plug-and-play components: jittering augmentation to boost simulation data diversity, a domain-aware backbone to exploit domain-specific input channels, and a memory-based sectorized alignment loss to bridge sim-to-real gaps. By jointly training on a large amount of synthetic data from CARLA and only a small fraction of labeled real data, JiSAM achieves performance comparable to models trained on the full real dataset and substantially improves detection of corner cases that are unlabeled in the real training set (e.g., motorcycles). The approach reduces labeling cost, improves sample efficiency, and narrows the sim-to-real gap, bringing DL-based AD perception closer to real-world deployment. The work demonstrates the practical potential of integrating simulation data into real-world 3D LiDAR perception and provides a foundation for broader adoption in the autonomous driving community, with code and models to be released.
Abstract
Deep-learning-based autonomous driving (AD) perception offers a promising picture for safe and environmentally friendly transportation. However, the over-reliance on real labeled data in LiDAR perception limits the scale of on-road attempts. Real-world 3D data is notoriously time- and energy-consuming to annotate and lacks corner cases such as rare traffic participants. In contrast, in simulators like CARLA, generating labeled LiDAR point clouds with corner cases is straightforward. However, introducing synthetic point clouds to improve real perception is non-trivial. This stems from two challenges: 1) the sample efficiency of simulation datasets and 2) simulation-to-real gaps. To overcome both challenges, we propose a plug-and-play method called JiSAM, shorthand for Jittering augmentation, domain-aware backbone and memory-based Sectorized AlignMent. In extensive experiments conducted on the well-known AD dataset NuScenes, we demonstrate that, with a SOTA 3D object detector, JiSAM is able to use the simulation data together with labels on only 2.5% of the available real data to achieve performance comparable to models trained on all real data. Additionally, JiSAM achieves more than 15 mAP on objects that are not labeled in the real training set. We will release models and code.
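To make the idea of jittering augmentation concrete, the following is a minimal sketch and not the JiSAM implementation: it assumes that jittering means adding small, clipped Gaussian noise to the xyz coordinates of a simulated point cloud while leaving other channels (e.g., intensity) untouched. The function name `jitter_points` and the parameters `sigma` and `clip` are hypothetical and chosen only for illustration; the abstract does not specify the actual noise model or its scale.

```python
import numpy as np


def jitter_points(points, sigma=0.02, clip=0.05, rng=None):
    """Return a copy of an (N, C) point cloud with noisy xyz coordinates.

    Assumption (not from the paper): jitter is zero-mean Gaussian noise with
    standard deviation `sigma` (meters), clipped to [-clip, clip], applied
    independently to each of the first three columns (x, y, z).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = points.astype(np.float32, copy=True)  # never modify the input
    noise = rng.normal(0.0, sigma, size=(out.shape[0], 3))
    out[:, :3] += np.clip(noise, -clip, clip)
    return out


# Usage: a toy "simulated" cloud with 4 channels (x, y, z, intensity).
sim_points = np.zeros((100, 4), dtype=np.float32)
sim_points[:, 3] = 1.0
jittered = jitter_points(sim_points, rng=np.random.default_rng(0))
```

Passing an explicit `rng` keeps the augmentation reproducible, which can be convenient when comparing training runs with and without jitter.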
