SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Goodarz Mehr, Azim Eskandarian
TL;DR
SimBEV addresses the BEV perception data bottleneck with a randomized synthetic data generator, built on CARLA, that produces BEV ground truth for BEV segmentation and 3D object detection. It uses domain randomization to create diverse, multi-sensor scenes and computes BEV ground truth by fusing data from overhead and underground views, stored on a $360\times360$ BEV grid with cell size $0.4\,\mathrm{m}$. The SimBEV dataset comprises 102,400 annotated frames across 11 CARLA maps, including 8.3 million 3D bounding boxes and 2.79 billion BEV labels, and is openly accessible. The results suggest that fusion-based perception substantially outperforms camera-only baselines, supporting the value of synthetic, multi-sensor BEV data for benchmarking and domain adaptation.
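To make the grid geometry concrete, here is a minimal Python sketch of how a point in the ego vehicle's frame could be mapped to a cell of a $360\times360$ grid with $0.4\,\mathrm{m}$ cells, which spans $144\,\mathrm{m}$ per side. Only the grid size and cell size come from the text. The function name `point_to_cell`, the assumption that the grid is centered on the ego vehicle, and the axis-to-index convention are illustrative assumptions, not SimBEV's actual implementation.

```python
import math

GRID_SIZE = 360      # cells per side (from the text)
CELL_SIZE_M = 0.4    # metres per cell (from the text)
EXTENT_M = GRID_SIZE * CELL_SIZE_M  # side length covered by the grid, in metres


def point_to_cell(x, y):
    """Map an ego-frame point (x, y) in metres to a (row, col) grid cell.

    Assumptions (hypothetical, not taken from SimBEV): the grid is centered
    on the ego vehicle, the row index grows with x, and the column index
    grows with y. Returns None for points outside the grid.
    """
    half = EXTENT_M / 2
    row = math.floor((x + half) / CELL_SIZE_M)
    col = math.floor((y + half) / CELL_SIZE_M)
    if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE:
        return row, col
    return None


if __name__ == "__main__":
    print(f"grid covers {EXTENT_M:.1f} m x {EXTENT_M:.1f} m")
    print(point_to_cell(0.1, 0.1))    # a point just ahead of the grid center
    print(point_to_cell(100.0, 0.0))  # a point beyond the grid boundary
```

Under these assumptions, each frame carries 360 × 360 = 129,600 cells per labeled class, and any point more than 72 m from the ego vehicle along either axis falls outside the grid.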
Abstract
Bird's-eye view (BEV) perception has garnered significant attention in autonomous driving in recent years, in part because BEV representation facilitates multi-modal sensor fusion. It enables a variety of perception tasks, including BEV segmentation, which provides a concise view of the environment that is useful for planning a vehicle's trajectory. However, this representation is not fully supported by existing datasets, and creating new datasets for this purpose can be time-consuming. To address this challenge, we introduce SimBEV, a randomized synthetic data generation tool that is extensively configurable and scalable, supports a wide array of sensors, incorporates information from multiple sources to capture accurate BEV ground truth, and enables a variety of perception tasks, including BEV segmentation and 3D object detection. SimBEV is used to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios. SimBEV and the SimBEV dataset are open and available to the public.
