SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset

Goodarz Mehr, Azim Eskandarian

TL;DR

SimBEV tackles the BEV perception data bottleneck by offering a randomized synthetic data generator built on CARLA and producing BEV ground truth for BEV segmentation and 3D object detection. It uses domain randomization to create diverse, multi-sensor scenes and computes BEV ground truth by fusing data from overhead and underground views, stored on a $360\times360$ BEV grid with cell size $0.4\,\mathrm{m}$. The SimBEV dataset comprises 102,400 annotated frames across 11 CARLA maps, including 8.3 million 3D bounding boxes and 2.79 billion BEV labels, and is openly accessible. The results suggest fusion-based perception substantially outperforms camera-only baselines, validating the value of synthetic, multi-sensor BEV data for benchmarking and domain adaptation.
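As a quick check on the grid geometry quoted above, the following sketch computes the side length covered by a $360\times360$ grid with $0.4\,\mathrm{m}$ cells ($360 \times 0.4\,\mathrm{m} = 144\,\mathrm{m}$) and maps a metric position to a cell index. Only the grid size and cell size come from the text; the ego-centered layout, the axis convention, and the function names are assumptions made for illustration and may not match SimBEV's actual conventions.

```python
import math

GRID_CELLS = 360    # cells per side (from the TL;DR)
CELL_SIZE_M = 0.4   # metres per cell (from the TL;DR)


def grid_extent_m(cells=GRID_CELLS, cell_size=CELL_SIZE_M):
    """Side length of the BEV grid in metres."""
    return cells * cell_size


def world_to_cell(x_m, y_m, cells=GRID_CELLS, cell_size=CELL_SIZE_M):
    """Map an ego-relative position (metres) to a (row, col) cell index.

    Assumption (not from the source): the ego vehicle sits at the grid
    centre, x maps to rows and y maps to columns. Returns None when the
    point falls outside the grid.
    """
    half = cells * cell_size / 2.0
    row = math.floor((x_m + half) / cell_size)
    col = math.floor((y_m + half) / cell_size)
    if 0 <= row < cells and 0 <= col < cells:
        return row, col
    return None
```

Under these assumptions the grid would reach roughly 72 m from the ego vehicle in each direction, and a call such as `world_to_cell(100.0, 0.0)` returns `None` because the point lies beyond that range.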

Abstract

Bird's-eye view (BEV) perception has garnered significant attention in autonomous driving in recent years, in part because BEV representation facilitates multi-modal sensor fusion. BEV representation enables a variety of perception tasks including BEV segmentation, a concise view of the environment useful for planning a vehicle's trajectory. However, this representation is not fully supported by existing datasets, and creation of new datasets for this purpose can be a time-consuming endeavor. To address this challenge, we introduce SimBEV. SimBEV is a randomized synthetic data generation tool that is extensively configurable and scalable, supports a wide array of sensors, incorporates information from multiple sources to capture accurate BEV ground truth, and enables a variety of perception tasks including BEV segmentation and 3D object detection. SimBEV is used to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios. SimBEV and the SimBEV dataset are open and available to the public.

Paper Structure

This paper contains 25 sections, 3 equations, 17 figures, 13 tables.

Figures (17)

  • Figure 1: A data sample generated by SimBEV. The left half depicts a 360-degree view of the ego vehicle's surroundings using different camera types (from top to bottom: RGB, semantic segmentation, instance segmentation, depth, and optical flow cameras). The right half shows, from top to bottom, views of lidar, semantic lidar, radar, and the BEV ground truth. Some images also contain 3D object bounding boxes colored according to the object's class.
  • Figure 2: SimBEV's logic flow when creating a new dataset. The arrow exiting green nodes at the top indicates the action taken when the condition in that node is no longer satisfied.
  • Figure 3: In a scene generated by SimBEV, a reckless ego vehicle runs over a cyclist.
  • Figure 4: Ground elements (roads, sidewalks, etc.) in CARLA use one-way visible materials, appearing invisible to a camera placed below them. We use this property to capture accurate BEV ground truth by placing a camera below the ego vehicle looking up.
  • Figure 5: Left: BEV road data calculated using CARLA-generated waypoints; there are clear gaps where lanes diverge. Middle: BEV road data obtained from the overhead camera; vehicles and vegetation obstruct a portion of the view. Right: BEV road ground truth obtained by combining the two sources and performing binary closing.
  • ...and 12 more figures
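
The Figure 5 caption describes combining two road masks and then applying binary closing. A minimal sketch of that idea follows, assuming the two sources are merged with a logical OR and that closing uses a 3×3 square structuring element. Neither choice is stated in the source, the function names are hypothetical, and this is not SimBEV's actual implementation.

```python
import numpy as np


def _dilate(mask):
    """3x3 binary dilation; cells outside the image count as False."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + h, dc:dc + w]
    return out


def _erode(mask):
    """3x3 binary erosion; cells outside the image count as True."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + h, dc:dc + w]
    return out


def fuse_road_masks(waypoint_mask, camera_mask):
    """Merge two boolean BEV road masks, then close small gaps.

    Assumptions (not from the source): logical OR as the merge rule and
    a 3x3 square structuring element for the closing step.
    """
    combined = np.asarray(waypoint_mask, dtype=bool) | np.asarray(camera_mask, dtype=bool)
    return _erode(_dilate(combined))  # closing = dilation followed by erosion
```

Closing fills holes narrower than the structuring element without growing the outer boundary of the mask, so a one-cell gap between a waypoint-derived segment and a camera-derived segment is bridged while the surrounding non-road cells stay empty.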