Car-STAGE: Automated framework for large-scale high-dimensional simulated time-series data generation based on user-defined criteria

Asma A. Almutairi, David J. LeBlanc, Arpan Kusari

TL;DR

Car-STAGE introduces a GUI-driven framework built on CARLA to generate large-scale, synchronized multi-sensor time-series data with ground-truth annotations. It replaces the native single-thread CARLA workflow with a synchronous, multi-threaded pipeline and memory-mapped I/O, enabling background data collection, deterministic timing, and scalable throughput. Key contributions include the STAGE-VO visibility annotation algorithm, a 12-module architecture, and empirical speedups over CARLA across frames, cameras, and LiDARs. The approach has practical impact for autonomous driving research by simplifying large-scale data generation with consistent ground-truth labeling and enabling cloud-based storage and analysis.

Abstract

Generating large-scale sensing datasets through photo-realistic simulation is an important aspect of many robotics applications such as autonomous driving. In this paper, we consider the problem of synchronous data collection from the open-source CARLA simulator using multiple sensors attached to a vehicle, based on user-defined criteria. We propose a novel, one-step framework built on the CARLA simulator, which we refer to as Car-STAGE. The user defines the configuration parameters in a graphical user interface (GUI), and data collection then runs without any further user intervention. The configuration parameters include the choice of map, the number and configuration of sensors, and the environmental and lighting conditions. With these, the framework runs the simulation in the background and collects high-dimensional data from diverse sensors: RGB Camera, LiDAR, Radar, Depth Camera, IMU Sensor, GNSS Sensor, Semantic Segmentation Camera, Instance Segmentation Camera, and Optical Flow Camera. It also records the ground truth of the individual actors and stores both the sensor data and the ground-truth labels in a local or cloud-based database. The framework uses multiple threads: a main thread runs the server, a worker thread manages the queue and frame numbers, and the remaining threads process the sensor data. We obtain a further speedup over the native implementation by memory-mapping the raw binary data to disk and converting it into known formats only at the end of data collection. We show that these techniques yield a significant speedup over the native implementation as the number of frames, the number of sensors, and the number of spawned objects increase.
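The threading and memory-mapping scheme described in the abstract can be illustrated with a minimal sketch. This is not the Car-STAGE implementation and it does not call CARLA. The names `FRAME_BYTES`, `NUM_FRAMES`, `NUM_WORKERS`, `fake_sensor_frame`, `collect`, and `convert` are hypothetical, and the fixed-size frame layout is an assumption made only to keep the example short. It shows a producer that tags frames with a frame number and queues them, worker threads that copy raw bytes into a memory-mapped file, and a conversion step deferred until collection has finished.

```python
import mmap
import os
import queue
import tempfile
import threading

FRAME_BYTES = 16   # hypothetical fixed size of one raw sensor frame
NUM_FRAMES = 8     # hypothetical number of simulation ticks
NUM_WORKERS = 3    # hypothetical number of sensor-processing threads


def fake_sensor_frame(frame_id: int) -> bytes:
    """Stand-in for raw sensor output; a real run would receive simulator data."""
    return bytes([frame_id % 256]) * FRAME_BYTES


def collect(path: str) -> None:
    """Queue frames by frame number and let worker threads write them via mmap."""
    # Pre-size the file so that every frame has a fixed slot on disk.
    with open(path, "wb") as f:
        f.truncate(FRAME_BYTES * NUM_FRAMES)

    tasks = queue.Queue()

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as buf:

        def worker() -> None:
            while True:
                item = tasks.get()
                if item is None:  # sentinel: no more frames for this worker
                    return
                frame_id, payload = item
                start = frame_id * FRAME_BYTES
                # Each frame owns a disjoint slice, so the writes never overlap.
                buf[start:start + FRAME_BYTES] = payload

        threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
        for t in threads:
            t.start()

        # Producer role: tag each frame with its number and enqueue it.
        for frame_id in range(NUM_FRAMES):
            tasks.put((frame_id, fake_sensor_frame(frame_id)))
        for _ in threads:
            tasks.put(None)
        for t in threads:
            t.join()
        buf.flush()


def convert(path: str) -> list:
    """Deferred conversion: split the raw file into frames after collection ends."""
    with open(path, "rb") as f:
        raw = f.read()
    return [raw[i:i + FRAME_BYTES] for i in range(0, len(raw), FRAME_BYTES)]


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "frames.bin")
    collect(out)
    print(len(convert(out)), "frames recovered")
```

Keeping the hot path to a raw byte copy and postponing any decoding or format conversion is the design choice the abstract credits for the second source of speedup; the sketch only mirrors that ordering and makes no claim about actual throughput.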

Paper Structure

This paper contains 18 sections, 3 equations, 6 figures.

Figures (6)

  • Figure 1: Screenshot of the Car-STAGE GUI with the key parameters on the left and the sensor placement feature on the right
  • Figure 2: Schematic representation of the various Car-STAGE modules
  • Figure 3: Bounding box measurements in CARLA. X, Y, and Z give the center of the bounding box, while extent x, extent y, and extent z are half the dimensions of the bounding box along the x, y, and z axes, respectively. (Image source: CARLA)
  • Figure 4: Comparison of total time (in sec) for CARLA (above) and Car-STAGE (below) over the number of frames
  • Figure 5: Comparison of total time (in sec) for CARLA and Car-STAGE as a function of the number of cameras
  • ...and 1 more figure
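
The bounding-box convention in the Figure 3 caption (a center plus half-dimension extents) can be made concrete with a short sketch. It uses plain tuples rather than any CARLA type, the function names `box_corners` and `box_dimensions` are hypothetical, and it assumes the box is expressed in its own axis-aligned frame, so no rotation is applied.

```python
from itertools import product


def box_dimensions(extent):
    """Full side lengths: each extent is half the box size along its axis."""
    return tuple(2 * e for e in extent)


def box_corners(center, extent):
    """Return the 8 corners of a box given its center and half extents."""
    cx, cy, cz = center
    ex, ey, ez = extent
    # Every corner is the center shifted by +/- the extent along each axis.
    return [
        (cx + sx * ex, cy + sy * ey, cz + sz * ez)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


if __name__ == "__main__":
    corners = box_corners(center=(10.0, 5.0, 1.0), extent=(2.0, 1.0, 0.75))
    for corner in corners:
        print(corner)
    print("dimensions:", box_dimensions((2.0, 1.0, 0.75)))
```

Placing such a box in world coordinates would additionally require the actor's rotation and translation, which this sketch leaves out.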