SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
Christian Birchler, Cyrill Rohrbach, Timo Kehrer, Sebastiano Panichella
TL;DR
SensoDat tackles the high cost of simulation-based self-driving car (SDC) testing by providing a large, open dataset of 32,580 executed BeamNG.tech test cases, with time-series data from 81 simulated sensors. The test cases were produced by three test generators (Frenetic, FreneticV, and AmbieGen) and executed across 14 simulation campaigns, capturing sensor and trajectory data along with test outcomes. By storing results in MongoDB and offering clear setup and query instructions, SensoDat enables AI development, regression-testing research, CAN bus testing, and investigations into simulation flakiness without requiring expensive hardware. This open, reproducible resource aims to accelerate SDC research, improve the evaluation of test methodologies, and broaden access for researchers with limited compute resources.
Abstract
Developing tools in the context of autonomous systems [22, 24], such as self-driving cars (SDCs), is time-consuming and costly since researchers and practitioners rely on expensive computing hardware and simulation software. We propose SensoDat, a dataset of 32,580 executed simulation-based SDC test cases generated with state-of-the-art test generators for SDCs. The dataset consists of trajectory logs and a variety of sensor data from the SDCs (e.g., rpm, wheel speed, brake thermals, and transmission), represented as time series. In total, SensoDat provides data from 81 different simulated sensors. With SensoDat, future research on SDCs need not depend on executing expensive test cases. Furthermore, given the large amount and variety of sensor data, we believe SensoDat can contribute to research on, among other topics, AI development, regression testing techniques for simulation-based SDC testing, and flakiness in simulation. Link to the dataset: https://doi.org/10.5281/zenodo.10307479
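To make the idea of sensor data "represented as time series" concrete, the following is a minimal sketch of how one executed test case might be held in memory and summarized. The record layout and every field name (`test_id`, `generator`, `outcome`, `samples`, `t`, `rpm`, `wheel_speed`) are assumptions chosen for illustration and are not the actual SensoDat schema; the values are made up and are not taken from the dataset.

```python
from statistics import mean

# Hypothetical record layout: field names and values are illustrative
# assumptions, not the actual SensoDat schema or data.
test_case = {
    "test_id": "example-0001",
    "generator": "frenetic",
    "outcome": "FAIL",
    "samples": [
        {"t": 0.0, "rpm": 900.0, "wheel_speed": 0.0},
        {"t": 0.5, "rpm": 1800.0, "wheel_speed": 4.2},
        {"t": 1.0, "rpm": 2400.0, "wheel_speed": 8.9},
    ],
}


def sensor_series(record, sensor):
    """Return (timestamps, values) for one sensor of one test case."""
    # Skip samples that do not contain a reading for this sensor.
    rows = [s for s in record["samples"] if sensor in s]
    return [s["t"] for s in rows], [s[sensor] for s in rows]


def summarize(record, sensor):
    """Compute simple descriptive statistics over one sensor's time series."""
    _, values = sensor_series(record, sensor)
    return {"min": min(values), "max": max(values), "mean": mean(values)}


summary = summarize(test_case, "rpm")
print(summary)
```

The same per-sensor extraction could feed a feature pipeline for learning-based models or a comparison of repeated executions when studying flakiness. The real field names should be taken from the dataset's own documentation.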
