MMDVS-LF: Multi-Modal Dynamic Vision Sensor and Eye-Tracking Dataset for Line Following
Felix Resch, Mónika Farsang, Radu Grosu
TL;DR
MMDVS-LF introduces a compact, multi-modal dataset for line following that combines Dynamic Vision Sensor (DVS) events with eye-tracking, RGB video, odometry, IMU, and driver demographic data. It emphasizes synchronized multi-modal data and representations such as time surfaces and event frames to enable event-based deep learning for control tasks, and it is validated through steering-prediction benchmarks and attention-map analyses. The dataset is provided at several resolutions and frequencies, documents its recording and annotation formats, and shows potential for tasks beyond steering prediction (e.g., control, driver identification). The resource aims to promote trustworthy, interpretable, and efficient development of DVS-based models and end-to-end learning pipelines on accessible hardware such as roboracer platforms.
Abstract
Dynamic Vision Sensors (DVS) offer a unique advantage in control applications due to their high temporal resolution and asynchronous, event-based output. Still, their adoption in machine learning algorithms remains limited. To address this gap and promote the development of models that leverage the specific characteristics of DVS data, we introduce MMDVS-LF: Multi-Modal Dynamic Vision Sensor and Eye-Tracking Dataset for Line Following. This comprehensive dataset is the first to integrate multiple sensor modalities, including DVS recordings and eye-tracking data, from a small-scale standardized vehicle. Additionally, the dataset includes RGB video, odometry, Inertial Measurement Unit (IMU) data, and demographic data of drivers performing a line-following task. With its diverse range of data, MMDVS-LF opens new opportunities for developing event-based deep learning algorithms, much as the MNIST dataset did for Convolutional Neural Networks.
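
To make the two representations named in the TL;DR concrete, the following is a minimal sketch of how a batch of DVS events could be turned into an event frame and a time surface. It is an illustration under stated assumptions, not the dataset's own tooling: the function names, the array layout (separate `x`, `y`, `t`, `p` arrays), the polarity encoding (`p > 0` means an ON event), and the exponential decay with constant `tau` are all hypothetical choices made for this example.

```python
import numpy as np

def event_frame(x, y, p, height, width):
    """Sum signed events per pixel: +1 for ON (p > 0), -1 for OFF (assumed encoding)."""
    frame = np.zeros((height, width), dtype=np.int32)
    # np.add.at accumulates correctly even when a pixel appears several times.
    np.add.at(frame, (y, x), np.where(p > 0, 1, -1))
    return frame

def time_surface(x, y, t, height, width, t_ref, tau):
    """Exponentially decayed age of the latest event at each pixel.

    Assumes every timestamp satisfies t <= t_ref, so values lie in [0, 1].
    """
    last = np.full((height, width), -np.inf)
    # Keep the most recent timestamp seen at each pixel.
    np.maximum.at(last, (y, x), t)
    # Pixels that never fired keep -inf and therefore map to exactly 0.
    return np.exp((last - t_ref) / tau)

# Four hypothetical events on a 2x2 sensor (times in seconds).
x = np.array([0, 1, 1, 0])
y = np.array([0, 0, 0, 1])
t = np.array([0.0, 0.5, 1.0, 1.0])
p = np.array([1, 1, 1, 0])

frame = event_frame(x, y, p, height=2, width=2)
surface = time_surface(x, y, t, height=2, width=2, t_ref=1.0, tau=1.0)
```

The event frame discards timing within the accumulation window and keeps only signed counts, whereas the time surface keeps the recency of the latest event per pixel; which one suits a given model depends on how much temporal detail the task needs.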
