MIPD: A Multi-sensory Interactive Perception Dataset for Embodied Intelligent Driving

Zhiwei Li, Tingzhen Zhang, Meihua Zhou, Dandan Tang, Pengwei Zhang, Wenzhuo Liu, Qiaoning Yang, Tianyu Shen, Kunfeng Wang, Huaping Liu

TL;DR

This paper considers multi-sensory information and proposes a multi-modal interactive perception dataset named MIPD, which enables extending the current autonomous driving algorithm framework and supports research on embodied intelligent driving.

Abstract

When driving, humans usually rely on multiple senses to gather information and make decisions. Analogously, achieving embodied intelligence in autonomous driving requires integrating multidimensional sensory information to facilitate interaction with the environment. However, current multi-modal fusion sensing schemes often neglect these additional sensory inputs, hindering the realization of fully autonomous driving. This paper considers multi-sensory information and proposes a multi-modal interactive perception dataset named MIPD, which enables extending the current autonomous driving algorithm framework and supports research on embodied intelligent driving. In addition to conventional camera, lidar, and 4D radar data, our dataset incorporates multiple sensor inputs, including sound, light intensity, vibration intensity, and vehicle speed, to make the dataset more comprehensive. Comprising 126 consecutive sequences, many exceeding twenty seconds, MIPD features over 8,500 meticulously synchronized and annotated frames. Moreover, it encompasses many challenging scenarios, covering various road and lighting conditions. The dataset has undergone thorough experimental validation, producing valuable insights for the exploration of next-generation autonomous driving frameworks.

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The configuration of our experiment platform and visualizations of the data collected by different sensors. (a) shows the coordinate system of each sensor on the data acquisition platform. (b)-(g) show visualizations of our data.
  • Figure 2: The number of different categories of targets within each twenty-meter distance range. (a) The number of different categories of targets for the urban scenario. (b) The number of different categories of targets for the campus scenario.
  • Figure 3: Distribution of different categories of targets. (a) The share of different categories in the urban scenario. (b) The share of different categories in the campus scenario.
  • Figure 4: Changes in the number of pedestrians, cars, and bicycles every 20 seconds in different scenarios; the solid line represents the urban scene and the dotted line represents the campus scene.
  • Figure 5: Comparison of vibration data in the urban and campus scenarios, where 0-10 µm is classified as Class I, 10-50 µm as Class II, and 50+ µm as Class III. (a) Vibration distribution (Urban); (b) Vibration intensity (Urban); (c) Vibration intensity (Campus); (d) Vibration distribution (Campus).
  • ...and 2 more figures