HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

Cong Ma, Lei Qiao, Chengkai Zhu, Kai Liu, Zelong Kong, Qing Li, Xueqi Zhou, Yuheng Kan, Wei Wu

TL;DR

HoloVIC addresses occlusion and blind spots in roadside perception by introducing holographic intersections with synchronized multi-sensor data from both roadside and vehicle-side viewpoints. The dataset comprises 100k+ frames, 11.47M annotated 3D boxes, and cross-sensor/global IDs to build BEV trajectories, enabling five perception tasks: Mono3D, LiDAR 3D Detection, MOT, MSMOT, and VIC Perception. It provides four sensor-layouts across five intersections, along with coordinate-system calibrations and ground-truth fusion rules, and benchmarks diverse baselines to quantify single-sensor, multi-sensor, and cooperative perception performance. The work demonstrates that incorporating roadside data via VIC substantially boosts detection and tracking accuracy, especially at medium ranges, and lays a foundation for future trajectory prediction and expanded VIC tasks with scalable benchmarks.

Abstract

Vehicle-to-everything (V2X) has become a popular topic in autonomous driving in recent years, and vehicle-infrastructure cooperation (VIC) is one of its important research areas. Complex traffic conditions, such as blind spots and occlusion, greatly limit the perception capabilities of single-view roadside sensing systems. To further enhance the accuracy of roadside perception and provide better information to the vehicle side, in this paper we constructed holographic intersections with various layouts to build a large-scale multi-sensor holographic vehicle-infrastructure cooperation dataset, called HoloVIC. Our dataset includes 3 different types of sensors (Camera, Lidar, Fisheye) and employs 4 sensor-layouts based on the different intersections. Each intersection is equipped with 6-18 sensors to capture synchronous data, while autonomous vehicles pass through these intersections to collect VIC data. HoloVIC contains in total 100k+ synchronous frames from different sensors. Additionally, we annotated 3D bounding boxes based on Camera, Fisheye, and Lidar. We also associate the IDs of the same objects across different devices and consecutive frames in sequence. Based on HoloVIC, we formulated four tasks to facilitate the development of related research. We also provide benchmarks for these tasks.

Paper Structure

This paper contains 33 sections, 31 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: An example from the HoloVIC dataset: data and annotated 3D boxes on Camera, Lidar, and BEV. The same targets from different devices are labeled with the same Global ID.
  • Figure 2: The configuration of holographic intersections: the figure illustrates three sensor types (C: Camera, L: Lidar, F: Fisheye) and four sensor-layouts (4C+2L, 8C+2L, 12C+4F+2L, 4C+2F+2L) in holographic intersections.
  • Figure 3: The coordinate systems in the HoloVIC dataset, involving all sensors on both vehicle and road sides.
  • Figure 4: The impact of different sensor-layouts on multi-sensor tracking accuracy on the Int-1 validation set.
  • Figure A1: The transformation relationships between all coordinate systems in HoloVIC.
  • ...and 6 more figures