HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
Cong Ma, Lei Qiao, Chengkai Zhu, Kai Liu, Zelong Kong, Qing Li, Xueqi Zhou, Yuheng Kan, Wei Wu
TL;DR
HoloVIC addresses occlusion and blind spots in roadside perception by introducing holographic intersections with synchronized multi-sensor data from both roadside and vehicle-side viewpoints. The dataset comprises 100k+ frames, 11.47M annotated 3D boxes, and cross-sensor/global IDs to build BEV trajectories, enabling five perception tasks: Mono3D, LiDAR 3D Detection, MOT, MSMOT, and VIC Perception. It provides four sensor layouts across five intersections, along with coordinate-system calibrations and ground-truth fusion rules, and it benchmarks diverse baselines to quantify single-sensor, multi-sensor, and cooperative perception performance. The work demonstrates that incorporating roadside data via VIC substantially boosts detection and tracking accuracy, especially at medium ranges, and lays a foundation for future trajectory prediction and expanded VIC tasks with scalable benchmarks.
Abstract
Vehicle-to-everything (V2X) has become a popular topic in autonomous driving in recent years, and vehicle-infrastructure cooperation (VIC) is one of its important research areas. Complex traffic conditions, such as blind spots and occlusion, greatly limit the perception capabilities of single-view roadside sensing systems. To further enhance the accuracy of roadside perception and provide better information to the vehicle side, we constructed holographic intersections with various layouts to build a large-scale multi-sensor holographic vehicle-infrastructure cooperation dataset, called HoloVIC. Our dataset includes 3 different types of sensors (Camera, LiDAR, Fisheye) and employs 4 sensor layouts based on the different intersections. Each intersection is equipped with 6-18 sensors to capture synchronous data, while autonomous vehicles pass through these intersections to collect VIC data. HoloVIC contains a total of 100k+ synchronous frames from different sensors. Additionally, we annotated 3D bounding boxes based on Camera, Fisheye, and LiDAR data. We also associate the IDs of the same objects across different devices and across consecutive frames in each sequence. Based on HoloVIC, we formulated four tasks to facilitate the development of related research, and we provide benchmarks for these tasks.
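To make the role of the cross-device ID association concrete, the following is a minimal sketch of how per-sensor 3D annotations that share a global ID might be grouped into BEV trajectories. It is not the dataset's toolkit or its ground-truth fusion rule: the `Box3D` record, its field names, the sensor names, and the simple per-frame averaging of box centers are all assumptions made for illustration, and the boxes are assumed to be already expressed in a common ground-plane coordinate system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical annotation record; field names are illustrative only."""
    frame: int      # index of the synchronized frame
    sensor: str     # device that produced the annotation, e.g. "cam_0"
    global_id: int  # object ID shared across devices and frames
    x: float        # box center in a common BEV frame (assumed), metres
    y: float


def build_bev_trajectories(boxes):
    """Group boxes by global ID and return one BEV track per object.

    Boxes of the same object seen by several sensors in the same frame are
    merged by averaging their centers (an assumed, simplistic fusion rule).
    Each track is a list of (frame, x, y) tuples sorted by frame.
    """
    centers = defaultdict(list)
    for box in boxes:
        centers[(box.global_id, box.frame)].append((box.x, box.y))

    trajectories = defaultdict(list)
    for (global_id, frame), points in centers.items():
        mean_x = sum(p[0] for p in points) / len(points)
        mean_y = sum(p[1] for p in points) / len(points)
        trajectories[global_id].append((frame, mean_x, mean_y))

    for track in trajectories.values():
        track.sort()  # order each track by frame index
    return dict(trajectories)


if __name__ == "__main__":
    demo = [
        Box3D(frame=1, sensor="cam_0", global_id=7, x=2.0, y=4.0),
        Box3D(frame=0, sensor="cam_0", global_id=7, x=0.0, y=0.0),
        Box3D(frame=0, sensor="lidar_0", global_id=7, x=2.0, y=2.0),
        Box3D(frame=0, sensor="cam_1", global_id=9, x=5.0, y=5.0),
    ]
    for gid, track in build_bev_trajectories(demo).items():
        print(gid, track)
```

In this sketch, object 7 is seen by two sensors in frame 0, so its two centers are averaged into a single BEV point before the track is ordered by frame. A real pipeline would first apply each sensor's calibration to bring boxes into the shared coordinate system.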
