V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Lei Yang, Xinyu Zhang, Jun Li, Chen Wang, Jiaqi Ma, Zhiying Song, Tong Zhao, Ziying Song, Li Wang, Mo Zhou, Yang Shen, Kai Wu, Chen Lv

TL;DR

V2X-Radar tackles occlusion and limited perception range in autonomous driving by introducing the first large-scale, real-world multi-modal dataset with 4D Radar for cooperative perception. The dataset comprises three sub-datasets (V2X-Radar-C/I/V) collected from a connected vehicle and a roadside unit equipped with LiDAR, cameras, and 4D Radar. It includes 350K annotations across five object categories, plus benchmark results for single-vehicle, roadside, and cooperative perception. A key finding is that asynchronous communication significantly degrades cooperative perception performance, while 4D Radar, especially with Doppler information, provides robustness under adverse weather and complements LiDAR and camera data. Overall, V2X-Radar enables the development and evaluation of delay-tolerant, cross-modal fusion strategies for more robust V2X perception in real-world conditions, with publicly released data and code. Average precision (AP) at IoU thresholds of $0.5$ and $0.7$ is used for evaluation across tasks, and the time synchronization error is kept below $20$ ms to support fair cross-platform comparisons.

Abstract

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit, both equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data covers sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar frames, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and the benchmark codebase at https://huggingface.co/datasets/yanglei18/V2X-Radar and https://github.com/yanglei18/V2X-Radar.

Paper Structure

This paper contains 22 sections, 10 figures, 10 tables.

Figures (10)

  • Figure 1: A data frame sampled from the V2X-Radar dataset. Each sample includes data from three sensors: (1) dense point clouds (gray points) from roadside and vehicle-side LiDAR; (2) sparse point clouds (green and blue points) with Doppler information from the roadside and vehicle-mounted 4D Radar; (3) RGB images (top row) from the vehicle-side camera and multi-view roadside cameras. All sensors are temporally and spatially synchronized. Each data frame is manually annotated with 3D boxes across five categories.
  • Figure 2: The sensor configuration on the connected vehicle-side platform and the intelligent roadside unit. a) the vehicle-side platform, and b) the intelligent roadside unit or infrastructure unit. Both are equipped with multi-modal sensors, including cameras, LiDAR, and 4D Radar, along with a C-V2X unit and a GPS/IMU system.
  • Figure 3: Visualization of calibration results. a) The calibration results between the camera and LiDAR. b) The calibration results between the 4D Radar and LiDAR / Camera. The LiDAR points are projected onto the camera plane using the camera's intrinsic parameters and the camera-LiDAR extrinsics. Similarly, the 4D Radar points are transferred to the LiDAR coordinate system using the 4D Radar-LiDAR extrinsics. Additionally, these 4D Radar points are also mapped onto the camera plane by employing the camera's intrinsic parameters, along with the extrinsic parameters.
  • Figure 4: Visualization of point cloud registration results. a) Initial point cloud registration based on RTK localization. b) Refined point cloud registration with CBM [song2024spatial] and manual adjustment. The blue points represent the point cloud from the vehicle-side LiDAR, while the green points indicate the point cloud from the roadside LiDAR.
  • Figure 5: Data analysis of our V2X-Radar dataset. a) Distribution of objects during day and night conditions. b) Average and maximum number of LiDAR points within the 3D bounding box for each category. c) Average and maximum number of 4D Radar points within the 3D bounding box for each category. d) Number of annotations per collaborative sample. The vertical axes of sub-plots (a)-(c) use a log scale, whilst the vertical axis of sub-plot (d) employs a standard scale.
  • ...and 5 more figures
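
The projection described in the Figure 3 caption (LiDAR or 4D Radar points mapped onto the camera plane via extrinsics and intrinsics) can be illustrated with a minimal pinhole-camera sketch. This is not the dataset's own tooling: the function name `project_points_to_image`, the matrix names `T_cam_from_lidar` and `K`, and the toy calibration values are all assumptions made for illustration, and lens distortion is ignored.

```python
import numpy as np


def project_points_to_image(points, T_cam_from_sensor, K):
    """Project Nx3 sensor-frame points to pixel coordinates (pinhole model).

    points            : (N, 3) array in the sensor (e.g. LiDAR) frame.
    T_cam_from_sensor : (4, 4) homogeneous extrinsic, sensor frame -> camera frame.
    K                 : (3, 3) camera intrinsic matrix.

    Returns (pixels, in_front): pixels is (M, 2) for the points that lie in
    front of the camera; in_front is a length-N boolean mask selecting them.
    """
    n = points.shape[0]
    homogeneous = np.hstack([points, np.ones((n, 1))])      # (N, 4)
    cam = (T_cam_from_sensor @ homogeneous.T).T[:, :3]      # (N, 3), camera frame
    in_front = cam[:, 2] > 1e-6                             # drop points behind the camera
    uvw = (K @ cam[in_front].T).T                           # (M, 3)
    pixels = uvw[:, :2] / uvw[:, 2:3]                       # perspective divide
    return pixels, in_front


# Toy calibration (hypothetical values, not taken from the dataset).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
T_cam_from_lidar = np.eye(4)  # assume the two frames coincide for this toy case

points = np.array([[0.0, 0.0, 10.0],    # on the optical axis
                   [1.0, 0.0, 10.0],    # 1 m to the right at 10 m depth
                   [0.0, 0.0, -5.0]])   # behind the camera, filtered out

pixels, in_front = project_points_to_image(points, T_cam_from_lidar, K)
```

Under the same assumptions, 4D Radar points could be projected by chaining the extrinsics, for example `T_cam_from_radar = T_cam_from_lidar @ T_lidar_from_radar`, and passing the result to the same function.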