
V2U4Real: A Real-world Large-scale Dataset for Vehicle-to-UAV Cooperative Perception

Weijia Li, Haoen Xiang, Tianxu Wang, Shuaibing Wu, Qiming Xia, Cheng Wang, Chenglu Wen

Abstract

Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range. While existing cooperative perception paradigms, such as Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground-level collaboration and cannot fully address large-scale occlusions or long-range perception in complex environments. To advance research in cross-view cooperative perception, we present V2U4Real, the first large-scale real-world multi-modal dataset for Vehicle-to-UAV (V2U) cooperative object perception. V2U4Real was collected by a ground vehicle and a UAV equipped with multi-view LiDARs and RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi-view camera images, and 700K annotated 3D bounding boxes across four classes. To support a wide range of research tasks, we establish benchmarks for single-agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state-of-the-art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long-range awareness. The V2U4Real dataset and codebase are available at https://github.com/VjiaLi/V2U4Real.
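To make the cooperation paradigm concrete, below is a minimal sketch of early (point-level) V2U fusion: UAV LiDAR points are projected into the ego-vehicle frame and aggregated with the ego point cloud, as illustrated in Figure 1. The function name `fuse_v2u_points`, the array shapes, and the transform `T_uav_to_ego` are illustrative assumptions, not part of the released V2U4Real codebase.

```python
import numpy as np

def fuse_v2u_points(ego_points: np.ndarray,
                    uav_points: np.ndarray,
                    T_uav_to_ego: np.ndarray) -> np.ndarray:
    """Project UAV LiDAR points into the ego-vehicle frame and
    aggregate them with the ego point cloud.

    ego_points:   (N, 3) points in the ego-vehicle frame.
    uav_points:   (M, 3) points in the UAV frame.
    T_uav_to_ego: (4, 4) homogeneous rigid transform from the UAV
                  frame to the ego frame.
    """
    # Homogenize the UAV points: (M, 3) -> (M, 4).
    ones = np.ones((uav_points.shape[0], 1), dtype=uav_points.dtype)
    uav_h = np.hstack([uav_points, ones])

    # Apply the rigid transform, then drop the homogeneous coordinate.
    uav_in_ego = (T_uav_to_ego @ uav_h.T).T[:, :3]

    # Early (point-level) fusion: one aggregated cloud for the detector.
    return np.concatenate([ego_points, uav_in_ego], axis=0)
```

In practice, `T_uav_to_ego` would be derived from sensor calibration and per-frame agent poses, and intermediate (feature-level) fusion is a common alternative to concatenating raw points.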



Figures (12)

  • Figure 1: An example data frame from V2U4Real. (a) Ground-view aggregated LiDAR data. (b) Aerial-view aggregated LiDAR data. (c) V2U cooperation paradigm, where the ground vehicle serves as the ego. Purple points are captured by the ground vehicle. Red points are captured by the UAV. Green bounding boxes are the ground truth. More qualitative examples are provided in the supplementary material.
  • Figure 2: Agent motion discrepancies between the ground vehicle and the UAV. (a) Probability distribution of the roll angle ($\theta_{r}$). (b) Probability distribution of the pitch angle ($\theta_{p}$). Red denotes the UAV; purple denotes the ego vehicle. $P$ denotes probability density. (A sketch of how such angles compose into a rotation follows this list.)
  • Figure 3: Sensor overview of V2U4Real and sensor coordinate frames. The x, y, and z axes are shown in red, green, and blue, respectively.
  • Figure 4: Driving routes of two data collection agents. Urban, rural, and campus roads are represented by blue, yellow, and red lines, respectively.
  • Figure 5: Visualization of sensor calibration and point cloud registration results. Purple points are captured by the ground vehicle. Red points are captured by the UAV. The green bounding box is the ground truth.
  • ...and 7 more figures
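The motion discrepancies in Figure 2 matter because the UAV-to-ego rigid transform behind the registration results in Figure 5 is typically composed from attitude angles such as roll and pitch. Below is a minimal sketch of that composition, assuming the common Z-Y-X (yaw-pitch-roll) Euler convention; the function name `rotation_from_rpy` and the choice of convention are illustrative assumptions, not details taken from the V2U4Real calibration pipeline.

```python
import numpy as np

def rotation_from_rpy(roll: float, pitch: float, yaw: float) -> np.ndarray:
    """Compose a 3x3 rotation matrix from roll (about x), pitch (about y),
    and yaw (about z), applied in Z-Y-X order: R = Rz @ Ry @ Rx."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx
```

The resulting matrix would form the rotation block of a 4x4 homogeneous transform such as the `T_uav_to_ego` used in the fusion sketch above.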