AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

Yunhao Hou, Bochao Zou, Min Zhang, Ran Chen, Shangdong Yang, Yanmei Zhang, Junbao Zhuo, Siheng Chen, Jiansheng Chen, Huimin Ma

TL;DR

AGC-Drive introduces the first real-world aerial-ground cooperative perception dataset for driving, featuring two ground vehicles and a UAV equipped with LiDAR and cameras. The dataset comprises about 80K LiDAR frames and 360K images across 14 scenarios, organized into AGC-V2V and AGC-VUC sub-collections, with 350 sequences and 13 object categories annotated with 9-DoF boxes. It provides standardized benchmarks for V2V and VUC 3D object detection, using BEV fusion-based baselines and a dedicated Delta_UAV metric to quantify UAV impact. The authors release an open-source toolkit for spatiotemporal alignment, multi-agent visualization, and collaborative annotation, enabling robust evaluation of aerial-ground perception under real-world time delays and pose errors. This dataset advances practical research in multi-agent perception, occlusion handling, and long-range detection, while emphasizing responsible use and future expansion to more complex, multi-UAV scenarios.

Abstract

By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. Most previous work focuses on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to the aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views that alleviate occlusions and allow monitoring of large-scale interactive environments. A major reason for this is the lack of high-quality datasets for aerial-ground collaborative scenarios. To bridge this gap, we present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward-facing camera and a LiDAR sensor, enabling comprehensive multi-view and multi-agent perception. Consisting of approximately 80K LiDAR frames and 360K images, the dataset covers 14 diverse real-world driving scenarios, including urban roundabouts, highway tunnels, and on/off ramps. Notably, 17% of the data comprises dynamic interaction events, including vehicle cut-ins, cut-outs, and frequent lane changes. AGC-Drive contains 350 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-UAV collaborative perception. Additionally, we release an open-source toolkit, including spatiotemporal alignment verification tools, multi-agent visualization systems, and collaborative annotation utilities. The dataset and code are available at https://github.com/PercepX/AGC-Drive.

Paper Structure

This paper contains 10 sections, 1 equation, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Collaborative data collection with two vehicles and a UAV. Each vehicle is equipped with one LiDAR and five cameras. The UAV carries a LiDAR and a camera system. The top-right inset shows the custom UAV sensor setup, and the bottom-right inset illustrates the vehicle’s sensor layout.
  • Figure 2: Distribution of Driving Environment and Scenario Types.
  • Figure 3: (a) Residual heatmap of registration results (blue: high accuracy, red: low accuracy). (b) Zoomed view of the dense object region in (a). (c) Scatter plot of average residuals for 4000 randomly sampled points after registering both drone and vehicle point clouds to the ego vehicle.
  • Figure 4: Point cloud visualization of V2V cooperative object detection results on the AGC-V2V dataset. Green bounding boxes denote predicted objects, and red bounding boxes indicate ground truth annotations.
  • Figure 5: Point cloud visualization of VUC object detection results on the AGC-VUC dataset. Green bounding boxes denote predicted objects, and red bounding boxes indicate ground truth annotations.
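
The registration check summarized in Figure 3(c) can be illustrated with a minimal sketch. It assumes the residual of a point is its distance to the nearest point in the ego vehicle's cloud after the other agent's cloud has been transformed into the ego frame; the caption does not state the exact definition used. The function names (`transform`, `nn_residuals`), the toy grid, and the poses are hypothetical and are not taken from the AGC-Drive toolkit.

```python
import math
import random


def transform(points, rot, trans):
    """Apply a rigid transform (3x3 rotation, 3-vector translation) to 3D points."""
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))
        for p in points
    ]


def nn_residuals(source, target, n_samples=4000, seed=0):
    """Distance from each sampled source point to its nearest target point.

    Brute-force search keeps the sketch dependency-free; a KD-tree would be
    the usual choice for real LiDAR clouds.
    """
    rng = random.Random(seed)
    sample = source if len(source) <= n_samples else rng.sample(source, n_samples)
    return [min(math.dist(p, q) for q in target) for p in sample]


# Toy data (assumption): a flat 5x5 grid stands in for the ego vehicle's cloud.
ego_cloud = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]

# The same surface as seen from a second agent whose frame is shifted by 1 m in x.
other_cloud = [(x - 1.0, y, z) for (x, y, z) in ego_cloud]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Registering with the correct pose versus a pose with a 5 cm translation error.
good = nn_residuals(transform(other_cloud, identity, (1.0, 0.0, 0.0)), ego_cloud)
bad = nn_residuals(transform(other_cloud, identity, (1.05, 0.0, 0.0)), ego_cloud)

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(f"mean residual, correct pose: {mean_good:.3f} m")
print(f"mean residual, 5 cm pose error: {mean_bad:.3f} m")
```

In this toy setup the mean residual tracks the injected pose error, which is the property that makes a residual heatmap or scatter plot useful for checking alignment quality.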