AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Yunhao Hou, Bochao Zou, Min Zhang, Ran Chen, Shangdong Yang, Yanmei Zhang, Junbao Zhuo, Siheng Chen, Jiansheng Chen, Huimin Ma
TL;DR
AGC-Drive introduces the first real-world aerial-ground cooperative perception dataset for driving, featuring two ground vehicles and a UAV equipped with LiDAR and cameras. The dataset comprises about 80K LiDAR frames and 360K images across 14 scenarios, organized into AGC-V2V and AGC-VUC sub-collections, with 350 sequences and 13 object categories annotated with 9-DoF boxes. It provides standardized benchmarks for V2V and VUC 3D object detection, using BEV fusion-based baselines and a dedicated Delta_UAV metric to quantify UAV impact. The authors release an open-source toolkit for spatiotemporal alignment, multi-agent visualization, and collaborative annotation, enabling robust evaluation of aerial-ground perception under real-world time delays and pose errors. This dataset advances practical research in multi-agent perception, occlusion handling, and long-range detection, while emphasizing responsible use and future expansion to more complex, multi-UAV scenarios.
Abstract
By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. Most previous work focuses on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to the aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views that alleviate occlusions and help monitor large-scale interactive environments. A major reason for this is the lack of high-quality datasets for aerial-ground collaborative scenarios. To bridge this gap, we present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward-facing camera and a LiDAR sensor, enabling comprehensive multi-view and multi-agent perception. Consisting of approximately 80K LiDAR frames and 360K images, the dataset covers 14 diverse real-world driving scenarios, including urban roundabouts, highway tunnels, and on/off ramps. Notably, 17% of the data comprises dynamic interaction events, including vehicle cut-ins, cut-outs, and frequent lane changes. AGC-Drive contains 350 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-UAV collaborative perception. Additionally, we release an open-source toolkit, including spatiotemporal alignment verification tools, multi-agent visualization systems, and collaborative annotation utilities. The dataset and code are available at https://github.com/PercepX/AGC-Drive.
