Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Zeyu Han, Haibao Yu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang Wang

TL;DR

Griffin addresses occlusion and limited field of view (FoV) in autonomous perception by introducing an aerial-ground cooperative (AGC) dataset and benchmark. It uses CARLA-AirSim co-simulation to create multi-agent scenes with varied drone altitudes (20-60 m) and occlusion-aware 3D annotations. The accompanying benchmark evaluates detection and tracking accuracy, communication efficiency, and robustness to latency, data loss, and localization noise. The work compares multiple fusion paradigms and finds that instance-level fusion is more resilient to altitude changes and perturbations, while BEV-level methods are more sensitive to pose and communication errors. These findings point toward altitude-adaptive fusion, sparse data exchange, and robust sim-to-real transfer as directions for deployable AGC systems.

Abstract

While cooperative perception can overcome the limitations of single-vehicle systems, the practical implementation of vehicle-to-vehicle and vehicle-to-infrastructure systems is often impeded by significant economic barriers. Aerial-ground cooperation (AGC), which pairs ground vehicles with drones, presents a more economically viable and rapidly deployable alternative. However, this emerging field has been held back by a critical lack of high-quality public datasets and benchmarks. To bridge this gap, we present Griffin, a comprehensive AGC 3D perception dataset featuring over 250 dynamic scenes (37k+ frames). It incorporates varied drone altitudes (20-60 m), diverse weather conditions, realistic drone dynamics via CARLA-AirSim co-simulation, and critical occlusion-aware 3D annotations. Accompanying the dataset is a unified benchmarking framework for cooperative detection and tracking, with protocols to evaluate communication efficiency, altitude adaptability, and robustness to communication latency, data loss, and localization noise. Through experiments across different cooperative paradigms, we demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research. The dataset and code are available at https://github.com/wang-jh18-SVM/Griffin.

Paper Structure

This paper contains 26 sections, 1 equation, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: Motivation for aerial-ground cooperative perception. AGC provides a flexible alternative to fixed infrastructure by leveraging on-demand deployment and a unique bird's-eye view. In this example, the aerial view reveals pedestrians (red circle) occluded from the vehicle.
  • Figure 2: An example from Griffin with visualized annotation. The ground vehicle platform is equipped with four cameras and one LiDAR, while the aerial drone platform has five cameras. We also provide instance segmentation ground truth, as shown in the lower row. Bounding boxes represent annotations from cooperative perspectives, indicating that one agent should be able to ‘see’ certain occluded objects after communication with the other. We use red circles and arrows to highlight those cases.
  • Figure 3: Data collection framework.
  • Figure 4: Weather distribution of scene clips. The dataset encompasses a variety of weather and lighting conditions. Following real-world patterns, certain combinations, such as fog at noon, are intentionally rare or absent.
  • Figure 5: UAV pose distribution of Griffin-Random.
  • ...and 5 more figures