LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space

Hai Wu, Shuai Tang, Jiale Wang, Longkun Zou, Mingyue Guo, Rongqin Liang, Ke Chen, Yaowei Wang

TL;DR

LAA3D tackles the scarcity of 3D perception data for low-altitude aircraft by introducing a large-scale real/synthetic dataset with 6-DoF annotations and a unified benchmark. It presents MonoLAA, a monocular 3D detector with Focal-Length Unification and Class-Specific Depth to handle varying zoom and object ranges, and demonstrates strong sim-to-real transfer via real-data fine-tuning. The work provides extensive analyses on 3D detection, 3D MOT, pose estimation, and trajectory prediction, and reveals that naive domain adaptation is insufficient while targeted real-data fine-tuning yields substantial gains. Overall, LAA3D enables robust, scalable research in outdoor, low-altitude 3D perception with practical implications for traffic management and surveillance.

Abstract

Perception of Low-Altitude Aircraft (LAA) in 3D space enables precise 3D object localization and behavior understanding. However, datasets tailored for 3D LAA perception remain scarce. To address this gap, we present LAA3D, a large-scale dataset designed to advance 3D detection and tracking of low-altitude aerial vehicles. LAA3D contains 15,000 real images and 600,000 synthetic frames, captured across diverse scenarios, including urban and suburban environments. It covers multiple aerial object categories, including electric Vertical Take-Off and Landing (eVTOL) aircraft, Micro Aerial Vehicles (MAVs), and helicopters. Each instance is annotated with a 3D bounding box, a class label, and an instance identity, supporting tasks such as 3D object detection, 3D multi-object tracking (MOT), and 6-DoF pose estimation. We also establish the LAA3D Benchmark, which integrates multiple tasks and methods under unified evaluation protocols for comparison. Furthermore, we propose MonoLAA, a monocular 3D detection baseline that achieves robust 3D localization from zoom cameras with varying focal lengths. Models pretrained on synthetic images transfer effectively to real-world data after fine-tuning, demonstrating strong sim-to-real generalization. LAA3D provides a comprehensive foundation for future research in low-altitude 3D object perception.

Paper Structure

This paper contains 33 sections, 6 equations, 22 figures, 24 tables.

Figures (22)

  • Figure 2: Comparison between LAA3D and a previous UAV dataset. (a) The previous UAV dataset, MMAUD, contains only simple backgrounds and MAV objects. Each scene includes a few objects with limited annotations. (b) LAA3D contains diverse scenes and various objects, including MAV, eVTOL, and Helicopter. Each scene includes multiple objects with 2D/3D bounding boxes.
  • Figure 3: Representative samples from LAA3D-real, covering diverse real-world backgrounds.
  • Figure 4: Statistics of LAA3D-real: (a) fine-grained class distribution, (b) distance distribution, (c) coarse category distribution, (d) orientation angle distribution, and (e) object length distribution.
  • Figure 5: Examples from the LAA model set. (a) MAV models consist of consumer drones (e.g., DJI Mavic) and industrial drones (e.g., delivery drone and agricultural drone). (b) eVTOL models include quadrotors (e.g., EHang 216) and hybrid wing aircraft (e.g., EHang VT-30). (c) Helicopter models include consumer aircraft such as H135, R22, etc.
  • Figure 6: Representative samples from LAA3D-sim covering diverse simulated backgrounds.
  • ...and 17 more figures