DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images
Seo-Bin Hwang, Yeong-Jun Cho
TL;DR
The paper tackles 3D drone pose estimation without relying on size or mesh priors. It also addresses the small scale and limited diversity of existing datasets, which make generalization hard to validate. It introduces DroneKey++, a prior-free end-to-end framework that combines a keypoint encoder (joint 2D keypoint detection and drone classification) with a ray-based 3D pose decoder to estimate rotation and translation from monocular image sequences. To support robust evaluation across diverse drones and environments, it also presents 6DroneSyn, a large synthetic benchmark with 52,920 images across 7 drone models and 88 outdoor backgrounds, generated via 360-degree panorama synthesis to reduce the domain gap. Empirical results show that DroneKey++ delivers state-of-the-art rotation and translation accuracy (e.g., $\text{MAE}_R=17.34^{\circ}$, $\text{MAE}_t=0.135$ m) while maintaining real-time inference ($414.07$ FPS on GPU, $19.25$ FPS on CPU), demonstrating strong generalization and practicality for anti-drone and surveillance applications.
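The decoder described above is learned, and its internals are not given here. As a minimal sketch of the geometric primitive a ray-based decoder typically builds on, assuming a pinhole camera with known intrinsic matrix `K` (an assumption, not stated in this summary), each detected 2D keypoint can be back-projected into a unit viewing ray in the camera frame. The function name `keypoints_to_rays` is hypothetical.

```python
import numpy as np


def keypoints_to_rays(keypoints_px: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project N pixel keypoints of shape (N, 2) into unit camera rays (N, 3).

    Assumes an undistorted pinhole camera with 3x3 intrinsic matrix K.
    """
    n = keypoints_px.shape[0]
    # Lift pixel coordinates (u, v) to homogeneous form (u, v, 1).
    homog = np.hstack([keypoints_px, np.ones((n, 1))])
    # Solve K @ d = [u, v, 1]^T for each keypoint, i.e. d = K^{-1} [u, v, 1]^T.
    directions = np.linalg.solve(K, homog.T).T
    # Normalize so each row is a unit-length direction from the camera center.
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)


# Example: a keypoint at the principal point maps to the optical axis.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
rays = keypoints_to_rays(np.array([[320.0, 240.0], [820.0, 240.0]]), K)
```

Each ray fixes only a direction, so the depth along it is not determined by this step alone. Without a metric size prior, that scale ambiguity has to be resolved by other cues.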
Abstract
Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical size or 3D meshes. At the same time, current datasets are small-scale, limited to a single drone model, and collected in constrained environments, which makes it difficult to reliably validate generalization. We present DroneKey++, a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. The framework employs a keypoint encoder for simultaneous keypoint detection and classification, and a pose decoder that estimates 3D pose using ray-based geometric reasoning and class embeddings. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds, generated using 360-degree panoramic synthesis. Experiments show that DroneKey++ achieves a rotation MAE of 17.34° and MedAE of 17.1°, and a translation MAE of 0.135 m and MedAE of 0.242 m, with inference speeds of 19.25 FPS (CPU) and 414.07 FPS (GPU), demonstrating both strong generalization across drone models and suitability for real-time applications. The dataset is publicly available.
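The abstract reports accuracy as MAE and MedAE but does not define the per-sample errors they aggregate. A minimal sketch, assuming the rotation error is the geodesic angle between predicted and ground-truth rotation matrices and the translation error is the Euclidean distance in meters (both are assumptions; the paper may use another definition, such as per-axis Euler-angle errors). All function names, including the `rot_z` test helper, are hypothetical.

```python
import numpy as np


def rot_z(deg: float) -> np.ndarray:
    """Rotation matrix about the z-axis (helper for building test poses)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def translation_error_m(t_pred, t_gt) -> float:
    """Euclidean distance between predicted and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_pred, float) - np.asarray(t_gt, float)))


def mae_medae(errors) -> tuple[float, float]:
    """Mean and median of a list of non-negative per-sample errors."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(errors)), float(np.median(errors))


rot_err = rotation_error_deg(rot_z(30.0), np.eye(3))
trans_err = translation_error_m([0.0, 0.0, 1.0], [0.0, 0.0, 0.7])
mae, medae = mae_medae([10.0, 20.0, 60.0])
```

Reporting both statistics is informative because the mean is pulled up by a few large failures, while the median reflects the typical sample.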
