Enabling High-Frequency Cross-Modality Visual Positioning Service for Accurate Drone Landing

Haoyang Wang, Xinyu Luo, Wenhua Ding, Jingao Xu, Xuecheng Chen, Ruiyang Duan, Jialong Chen, Haitao Zhang, Yunhao Liu, Xinlei Chen

TL;DR

Urban drone landing requires reliable, real-time 6-DoF pose estimation in GPS-denied environments. This work redesigns the Visual Positioning Service (VPS) for drones by replacing frame-based sensing with an event camera. It introduces STPE, a spatio-temporal feature-instructed pose estimation module that produces a Temporal Distance Field for matching 2D events against the 3D map, and MHFO, a motion-aware hierarchical fusion and optimization scheme that uses drone motion both in early-stage event filtering and in later-stage pose optimization. The resulting system, EV-Pose, achieves millisecond-level latency and sub-centimeter translation accuracy, improving ARE by over 50% compared with state-of-the-art baselines across indoor and outdoor scenarios. EV-Pose integrates with common flight controllers and is robust to motion blur, HDR lighting, and dynamic scenes, enabling accurate, high-frequency drone landings in urban canyons. These findings suggest that cross-modality VPS with event sensing can substantially improve the reliability of autonomous drone landing in real-world conditions.

Abstract

After years of growth, drone-based delivery is transforming logistics. At its core, real-time 6-DoF drone pose tracking enables precise flight control and accurate drone landing. With the widespread availability of urban 3D maps, the Visual Positioning Service (VPS), a mobile pose estimation system, has been adapted to enhance drone pose tracking during the landing phase, as conventional systems like GPS are unreliable in urban environments due to signal attenuation and multi-path propagation. However, deploying the current VPS on drones faces limitations in both estimation accuracy and efficiency. In this work, we redesign drone-oriented VPS with the event camera and introduce EV-Pose to enable accurate, high-frequency 6-DoF pose tracking for drone landing. EV-Pose introduces a spatio-temporal feature-instructed pose estimation module that extracts a temporal distance field to enable 3D point map matching for pose estimation, and a motion-aware hierarchical fusion and optimization scheme that improves the accuracy and efficiency of this estimation by utilizing drone motion in the early stage of event filtering and the later stage of pose optimization. Evaluation shows that EV-Pose achieves a rotation accuracy of 1.34° and a translation accuracy of 6.9 mm with a tracking latency of 10.08 ms, outperforming baselines by >50%, thus enabling accurate drone landings. Demo: https://ev-pose.github.io/

Paper Structure

This paper contains 27 sections, 6 equations, 23 figures, 1 table, 1 algorithm.

Figures (23)

  • Figure 1: EV-Pose estimates the drone's 6-DoF pose by redesigning the drone-oriented VPS with event cameras. As shown in (a), compared to conventional VPS systems, EV-Pose enables rapid and high-frequency drone pose tracking, ensuring precise flight control and landing as shown in (b).
  • Figure 2: Motivation study. (a) RGB image. (b) Noisy event stream. (c) Event features extracted during left-to-right motion by Arc*, an event-based feature extraction algorithm (alzugaray2022event). (d) Arc* features extracted during right-to-left motion.
  • Figure 3: Current VPS and comparison of frame camera-based VPS & event camera-enhanced VPS. (a) Landing of drone. (b) Self-collection airport and landing platform. (c) Current VPS uses an RGB camera, an IMU, and 3D point clouds for pose estimation. (d) EV-Pose leverages event cameras for accurate and low-latency 6-DoF drone pose tracking.
  • Figure 4: Principles of frame cameras and event cameras. (a) Frame camera uses a global shutter to capture synchronous frames. Each pixel of the event camera operates independently, generating events asynchronously. (b) Each pixel in event cameras generates events when intensity changes exceed a threshold: [ON] for increases, [OFF] for decreases.
  • Figure 5: System architecture of EV-Pose.
  • ...and 18 more figures
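
The event-generation principle described in the Figure 4 caption can be illustrated with a minimal sketch. It is not taken from the paper: the function name `generate_events`, the threshold value, and the use of log-intensity differences between two snapshots are all assumptions made for illustration. A real event camera fires per pixel, asynchronously, and timestamps each event; comparing two frames only approximates that behavior.

```python
import math

def generate_events(prev_frame, curr_frame, threshold=0.2):
    """Emit (x, y, polarity) for pixels whose log-intensity change exceeds a threshold.

    Hypothetical helper for illustration only. Polarity is +1 (ON) for an
    increase in brightness and -1 (OFF) for a decrease. Intensities must be > 0.
    """
    events = []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            delta = math.log(c) - math.log(p)  # assumed log-intensity model
            if delta >= threshold:
                events.append((x, y, +1))      # ON event
            elif delta <= -threshold:
                events.append((x, y, -1))      # OFF event
            # Changes smaller than the threshold produce no event.
    return events

# Two 2x3 intensity snapshots: one pixel brightens, one darkens, one barely changes.
prev_frame = [[100.0, 100.0, 100.0],
              [100.0, 100.0, 100.0]]
curr_frame = [[100.0, 150.0, 100.0],
              [105.0, 100.0, 60.0]]

events = generate_events(prev_frame, curr_frame)
print(events)
```

Only pixels with a sufficiently large change produce output, so a static scene yields no events at all. That sparsity is what allows event-based sensing to run at much higher rates than frame-based sensing.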