
PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UAV Dataset

Yiyang Wu, Xiaohu Zhang, Yanjin Du, Tongsu Zhang, Chujun Li, Siyang Chen, Guoyi Zhang, Xiangpeng Xu

Abstract

Accurate pose estimation is fundamental for unmanned aerial vehicle (UAV) applications, where Visual-Inertial SLAM (VI-SLAM) provides a cost-effective solution for localization and mapping. However, existing VI-SLAM methods mainly rely on sensors with limited fields of view (FoV), which can lead to drift and even failure in complex UAV scenarios. Although panoramic cameras provide omnidirectional perception to improve robustness, panoramic VI-SLAM and corresponding real-world datasets for UAVs remain underexplored. To address this limitation, we first construct a real-world panoramic visual-inertial dataset covering diverse flight conditions, including varying illumination, altitudes, trajectory lengths, and motion dynamics. To achieve accurate and robust pose estimation under such challenging UAV scenarios, we propose a panoramic VI-SLAM framework that exploits the omnidirectional FoV via panoramic feature extraction and panoramic loop closure, enhancing feature constraints and ensuring global consistency. Extensive experiments on both the proposed dataset and public benchmarks demonstrate that our method achieves superior accuracy, robustness, and consistency compared to existing approaches. Moreover, deployment on an embedded platform validates its practical applicability, achieving computational efficiency comparable to PC implementations. The source code and dataset are publicly available at https://drive.google.com/file/d/1lG1Upn6yi-N6tYpEHAt6dfR1uhzNtWbT/view

Figures (5)

  • Figure 1: Overview of the proposed dataset. A panoramic camera is mounted on a UAV to collect panoramic-IMU data, with ground-truth trajectories recorded using an RTK ground station (a). The dataset covers diverse lighting conditions, altitudes, flight speeds, and maneuvering characteristics, as summarized in (b), with representative scenes shown in (c); details in Section \ref{sec:overview_dataset}.
  • Figure 2: Overview of the proposed panoramic visual-inertial SLAM framework. The system takes panoramic images and IMU measurements as inputs, and achieves robust, accurate, and globally consistent results through panoramic feature extraction (Section \ref{sec:pano_feature_extraction}), as well as panoramic optimization and panoramic loop closure (Section \ref{sec:optimization}). Experimental results are presented in Section \ref{chap:exp}.
  • Figure 3: Hybrid feature tracking on ERP images. Red dots denote learning-based features (e.g., SuperPoint), green dots denote hand-crafted features (e.g., ORB); a minimal sketch of lifting such ERP features to unit bearing vectors follows this list.
  • Figure 4: Qualitative comparison results. Top: comparison of different methods on the proposed dataset, where our method produces estimates closer to the ground truth and demonstrates stronger robustness. Bottom: results on an additional large-scale handheld sequence. Other methods suffer from varying degrees of drift, while our method achieves lower drift and, by leveraging panoramic loop closure, successfully aligns with the starting position, resulting in a globally consistent estimate.
  • Figure 5: Box plots of ATE on the PC (30 Hz) and the Orin NX (10 Hz) across five runs.
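
To make the ERP-based feature handling referenced in Figures 2 and 3 more concrete, the sketch below shows one common way to lift hybrid keypoints detected on an equirectangular panorama to unit bearing vectors on the sphere, the representation panoramic pipelines typically track and triangulate. This is a minimal illustration, not the paper's implementation: the function names (erp_pixel_to_bearing, detect_hybrid_features), the learned_detector callable standing in for a SuperPoint-style network, and the latitude/longitude axis convention are assumptions.

```python
import numpy as np
import cv2


def erp_pixel_to_bearing(u, v, width, height):
    """Map an ERP pixel (u, v) to a unit bearing vector on the sphere.

    Assumed convention (the paper's axes may differ): u spans longitude
    in [-pi, pi), v spans latitude in [pi/2, -pi/2] from top to bottom,
    with y pointing up.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])


def detect_hybrid_features(erp_image, learned_detector=None, max_orb=500):
    """Detect hand-crafted (ORB) keypoints and, if a learned detector is
    supplied, learning-based keypoints, then lift both sets to unit
    bearing vectors for tracking on the sphere."""
    h, w = erp_image.shape[:2]
    gray = (cv2.cvtColor(erp_image, cv2.COLOR_BGR2GRAY)
            if erp_image.ndim == 3 else erp_image)

    keypoints = []

    # Hand-crafted features (green dots in Figure 3).
    orb = cv2.ORB_create(nfeatures=max_orb)
    keypoints += [kp.pt for kp in orb.detect(gray, None)]

    # Learning-based features (red dots in Figure 3).
    # learned_detector is a hypothetical user-supplied callable returning
    # an (N, 2) array of (u, v) pixel coordinates, e.g. a SuperPoint wrapper.
    if learned_detector is not None:
        keypoints += [tuple(p) for p in learned_detector(gray)]

    if not keypoints:
        return np.empty((0, 2)), np.empty((0, 3))

    bearings = np.stack([erp_pixel_to_bearing(u, v, w, h) for u, v in keypoints])
    return np.asarray(keypoints), bearings
```

Working with bearing vectors rather than raw ERP pixel coordinates is a standard choice in omnidirectional SLAM: it sidesteps the strong distortion near the poles of the equirectangular image and lets the same angular reprojection error be applied uniformly over the full 360° field of view.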