Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker
Jingtao Sun, Yaonan Wang, Danwei Wang
TL;DR
The paper tackles real-world aerial category-level 6-DoF pose tracking for guiding aerial manipulation. It introduces Robust6DoF, a three-stage tracker that fuses 2D-3D features with a Shape-Based Spatial-Temporal Augmentation module and a Prior-Guided Keypoints Generation module, enabling robust inter-frame correspondence under severe viewpoint changes. Complementing this, PAD-Servo provides a pose-aware, decoupled control policy, conditioned on the tracked pose, that drives both the onboard manipulator and the UAV. Extensive experiments on four public datasets, plus real-world aerial tests, demonstrate state-of-the-art accuracy, robustness to frame drops and noise, and real-time performance suitable for real-world aerial robotic guidance. The work offers a practical, integrated solution for category-level pose tracking and robotic vision guidance in highly maneuverable aerial settings, with clear relevance to autonomous manipulation tasks.
Abstract
Tracking the 6-DoF pose of an object is crucial for many downstream robot tasks and real-world applications. In this paper, we investigate the real-world robotic task of aerial vision guidance for aerial manipulation using category-level 6-DoF pose tracking. Aerial conditions inevitably introduce unique challenges, such as rapid viewpoint changes in pitch and roll and large inter-frame differences. To address these challenges, we first introduce a robust category-level 6-DoF pose tracker (Robust6DoF). This tracker leverages shape and temporal prior knowledge to search for optimal inter-frame keypoint pairs, which are generated in a coarse-to-fine manner under a priori structural adaptive supervision. Notably, Robust6DoF employs a Spatial-Temporal Augmentation module that handles inter-frame differences and intra-class shape variations through both temporal dynamic filtering and shape-similarity filtering. We further present a Pose-Aware Discrete Servo strategy (PAD-Servo), a decoupled approach to accomplishing the final aerial vision guidance task. It comprises two servo action policies that better accommodate the structural properties of aerial robotic manipulation. Extensive experiments on four well-known public benchmarks demonstrate the superiority of Robust6DoF. Real-world tests further verify that Robust6DoF, together with PAD-Servo, can be readily deployed in real-world aerial robotic applications.
