AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry

Thomas Jantos, Martin Scheiber, Christian Brommer, Eren Allak, Stephan Weiss, Jan Steinbrener

TL;DR

This letter presents a real-time capable unmanned aerial vehicle (UAV) system for object-relative, closed-loop navigation with a minimal sensor configuration consisting of an inertial measurement unit (IMU) and RGB camera.

Abstract

Object-relative mobile robot navigation is essential for a variety of tasks, e.g., autonomous critical infrastructure inspection, but requires the capability to extract semantic information about the objects of interest from raw sensory data. While deep learning-based (DL) methods excel at inferring semantic object information from images, such as class and relative 6-degree-of-freedom (6-DoF) pose, they are computationally demanding and thus often not suitable for payload-constrained mobile robots. In this letter, we present a real-time capable unmanned aerial vehicle (UAV) system for object-relative, closed-loop navigation with a minimal sensor configuration consisting of an inertial measurement unit (IMU) and an RGB camera. Using a DL-based object pose estimator, trained solely on synthetic data and optimized for companion board deployment, we fuse the object-relative pose measurements with the IMU data to perform object-relative localization. We conduct multiple real-world experiments to validate the performance of our system for the challenging use case of power pole inspection. An example closed-loop flight is presented in the supplementary video.

Paper Structure (12 sections, 6 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Experimental setup of a mock-up power pole with three insulators in our research laboratory. Moreover, we visualize a trajectory in red representing an inspection flight. Bottom left: Our mobile robot platform of choice, a quadcopter equipped with a PX4, an RGB camera and an NVIDIA Jetson.
  • Figure 2: Schematic overview of the hardware and software components and their interaction. The CNS Flight Stack [cns_flightstack22] is responsible for handling high-level autonomy and communication with the PX4. We utilize PoET [jantos2023poet], a DL-based 6-DoF object pose estimator, to predict relative object pose measurements given an input image from the RGB navigation camera. These measurements are fused with IMU data in a state estimator for object-relative localization. Moreover, the autonomy and state estimator allow switching between a global pose sensor and our object-relative landmark sensor, as well as calculating trajectory waypoints based on the current estimate.
  • Figure 3: Visualization of the homography mapping between two different sets of camera parameters. Note that the original image sizes are 1280 x 960 (left) and 640 x 480 (right).
  • Figure 4: Reprojection of the 6-DoF object poses predicted by PoET (TRT Small FP16) for our real-world scenario (left) and a synthetic image (right). Note that the black border in the real image is due to the homography mapping between different sets of camera parameters.
  • Figure 5: Schematic representation of our proposed object-relative navigation workflow. The objects of interest, e.g. infrastructure, are approached using global navigation. As soon as the objects of interest are detected, the UAV switches to object-relative navigation by estimating the states of the objects. While the global reference frame is discarded, one of the detected objects is fixed as a reference frame to render the problem observable. Once the object-relative navigation is done, the UAV switches back to global navigation and removes the object states from its estimation.
  • ...and 2 more figures
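
The homography mapping between camera parameters mentioned in Figures 3 and 4 can be illustrated with a minimal sketch. It assumes that both cameras are ideal pinhole cameras sharing the same optical center and orientation, so that a pixel is mapped by H = K_dst · K_src⁻¹. The intrinsic values below are hypothetical placeholders chosen for illustration, not the calibration used in the paper, and the helper names (`intrinsics`, `intrinsics_homography`, `map_pixels`) are likewise assumptions.

```python
import numpy as np


def intrinsics(fx, fy, cx, cy):
    """Build a 3x3 pinhole intrinsic matrix (no skew, no distortion)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


def intrinsics_homography(K_src, K_dst):
    """Homography taking homogeneous pixels of the source camera to the
    destination camera, assuming identical camera center and orientation."""
    return K_dst @ np.linalg.inv(K_src)


def map_pixels(H, pixels):
    """Apply a homography to an (N, 2) list of pixel coordinates."""
    pts = np.asarray(pixels, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to pixel coords


# Hypothetical intrinsics (assumed values, not from the paper):
K_real = intrinsics(1600.0, 1600.0, 640.0, 480.0)  # real camera, 1280 x 960
K_train = intrinsics(600.0, 600.0, 320.0, 240.0)   # training camera, 640 x 480

H = intrinsics_homography(K_real, K_train)

# Corners of the real image, mapped into the 640 x 480 training image plane.
corners = [(0, 0), (1279, 0), (1279, 959), (0, 959)]
mapped_corners = map_pixels(H, corners)
print(mapped_corners)
```

With these assumed values, the mapped corners fall strictly inside the 640 x 480 frame, so the warped real image would not fill the target image; this is one way a black border like the one noted in Figure 4 can arise. To warp an actual image rather than individual points, the same matrix could be passed to a routine such as OpenCV's `cv2.warpPerspective`.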