AI-Based Multi-Object Relative State Estimation with Self-Calibration Capabilities

Thomas Jantos, Christian Brommer, Eren Allak, Stephan Weiss, Jan Steinbrener

TL;DR

This paper proposes a method combining a state-of-the-art AI-based pose estimator for objects in camera images with data from an inertial measurement unit (IMU) for 6-DoF multi-object relative state estimation of a mobile robot.

Abstract

The capability to extract task-specific, semantic information from raw sensory data is a crucial requirement for many applications of mobile robotics. Autonomous inspection of critical infrastructure with Unmanned Aerial Vehicles (UAVs), for example, requires precise navigation relative to the structure that is to be inspected. Recently, Artificial Intelligence (AI)-based methods have been shown to excel at extracting semantic information such as 6 degree-of-freedom (6-DoF) poses of objects from images. In this paper, we propose a method combining a state-of-the-art AI-based pose estimator for objects in camera images with data from an inertial measurement unit (IMU) for 6-DoF multi-object relative state estimation of a mobile robot. The AI-based pose estimator detects multiple objects of interest in camera images along with their relative poses. These measurements are fused with IMU data in a state-of-the-art sensor fusion framework. We illustrate the feasibility of our proposed method with real-world experiments for different trajectories and different numbers of arbitrarily placed objects. We show that the results can be reliably reproduced due to the self-calibrating capabilities of our approach.

Paper Structure

This paper contains 8 sections, 8 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Visualization of the coordinate frames in this work. We estimate the state of a fixed rigid body consisting of IMU $I$ and camera $C$ relative to up to $N$ different objects $O_k$ with respect to a fixed but arbitrary navigation world $W$. In addition to the core states (red), we also estimate the calibration between IMU and camera (blue). We also estimate the pose of the object frames with respect to the world (blue). Our pose sensor consists of AI-based 6-DoF relative pose measurements between camera and objects (green).
  • Figure 2: Object configuration that was used for sequence 4 (left) and object poses as estimated by PoET (right). Note the difference in object package coloring between real-world objects (left) and YCB-V objects used for training PoET (right). The left image, as shown here, is directly fed into our pose estimation framework to get the 6-DoF relative pose measurements.
  • Figure 3: Comparison of the estimated position and orientation in Euler angles (mars) with the ground truth (gt) for run 8 of sequence 4. The components of the position (x, y, z) and orientation (roll, pitch, yaw) are plotted individually for the whole sequence. Additionally, we compare the reprojected IMU pose given the raw PoET estimates for object 3 (obj). The black arrows enclose a section in which the reprojected IMU pose is out of plotting range. Note that the object was not visible in the camera images between 6.4 s and 8.8 s.
  • Figure 4: Visualization of the estimated object-world state (${\mathbf{p}}_{ O_kW}, {\mathbf{q}}_{ O_kW}$) and the corresponding state covariance, represented by the standard deviation (std), for a non-main object for run 8 of sequence 4. The position is split up into (x, y, z), while the orientation is represented by the Euler angles. The states are plotted from the time the object is first observed (at about 10 s) until the states converge. At the beginning, the object state is wrongly initialized, possibly due to a noisy measurement. However, after about 5 seconds the state converges and the uncertainty becomes minimal.
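
The frame relations in Figure 1 and the reprojected IMU pose mentioned in Figure 3 can be illustrated with a small sketch. It is not the authors' implementation. It assumes a convention that the captions do not state: `T_AB` is a 4x4 homogeneous transform that expresses frame B in frame A, so that transforms chain as `T_AC = T_AB @ T_BC`. All function names are hypothetical, and the sketch works with noise-free poses only.

```python
import numpy as np


def make_transform(R, p):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T


def invert_transform(T):
    """Invert a rigid-body transform without a general matrix inverse."""
    R, p = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ p
    return T_inv


def rot_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def predict_camera_object(T_WI, T_IC, T_WO):
    """Expected camera-object pose T_CO given the IMU pose in the world (T_WI),
    the IMU-camera calibration (T_IC), and the object pose in the world (T_WO).
    Assumed convention: T_CO = inv(T_WI @ T_IC) @ T_WO."""
    return invert_transform(T_WI @ T_IC) @ T_WO


def reproject_imu_pose(T_CO, T_IC, T_WO):
    """Recover the IMU pose in the world from a single camera-object pose,
    i.e. the inverse of the relation in predict_camera_object."""
    return T_WO @ invert_transform(T_CO) @ invert_transform(T_IC)
```

As a usage example with made-up values, take `T_WI = make_transform(rot_z(0.3), [1.0, 2.0, 0.5])`, `T_IC = make_transform(rot_z(-0.1), [0.05, 0.0, 0.02])` and `T_WO = make_transform(rot_z(1.2), [3.0, -1.0, 0.0])`. Passing the output of `predict_camera_object` back through `reproject_imu_pose` returns `T_WI` up to floating-point error. With a real, noisy pose estimate the reprojected pose would deviate from the true IMU pose, which is the kind of comparison shown by the (obj) curve in Figure 3.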