Markerless Robot Detection and 6D Pose Estimation for Multi-Agent SLAM

Markus Rueggeberg, Maximilian Ulmer, Maximilian Durner, Wout Boerdijk, Marcus Gerhard Mueller, Rudolph Triebel, Riccardo Giubilato

TL;DR

The paper tackles data association and loop-closure challenges in multi-robot SLAM, where mutual robot detection based on fiducial markers is limited in range and fails under harsh lighting. It introduces a markerless 6D pose estimation pipeline that leverages known robot shapes and transformer-based regression to detect robots and estimate inter-robot poses; the pipeline is integrated into a decentralized SLAM framework and trained on synthetic data. The authors report gains in detection range and instantaneous localization, validated through synthetic and real-world experiments, including planetary-analog field testing on Mt. Etna, and show improved SLAM performance over marker-based baselines. The approach enables mutual localization without markers, broadening deployment in harsh lighting and outdoor environments, and points to future work on articulated robot configurations and embedded GPU deployment.

Abstract

The capability of multi-robot SLAM approaches to merge localization history and maps from different observers is often challenged by the difficulty in establishing data association. Loop closure detection between perceptual inputs of different robotic agents is easily compromised in the context of perceptual aliasing, or when perspectives differ significantly. For this reason, direct mutual observation among robots is a powerful way to connect partial SLAM graphs, but often relies on the presence of calibrated arrays of fiducial markers (e.g., AprilTag arrays), which severely limits the range of observations and frequently fails under sharp lighting conditions, e.g., reflections or overexposure. In this work, we propose a novel solution to this problem leveraging recent advances in Deep-Learning-based 6D pose estimation. We feature markerless pose estimation as part of a decentralized multi-robot SLAM system and demonstrate the benefit to the relative localization accuracy among the robotic team. The solution is validated experimentally on data recorded in a test field campaign on a planetary analogous environment.
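
To make concrete why a single mutual observation can connect two partial SLAM graphs, the following is a minimal sketch, not the authors' implementation. It assumes each robot tracks its own pose in its own map frame and that the detector returns the pose of the observed robot in the observer's frame. All frame names, helper functions, and numeric values are hypothetical and chosen only for illustration.

```python
import math

def rot_z(yaw):
    """3x3 rotation about the z axis (yaw in radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Matrix product A @ B of two 4x4 transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(T):
    """Invert a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t)

# Hypothetical inputs (not taken from the paper):
# observer pose in its own map A, facing +y, located at (2, 1, 0)
T_mapA_observer = make_pose(rot_z(math.pi / 2), [2.0, 1.0, 0.0])
# detection result: the other robot is 5 m straight ahead of the observer
T_observer_target = make_pose(rot_z(0.0), [5.0, 0.0, 0.0])
# the observed robot's pose in its own map B
T_mapB_target = make_pose(rot_z(0.0), [1.0, 0.0, 0.0])

# Pose of the observed robot expressed in the observer's map
T_mapA_target = compose(T_mapA_observer, T_observer_target)
# Transform that aligns map B with map A, obtained from this one detection
T_mapA_mapB = compose(T_mapA_target, invert(T_mapB_target))

print([round(row[3], 6) for row in T_mapA_target[:3]])
print([round(row[3], 6) for row in T_mapA_mapB[:3]])
```

In a graph-based SLAM back end, such a detection would more plausibly enter as a relative-pose constraint with an associated uncertainty and be optimized jointly with the other measurements, rather than being applied in closed form as above.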

Paper Structure (15 sections, 8 figures, 3 tables)

This paper contains 15 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: (top:) Multi-robot detection of the LRU (Lightweight Rover Unit) and the Lander unit from the perspective of the second LRU (LRU2) during the [anonymous] field test campaign on Mount Etna, Sicily. The figure shows the projection of the robot shapes, known a priori, to demonstrate the quality of the pose estimation. The distance between the LRU rover and the observer, LRU2, together with intense light reflections, would make it impossible to establish a robot detection through conventional fiducial markers. (bottom:) Members of the multi-robot team on Mt. Etna: LRU, LRU2, and the Lander unit.
  • Figure 2: Schematic overview of the employed decentralized SLAM system, with a focus on the capabilities of the multi-robot detection module. Each robot uses visual and inertial inputs to compute its state estimate and to partition robot states into submaps. Visual inputs are used to compute visual keyframes for place recognition and to detect robots in the image, either through legacy AprilTag detection or through the proposed markerless approach. The submapping results and robot detections from each robot in the team are embedded into a SLAM graph.
  • Figure 3: Impressions of training samples. (left:) OAISYS images including LRU and Lander in an Etna-like setting. (right:) BlenderProc images, with LRU, Lander and random AprilTags for distractions, on a random background.
  • Figure 4: Illustration of the markerless detection pipeline. A 2D object detection first generates an image crop, which is then processed by the pose estimator. The encoder-decoder architecture produces three outputs: 2D–3D correspondences, foreground-background masks, and surface regions as an auxiliary task. The 2D–3D correspondences are subsequently passed to a pose regression network to predict the object’s 6D pose.
  • Figure 5: Examples of heavy augmentations applied to LRU instances from the synthetic samples, used for effective training of the pose estimation network in Sec. \ref{sec:pose_est}.
  • ...and 3 more figures
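
The overlay described in the Figure 1 caption, where a robot shape known a priori is projected into the image using the estimated pose, can be illustrated with a plain pinhole projection. This is a minimal sketch under stated assumptions, not the authors' code: the function name `project_points`, the intrinsics (`fx`, `fy`, `cx`, `cy`), the model points, and the pose are all hypothetical, and lens distortion is ignored.

```python
def project_points(points_obj, R, t, fx, fy, cx, cy):
    """Project 3D model points into the image with an estimated 6D pose.

    points_obj: list of [x, y, z] points in the object (robot model) frame
    R, t:       estimated rotation (3x3) and translation (3,) mapping the
                object frame into the camera frame
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed, no distortion)
    Returns a list of (u, v) pixel coordinates; points at or behind the
    camera plane are skipped.
    """
    pixels = []
    for X in points_obj:
        # object frame -> camera frame
        xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0.0:
            continue  # not in front of the camera
        # perspective division and shift to the principal point
        u = fx * xc[0] / xc[2] + cx
        v = fy * xc[1] / xc[2] + cy
        pixels.append((u, v))
    return pixels

# Hypothetical example: identity rotation, object 10 m in front of the camera
R_est = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_est = [0.0, 0.0, 10.0]
model_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -20.0]]

overlay = project_points(model_points, R_est, t_est,
                         fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(overlay)
```

Comparing such projected points against the robot's actual outline in the image gives a quick visual check of pose quality, which is the purpose the caption attributes to the overlay.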