RHAML: Rendezvous-based Hierarchical Architecture for Mutual Localization

Gaoming Chen, Kun Song, Xiang Xu, Wenhang Liu, Zhenhua Xiong

TL;DR

RHAML tackles RGB-based mutual localization for marker-less multi-robot teams by integrating a rendezvous-based hierarchical pipeline: (i) AMLNet with anisotropic convolutions provides fast initial localization, (ii) DeepIM-based iterative refinement sharpens pose estimates via rendering, and (iii) pose graph optimization integrates multi-frame observations and odometry to yield globally consistent poses. The approach delivers translation errors below $2$ cm and rotation errors below $0.5^\circ$ under depth variation up to $5$ m, and demonstrates practical utility in multi-robot map fusion. Key innovations include the Anisotropic Convolutional Network (ACN) within AMLNet, iterative rendering-based refinement, and a robust pose-graph framework with outlier rejection. This hierarchical, configurable architecture offers a flexible trade-off between speed and accuracy, enabling high-precision relative localization for collaboration and exploration in GPS-denied environments.

Abstract

Mutual localization serves as the foundation for collaborative perception and task assignment in multi-robot systems. Effectively utilizing limited onboard sensors for mutual localization between marker-less robots is a worthwhile goal. However, previous work has shown limited accuracy when robots are equipped only with RGB cameras, because it gives inadequate consideration to large variations in the scale of the observed robot and to localization refinement. To enhance the precision of localization, this paper proposes a novel rendezvous-based hierarchical architecture for mutual localization (RHAML). First, to learn multi-scale robot features, anisotropic convolutions are introduced into the network, yielding initial localization results. Then, an iterative refinement module with rendering is employed to adjust the observed robot poses. Finally, a pose graph is constructed to globally optimize all localization results, taking multi-frame observations into account. The result is a flexible architecture whose modules can be selected according to requirements. Simulations demonstrate that RHAML effectively addresses the problem of multi-robot mutual localization, achieving translation errors below 2 cm and rotation errors below 0.5 degrees when robots exhibit 5 m of depth variation. Moreover, its practical utility is validated by applying it to map fusion when multiple robots explore unknown environments.
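
The refinement stage described in the abstract repeatedly predicts a relative transformation and applies it to the current pose estimate. The following is a minimal sketch of that loop only, not the paper's implementation. It assumes planar poses `(x, y, theta)` and a left-multiplied update, neither of which is stated in the source. The function `predict_delta` is a hypothetical stand-in for the refinement network: instead of comparing a rendered image with the observed one, it is handed the target pose and returns a fixed fraction of the remaining residual.

```python
import math


def compose(a, b):
    """Return the planar pose a * b (apply b first, then a); poses are (x, y, theta)."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a):
    """Return the inverse of a planar pose."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), s * ax - c * ay, -at)


def predict_delta(estimate, target, gain=0.5):
    """Hypothetical stand-in for the refinement network (assumption).

    A real network would infer the correction from a rendered image and
    the observed image; here the target pose is given directly and only
    a fraction `gain` of the residual transform is returned.
    """
    ex, ey, et = compose(target, inverse(estimate))
    return (gain * ex, gain * ey, gain * et)


def refine(initial, target, iterations=10):
    """Apply the predicted relative transformation repeatedly."""
    pose = initial
    for _ in range(iterations):
        pose = compose(predict_delta(pose, target), pose)
    return pose


def pose_error(a, b):
    """Return (translation error, absolute rotation error) between two poses."""
    return math.hypot(a[0] - b[0], a[1] - b[1]), abs(a[2] - b[2])


# Illustrative values only: a coarse initial estimate and the pose it should reach.
true_pose = (3.0, 0.5, 0.4)
initial_pose = (3.3, 0.2, 0.6)
refined_pose = refine(initial_pose, true_pose)
print("before:", pose_error(initial_pose, true_pose))
print("after: ", pose_error(refined_pose, true_pose))
```

Because each pass removes only part of the residual, the estimate approaches the target gradually, which is why the number of iterations acts as a speed versus accuracy knob in this sketch.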

Paper Structure (20 sections, 9 equations, 8 figures, 3 tables, 2 algorithms)

Figures (8)

  • Figure 1: Overview of the proposed architecture for multi-robot mutual localization. Team robots send their respective RGB images and wheel odometry to the central node. Firstly, the Initial Mutual Localization module inputs the images captured by robots into our AMLNet to detect whether an observed robot is in the FOV, and outputs the initial localization result ${P}_{init,i}$ to the Iterative Refinement module. Then, by iteratively rendering the 3D model, the refined localization ${T}_{ref,i}$ is obtained through DeepIM. After that, the selected localization results are optimized by constructing the pose graph.
  • Figure 2: Illustration of ACN and AIncep block. (a) ACN: Each dimension of the feature map is calculated by multiple convolutional kernels of different sizes, and their weights are learned through training. (b) The structure of the AIncep block, including the ACN module.
  • Figure 3: The architecture of AMLNet primarily consists of downsamplings, AIncep blocks, and fully connected layers. It is designed to extract features from the images captured by the observer robot and predict ${o}_{i}$ and ${P}_{init, i}$.
  • Figure 4: The architecture of the iterative refinement module. Based on the initial mutual localization result and the CAD model of the robot, the image is rendered. Combined with the observed image, they are input into the network to output the relative transformation. Through an iterative loop, refined localization is finally obtained.
  • Figure 5: The coordinates of two robots during the rendezvous period. The yellow arrow indicates that Robot A was observed, totaling ${\alpha}$ times. Similarly, the purple arrow represents that Robot B was observed, amounting to ${\beta}$ times.
  • ...and 3 more figures
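
The Figure 2 caption describes ACN as computing each dimension of the feature map with several kernels of different sizes and combining them with learned weights. The sketch below illustrates one plausible reading of that idea on a single-channel 2D map in plain Python. The box kernels of sizes 1, 3, and 5, the softmax normalization of the per-axis weights, and all function names are assumptions made for illustration; the source does not give these details.

```python
import math

# Assumed kernel bank: normalized box filters of sizes 1, 3 and 5.
BOX_KERNELS = [[1.0], [1.0 / 3] * 3, [1.0 / 5] * 5]


def softmax(logits):
    """Turn unconstrained logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(row, kernel):
    """Zero-padded 1D correlation that keeps the input length (odd kernel sizes)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(row):
                acc += w * row[j]
        out.append(acc)
    return out


def mix_along_rows(fmap, kernels, logits):
    """Filter every row with each kernel and blend the results with softmax weights."""
    weights = softmax(logits)
    mixed = []
    for row in fmap:
        branches = [conv1d_same(row, kernel) for kernel in kernels]
        mixed.append([sum(w * b[i] for w, b in zip(weights, branches))
                      for i in range(len(row))])
    return mixed


def transpose(fmap):
    """Swap the two axes of a list-of-lists feature map."""
    return [list(col) for col in zip(*fmap)]


def anisotropic_mix(fmap, kernels, width_logits, height_logits):
    """Blend kernel sizes separately along the width axis, then the height axis."""
    along_width = mix_along_rows(fmap, kernels, width_logits)
    return transpose(mix_along_rows(transpose(along_width), kernels, height_logits))


# A single active pixel; the logits favor size 1 along width and size 5 along height,
# so the response spreads vertically but not horizontally.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 1.0
out = anisotropic_mix(fmap, BOX_KERNELS, [50.0, 0.0, 0.0], [0.0, 0.0, 50.0])
for row in out:
    print(["%.2f" % v for v in row])
```

In a trained network the logits would be parameters updated by backpropagation, so the effective receptive field can differ per axis, which is one way to handle an observed robot whose image size changes strongly with depth.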