RHAML: Rendezvous-based Hierarchical Architecture for Mutual Localization
Gaoming Chen, Kun Song, Xiang Xu, Wenhang Liu, Zhenhua Xiong
TL;DR
RHAML tackles RGB-based mutual localization for marker-less multi-robot teams by integrating a rendezvous-based hierarchical pipeline: (i) AMLNet with anisotropic convolutions provides fast initial localization, (ii) DeepIM-based iterative refinement sharpens pose estimates via rendering, and (iii) pose graph optimization integrates multi-frame observations and odometry to yield globally consistent poses. The approach delivers translation errors below $2$ cm and rotation errors below $0.5^\circ$ under depth variation up to $5$ m, and demonstrates practical utility in multi-robot map fusion. Key innovations include the Anisotropic Convolutional Network (ACN) within AMLNet, iterative rendering-based refinement, and a robust pose-graph framework with outlier rejection. This hierarchical, configurable architecture offers a flexible trade-off between speed and accuracy, enabling high-precision relative localization for collaboration and exploration in GPS-denied environments.
Abstract
Mutual localization serves as the foundation for collaborative perception and task assignment in multi-robot systems. Effectively utilizing limited onboard sensors for mutual localization between marker-less robots is a worthwhile goal. However, previous work has shown limited accuracy when robots are equipped only with RGB cameras, because it gives inadequate consideration to the large scale variations of the observed robot and to localization refinement. To enhance localization precision, this paper proposes a novel rendezvous-based hierarchical architecture for mutual localization (RHAML). First, anisotropic convolutions are introduced into the network to learn multi-scale robot features, yielding initial localization results. Then, an iterative refinement module with rendering is employed to adjust the observed robot poses. Finally, a pose graph is constructed to globally optimize all localization results, taking multi-frame observations into account. The resulting architecture is flexible, allowing appropriate modules to be selected based on requirements. Simulations demonstrate that RHAML effectively addresses the problem of multi-robot mutual localization, achieving translation errors below 2 cm and rotation errors below 0.5 degrees when robots exhibit 5 m of depth variation. Moreover, its practical utility is validated by applying it to map fusion when multiple robots explore unknown environments.
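The reported accuracy figures are easier to interpret with the error metrics written out. The following is a minimal sketch, assuming translation error is the Euclidean distance between estimated and ground-truth positions and rotation error is the geodesic angle between the corresponding rotation matrices. The abstract does not state the exact metric definitions, and all function names here are hypothetical.

```python
import math


def translation_error(t_est, t_gt):
    """Euclidean distance between two 3D positions (same unit as the inputs)."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Assumed metric: angle of the relative rotation R_est^T R_gt.
    """
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def yaw_rotation(deg):
    """Helper (illustrative only): rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Example: an estimate that is 1 cm off in x and 0.4 degrees off in yaw.
t_err_m = translation_error((1.01, 2.0, 0.5), (1.0, 2.0, 0.5))
r_err_deg = rotation_error_deg(yaw_rotation(30.4), yaw_rotation(30.0))
within_reported_bounds = t_err_m < 0.02 and r_err_deg < 0.5
```

Under these assumed definitions, an estimate would fall within the reported bounds when `t_err_m` is below 0.02 (metres) and `r_err_deg` is below 0.5.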
