Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach
Matthew Hanlon, Boyang Sun, Marc Pollefeys, Hermann Blum
TL;DR
The paper tackles cross-device visual localization by enabling a robot to actively select viewpoints that maximize localization accuracy in a pre-existing map built by a different sensing device. It introduces a data-driven viewpoint scoring framework with two lightweight models: an MLP and a Transformer-based model, the Viewpoint Transformer (VPT). Given a map $\mathcal{M}=(\mathcal{M}_{l},\mathcal{M}_{t})$ and a prior pose $\hat{\bm{p}}$, both models score candidate viewpoints from SfM landmark features and DINO appearance features, and both are trained on labels produced by a sample-and-evaluate pipeline. Comprehensive experiments in simulated HM3D-based indoor scenes and in real-world deployments show that the data-driven VPT approach outperforms Fisher-information-based and heuristic baselines, particularly when occlusion filtering is included, and that it generalizes well to real-world data. The work advances practical cross-agent localization by delivering real-time viewpoint selection (under 1 s for 100 candidates on a high-end GPU) and by validating a scalable framework for multi-agent and human-robot collaboration in GPS-denied environments.
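The selection step described above (score every candidate viewpoint at the robot's position, then turn toward the best one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidates are reduced to yaw angles at a fixed position, and the scorer is a hypothetical heuristic (`visible_landmark_count`, counting map landmarks inside a horizontal field of view) standing in for the learned MLP or VPT. It ignores occlusion. Because `select_viewpoint` accepts any callable with the same signature, a learned scoring model could be substituted for the heuristic.

```python
import numpy as np


def visible_landmark_count(position, yaw, landmarks, fov_deg=90.0, max_range=10.0):
    """Hypothetical heuristic scorer (assumption, not the paper's model):
    count landmarks inside a horizontal field of view centred on `yaw`
    and within `max_range` of `position`. Occlusion is ignored."""
    rel = landmarks[:, :2] - position[:2]            # offsets in the ground plane
    dist = np.linalg.norm(rel, axis=1)
    bearing = np.arctan2(rel[:, 1], rel[:, 0])       # world-frame bearing of each landmark
    diff = (bearing - yaw + np.pi) % (2 * np.pi) - np.pi  # wrap difference to [-pi, pi)
    half_fov = np.deg2rad(fov_deg) / 2
    inside = (np.abs(diff) <= half_fov) & (dist <= max_range) & (dist > 0)
    return int(inside.sum())


def select_viewpoint(position, landmarks, scorer, num_candidates=100):
    """Sample evenly spaced candidate yaws at a fixed position, score each
    with `scorer`, and return the highest-scoring yaw and its score."""
    yaws = np.linspace(-np.pi, np.pi, num_candidates, endpoint=False)
    scores = np.array([scorer(position, y, landmarks) for y in yaws])
    best = int(np.argmax(scores))                    # first maximum wins ties
    return float(yaws[best]), int(scores[best])


# Toy map: three landmarks ahead along +x, one behind along -x.
landmarks = np.array([[5.0, 0.0, 1.0], [5.0, 1.0, 1.0],
                      [5.0, -1.0, 1.0], [-5.0, 0.0, 1.0]])
best_yaw, best_score = select_viewpoint(np.zeros(3), landmarks, visible_landmark_count)
```

With this toy map the selected yaw faces the cluster along +x, where three landmarks fall inside the field of view. The choice of 100 candidates mirrors the candidate count quoted in the timing figure above; the field-of-view and range values are illustrative assumptions.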
Abstract
Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices makes it possible to simply localize in a map built by another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing, e.g., a ground robot in the map of a drone or a head-mounted MR device presents unique challenges due to viewpoint changes. This work investigates how active visual localization can be used to overcome these challenges. Specifically, we focus on the problem of selecting the optimal viewpoint at a given location. We compare existing approaches in the literature with additional proposed baselines and propose a novel data-driven approach. The results demonstrate the superior performance of the data-driven approach compared to existing methods, both in controlled simulation experiments and in real-world deployment.
