Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information
Luca Di Giammarino, Boyang Sun, Giorgio Grisetti, Marc Pollefeys, Hermann Blum, Daniel Barath
TL;DR
This work tackles active localization by learning where to look. It defines a scoring function $f_{\mathcal{P}}(\mathbf{R},\mathbf{t})$ that evaluates the localization quality of candidate viewpoints $(\mathbf{R},\mathbf{t})$. A compact encoder, fast enough for real-time use, operates on a geometry-driven map built from a voxel grid of locations and spherical Fibonacci orientations, and a self-supervised training loop labels viewpoints through COLMAP-based pose verification on simulator-generated data. The map supports multiple valid viewpoints per location and can be embedded into a planner, which can then choose viewpoints that balance path cost against localization accuracy. Experiments on indoor-like scenes show improvements over Fisher-information baselines and real-time inference (~$0.02$ s), and the approach generalizes from synthetic to real data; an open-source implementation is released for the community. The results underscore the importance of the image-landmark distribution and of 3D geometric information for robust active localization in robotics applications.
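The candidate viewpoint set described above, voxel positions combined with spherical Fibonacci orientations, can be enumerated in a few lines. The following is a minimal sketch, not the authors' implementation: the grid origin, grid size, and voxel edge length are hypothetical parameters, and each orientation is represented only by a unit viewing direction (camera roll is left unspecified), whereas $\mathbf{R}$ in the paper is a full rotation. The Fibonacci lattice places direction $i$ of $N$ at height $z_i = 1 - (2i+1)/N$ and azimuth $\phi_i = i\,\pi(3-\sqrt{5})$, which spreads the directions nearly uniformly over the sphere.

```python
import math
from itertools import product

# Golden angle in radians: pi * (3 - sqrt(5)), roughly 2.39996.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def fibonacci_directions(n):
    """Return n nearly uniform unit viewing directions (spherical Fibonacci lattice)."""
    dirs = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n   # heights evenly spaced strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)       # radius of the horizontal circle at height z
        phi = i * GOLDEN_ANGLE           # azimuth advances by the golden angle
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs


def voxel_centers(origin, size, voxel):
    """Centers of a regular grid of size = (nx, ny, nz) voxels with edge length `voxel`.

    Hypothetical helper: origin, size and voxel are illustrative parameters.
    """
    ox, oy, oz = origin
    nx, ny, nz = size
    return [
        (ox + (i + 0.5) * voxel, oy + (j + 0.5) * voxel, oz + (k + 0.5) * voxel)
        for i, j, k in product(range(nx), range(ny), range(nz))
    ]


def candidate_viewpoints(origin, size, voxel, n_dirs):
    """All (position, viewing direction) pairs: every voxel center with every direction."""
    dirs = fibonacci_directions(n_dirs)
    return [(t, d) for t in voxel_centers(origin, size, voxel) for d in dirs]
```

For example, `candidate_viewpoints((0.0, 0.0, 0.0), (2, 2, 1), 0.5, 32)` yields 4 × 32 = 128 (position, direction) pairs; a learned scorer would then be evaluated on each pair. Storing a score per pair, rather than a single best direction per voxel, is what lets the map keep multiple valid viewpoints per location.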
Abstract
Accurate localization in diverse environments is a fundamental challenge in computer vision and robotics. The task involves determining the precise position and orientation of a sensor, typically a camera, within a given space. Traditional localization methods often rely on passive sensing, which may struggle in scenarios with few features or in dynamic environments. In response, this paper explores active localization, emphasizing the importance of viewpoint selection for improving localization accuracy. Our contributions are a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the ability to integrate our map consistently into a planning framework tailored to real-world robotics applications. Our results demonstrate that our method outperforms an existing method targeting similar problems and generalizes to both synthetic and real data. We also release an open-source implementation to benefit the community.
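To make the planning integration concrete, the sketch below trades predicted localization quality against travel cost when choosing the next viewpoint. It is an illustration under stated assumptions, not the paper's planner: `score_fn` is a hypothetical stand-in for the learned $f_{\mathcal{P}}$, path cost is approximated by straight-line distance, and the weight `lam` and threshold `min_score` are hypothetical parameters.

```python
import math


def select_viewpoint(candidates, score_fn, current_pos, lam=0.1, min_score=0.5):
    """Pick the candidate that maximizes score - lam * travel distance.

    candidates:  iterable of (t, d) pairs (position, viewing direction).
    score_fn:    hypothetical stand-in for a learned viewpoint scorer, (t, d) -> float.
    current_pos: current robot position; straight-line distance approximates path cost.
    Candidates whose predicted score is below min_score are discarded.
    Returns None if no candidate passes the threshold.
    """
    best, best_utility = None, -math.inf
    for t, d in candidates:
        s = score_fn(t, d)
        if s < min_score:
            continue  # not expected to localize reliably from this viewpoint
        utility = s - lam * math.dist(current_pos, t)
        if utility > best_utility:
            best, best_utility = (t, d), utility
    return best


# Toy scorer: prefers viewing directions along +x.
def toy_score(t, d):
    return max(0.0, d[0])


cands = [
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # near, facing +x
    ((5.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # far, facing +x
    ((0.0, 1.0, 0.0), (0.0, 1.0, 0.0)),  # near, facing +y (filtered out)
]
choice = select_viewpoint(cands, toy_score, (0.0, 0.0, 0.0))
```

Here both +x-facing viewpoints receive the same toy score, so the nearer one wins (utility 0.9 versus 0.5), while the +y-facing viewpoint is discarded by the threshold. A real planner would replace the straight-line distance with the cost of a collision-free path.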
