Multi-cam Multi-map Visual Inertial Localization: System, Validation and Dataset
Yufei Wei, Fuzhang Han, Yanmei Jiao, Zhuqing Zhang, Yiyuan Pan, Wenjun Huang, Li Tang, Huan Yin, Xiaqing Ding, Chenxiao Hu, Rong Xiong, Yue Wang
TL;DR
The paper tackles the challenge of providing real-time, causal pose estimation for robot control in large, changing environments. It introduces a multi-camera, multi-map Visual-Inertial Localization (VILO) system that fuses map observations in a filter-based framework, augments the state to handle multiple isolated maps, and uses IMU-aided 2-point solvers for robust, fast initialization and matching. Key contributions include causal, bounded-error localization across disconnected maps, a two-stage mapping process with scale-aware multi-sensor integration, a Schmidt-EKF-based map-feature update, and a comprehensive evaluation framework plus a long-term campus dataset. The approach yields improved real-time localization accuracy and robustness to appearance changes, along with practical guidelines for deploying multi-camera, multi-map localization in field robotics. Both the system and the dataset are available as open source.
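To make the Schmidt-EKF idea mentioned above concrete, here is a minimal sketch of a generic Schmidt-Kalman measurement update on a toy two-state problem. It is not the paper's implementation: the 1-D robot position `x`, the 1-D map feature `m`, the function name `schmidt_update`, and all numeric values are assumptions chosen for illustration. The point it shows is that the map feature is treated as a nuisance state: its estimate and variance stay fixed, while its uncertainty still enters the innovation covariance and the cross-covariance.

```python
def schmidt_update(x, m, P, H, R, z):
    """One scalar-measurement Schmidt-Kalman update (toy 2-state sketch).

    x: active state (hypothetical 1-D robot position), updated.
    m: nuisance state (hypothetical 1-D map feature), held fixed.
    P: 2x2 joint covariance [[Pxx, Pxm], [Pmx, Pmm]], assumed symmetric.
    H: measurement Jacobian (Hx, Hm); R: measurement noise variance.
    z: measurement, modeled as z = Hx*x + Hm*m + noise.
    """
    (Pxx, Pxm), (Pmx, Pmm) = P
    Hx, Hm = H

    # Entries of P @ H^T for the active and nuisance states.
    PHt_x = Pxx * Hx + Pxm * Hm
    PHt_m = Pmx * Hx + Pmm * Hm

    # Innovation variance includes the map uncertainty through Pmm.
    S = Hx * PHt_x + Hm * PHt_m + R

    # Gain for the active state only; the nuisance gain is forced to zero.
    Kx = PHt_x / S

    residual = z - (Hx * x + Hm * m)
    x_new = x + Kx * residual

    # Active block and cross-covariance shrink; the map block is untouched.
    Pxx_new = Pxx - Kx * S * Kx
    Pxm_new = Pxm - Kx * PHt_m
    P_new = [[Pxx_new, Pxm_new], [Pxm_new, Pmm]]
    return x_new, m, P_new


# Toy usage: the robot observes the feature's relative position z = m - x.
x_new, m_new, P_new = schmidt_update(
    x=0.0, m=5.0, P=[[1.0, 0.0], [0.0, 0.25]], H=(-1.0, 1.0), R=0.25, z=4.0
)
```

In this toy setting, setting `Pmm` to zero (treating the map as perfect) would produce a smaller updated pose variance, which is the kind of overconfidence a Schmidt-style update is meant to avoid.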
Abstract
Robot control loops require causal pose estimates that depend only on past and present measurements. At each timestep, controllers compute commands using the current pose without waiting for future refinements. While traditional visual SLAM systems achieve high accuracy through retrospective loop closures, these corrections arrive after control decisions have already been executed, violating causality. Visual-inertial odometry maintains causality but accumulates unbounded drift over time. To address the distinct requirements of robot control, we propose a multi-camera multi-map visual-inertial localization system providing real-time, causal pose estimation with bounded localization error through continuous map constraints. Since standard trajectory metrics evaluate post-processed trajectories, we analyze the error composition of map-based localization systems and propose a set of evaluation metrics suitable for measuring causal localization performance. To validate our system, we design a multi-camera IMU hardware setup and collect a challenging long-term campus dataset featuring diverse illumination and seasonal conditions. Experimental results on public benchmarks and on our own collected dataset demonstrate that our system provides significantly higher real-time localization accuracy than other methods. To benefit the community, we have made both the system and the dataset open source at https://anonymous.4open.science/r/Multi-cam-Multi-map-VILO-7993.
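The gap between causal and post-processed evaluation can be illustrated with a toy 1-D simulation. This is a minimal sketch under stated assumptions, not the metric set proposed in the paper: the linear drift model, the final-step loop closure that spreads the correction linearly over the trajectory, and the names `run` and `mean_abs_error` are all hypothetical. The causal log holds the poses as they were handed to a controller at each step, while the stored trajectory is rewritten after the fact.

```python
def mean_abs_error(estimates, truth):
    """Mean absolute position error between two equal-length 1-D trajectories."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def run(steps=10, drift_per_step=0.1):
    """Simulate drifting odometry followed by a retrospective correction.

    Assumes steps > 1. Returns (truth, causal_log, post_processed).
    """
    truth = [float(k) for k in range(steps)]
    trajectory = []   # estimator's stored trajectory, may be rewritten later
    causal_log = []   # pose given to the controller at each step, never rewritten

    for k, t in enumerate(truth):
        estimate = t + drift_per_step * k  # toy model: error grows linearly
        trajectory.append(estimate)
        causal_log.append(estimate)

    # Toy loop closure at the last step: the final error becomes known and is
    # distributed linearly back over all earlier poses.
    final_error = trajectory[-1] - truth[-1]
    post_processed = [
        e - final_error * k / (steps - 1) for k, e in enumerate(trajectory)
    ]
    return truth, causal_log, post_processed


truth, causal_log, post_processed = run()
causal_error = mean_abs_error(causal_log, truth)
post_error = mean_abs_error(post_processed, truth)
```

In this toy case the post-processed error is close to zero while the causal error is not, even though both come from the same run. A metric computed only on the corrected trajectory would therefore say little about the poses a controller actually acted on.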
