Multi-cam Multi-map Visual Inertial Localization: System, Validation and Dataset

Yufei Wei, Fuzhang Han, Yanmei Jiao, Zhuqing Zhang, Yiyuan Pan, Wenjun Huang, Li Tang, Huan Yin, Xiaqing Ding, Chenxiao Hu, Rong Xiong, Yue Wang

TL;DR

The paper tackles the challenge of providing real-time, causal pose estimation for robot control in large, changing environments. It introduces a multi-camera, multi-map Visual-Inertial Localization (VILO) system that fuses map observations in a filter-based framework, augments the state to handle multiple isolated maps, and uses IMU-aided 2-point solvers for fast, robust initialization and matching. Key contributions include causal, bounded-error localization across disconnected maps, a two-stage mapping process with scale-aware multi-sensor integration, a Schmidt-EKF-based map-feature update, and an evaluation framework together with a long-term campus dataset. The approach improves real-time localization accuracy and robustness to appearance changes, offers practical guidelines for deploying multi-camera, multi-map localization in field robotics, and releases both the system and the dataset as open source.
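The Schmidt-EKF map-feature update listed above can be illustrated with a generic Schmidt-Kalman ("consider") update. Map features are treated as nuisance states: their uncertainty enters the innovation covariance $S = HPH^\top + R$, but their gain is forced to zero, so neither the map estimates nor their covariance block $P_{nn}$ are modified. This is a minimal NumPy sketch assuming an already-linearized measurement with residual r, Jacobians H_a and H_n, and noise covariance R; the function name, the active/nuisance split, and the toy numbers are hypothetical and not taken from the released code.

```python
import numpy as np


def schmidt_update(x_a, x_n, P_aa, P_an, P_nn, H_a, H_n, r, R):
    """Generic Schmidt-Kalman update (illustrative, not the paper's code).

    x_a: active states (e.g. IMU pose, velocity, biases), corrected by the update.
    x_n: nuisance states (e.g. map features), whose uncertainty is "considered"
         in the innovation covariance but which are never corrected.
    r:   measurement residual z - h(x); H_a, H_n: Jacobians w.r.t. x_a and x_n.
    """
    # Active and nuisance rows of P H^T.
    M = P_aa @ H_a.T + P_an @ H_n.T
    N = P_an.T @ H_a.T + P_nn @ H_n.T
    # Innovation covariance S = H P H^T + R, including map uncertainty.
    S = H_a @ M + H_n @ N + R
    # Gain for the active states only (K_a = M S^{-1}); the nuisance gain is zero.
    K_a = np.linalg.solve(S, M.T).T
    x_a_new = x_a + K_a @ r
    # Covariance update consistent with the Joseph form for K = [K_a; 0].
    P_aa_new = P_aa - K_a @ S @ K_a.T
    P_an_new = P_an - K_a @ N.T
    # x_n and P_nn are left unchanged: this is what makes it a Schmidt update.
    return x_a_new, x_n.copy(), P_aa_new, P_an_new, P_nn.copy()


if __name__ == "__main__":
    # 1-D toy: one active state, one map feature, residual r = 2.
    x_a, x_n, P_aa, P_an, P_nn = schmidt_update(
        np.array([0.0]), np.array([5.0]),
        np.eye(1), np.zeros((1, 1)), np.eye(1),
        np.eye(1), -np.eye(1), np.array([2.0]), np.eye(1))
    # Expected: x_a = 2/3, P_aa = 2/3, P_an = 1/3, x_n stays 5, P_nn stays 1.
    print(x_a, P_aa, P_an, x_n, P_nn)
```

Skipping the $P_{nn}$ update is what makes this style of update attractive for large prebuilt maps: only the active block and the active-map cross-covariance change, while the map remains a fixed reference.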

Abstract

Robot control loops require causal pose estimates that depend only on past and present measurements. At each timestep, controllers compute commands from the current pose without waiting for future refinements. While traditional visual SLAM systems achieve high accuracy through retrospective loop closures, these corrections arrive after the corresponding control decisions have already been executed, violating causality. Visual-inertial odometry maintains causality but accumulates unbounded drift over time. To address the distinct requirements of robot control, we propose a multi-camera multi-map visual-inertial localization system that provides real-time, causal pose estimation with bounded localization error through continuous map constraints. Since standard trajectory metrics evaluate post-processed trajectories, we analyze the error composition of map-based localization systems and propose a set of evaluation metrics suited to measuring causal localization performance. To validate our system, we design a multi-camera-IMU hardware setup and collect a challenging long-term campus dataset featuring diverse illumination and seasonal conditions. Experimental results on public benchmarks and on our own dataset demonstrate that our system provides significantly higher real-time localization accuracy than the compared methods. To benefit the community, we have made both the system and the dataset open source at https://anonymous.4open.science/r/Multi-cam-Multi-map-VILO-7993.
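The point that standard trajectory metrics score post-processed trajectories can be made concrete. The sketch below is an illustration under our own assumptions, not the paper's metric set: it scores each position exactly as it was output online against ground truth in the map frame, and contrasts this with a conventional rigid (Kabsch-style, no scale) alignment of the whole trajectory before computing error. The function names (`causal_errors`, `aligned_errors`, `summarize`) and the 0.5 m threshold are hypothetical.

```python
import numpy as np


def causal_errors(est_xyz, gt_xyz):
    """Per-timestep position error of poses exactly as they were output online.

    No alignment is applied: the estimates are assumed to already be expressed
    in the map frame, i.e. the frame a controller would actually act in.
    """
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    return np.linalg.norm(est - gt, axis=1)


def aligned_errors(est_xyz, gt_xyz):
    """Conventional post-hoc error: rigidly align the whole trajectory first (Kabsch)."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)       # cross-covariance of the two point sets
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt     # rotation mapping est onto gt
    aligned = E @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1)


def summarize(errors, threshold=0.5):
    """RMSE, worst case and fraction of timesteps within a (hypothetical) threshold in metres."""
    errors = np.asarray(errors, dtype=float)
    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "max": float(np.max(errors)),
        "within_threshold": float(np.mean(errors <= threshold)),
    }


if __name__ == "__main__":
    gt = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1], [2, 1, 0.5]], dtype=float)
    est = gt + np.array([1.0, 0.0, 0.0])      # constant 1 m offset in the map frame
    print(summarize(causal_errors(est, gt)))   # rmse = max = 1.0, within_threshold = 0.0
    print(summarize(aligned_errors(est, gt)))  # alignment hides the offset: errors near 0
```

A constant 1 m offset in the map frame produces a 1 m error at every timestep under the causal metric, yet vanishes after alignment, even though it is exactly the error a controller acting in the map frame would experience.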

Paper Structure

This paper contains 49 sections, 24 equations, 16 figures, 12 tables.

Figures (16)

  • Figure 1: Comparison of localization paradigms for robot control. (a) Traditional SLAM relies on loop closures for drift correction, causing discontinuities and non-causal updates that arrive after control decisions. (b) The proposed multi-map VILO maintains causal, continuous estimates by leveraging pre-built maps for real-time correction, eliminating control-loop incompatibilities.
  • Figure 2: Overview of the proposed VILO system. In mapping mode, the system feeds real-time multi-sensor data into the online mapping module for initial map construction and provides mapping-quality feedback to the user, allowing timely adjustments to the data collection strategy. The offline mapping module then performs two-stage high-precision map construction and supports dense reconstruction output. In localization mode, the system uses one or more isolated maps built in mapping mode for robust, accurate, real-time map-based state estimation. Details are given in the system section of the paper. LCD: loop closure detection, SFM: structure-from-motion, BA: bundle adjustment.
  • Figure 3: Illustration of the frames used in the system. There are two kinds of observations in the proposed system: local observations (blue shaded region) and map observations (pink shaded region).
  • Figure 4: Online mapping process.
  • Figure 5: Error decomposition for map-based robot navigation. Task: navigate from start to goal in the map frame $\{G^m\}$. Process: the system first aligns the local frame $\{L\}$ with the map frame through initialization, then at each timestep $i$ matches the current image $C_i$ with a map image $C_j$ to update the pose $^{G^m}\hat{\mathbf{T}}_{C_i}$ used for control. Taking timestep $i$ as an example, the error analysis between $C_i$ and the retrieved map frame $C_j$ is highlighted in the bottom-right corner.
  • ...and 11 more figures
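Figure 5 analyzes the error between the current frame $C_i$ and the retrieved map frame $C_j$. Assuming the control pose is formed by chaining the stored map pose $^{G^m}\mathbf{T}_{C_j}$ with the estimated relative pose $^{C_j}\mathbf{T}_{C_i}$, a short NumPy sketch shows how an error in each factor reaches the final position. The right-multiplied SE(3) perturbation model, the 1° yaw error, and all function names are illustrative assumptions, not the paper's notation or numbers.

```python
import numpy as np


def rot_z(theta):
    """Rotation about the z (yaw) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_T(R, t):
    """4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def translation_error(T_m_cj, dT_map, T_cj_ci, dT_rel):
    """Position error of G^m_T_Ci = G^m_T_Cj @ Cj_T_Ci when both factors are perturbed.

    dT_map: error of the stored map keyframe pose (right-multiplied, an assumption).
    dT_rel: error of the estimated relative pose between Cj and Ci (right-multiplied).
    """
    T_true = T_m_cj @ T_cj_ci
    T_est = (T_m_cj @ dT_map) @ (T_cj_ci @ dT_rel)
    return float(np.linalg.norm(T_est[:3, 3] - T_true[:3, 3]))


if __name__ == "__main__":
    T_m_cj = make_T(rot_z(0.3), np.array([5.0, -2.0, 0.0]))  # arbitrary map keyframe pose
    dT_map = make_T(rot_z(np.deg2rad(1.0)), np.zeros(3))     # hypothetical 1 deg yaw map error
    for d in (1.0, 5.0, 10.0):                               # distance from Cj to Ci in metres
        T_cj_ci = make_T(np.eye(3), np.array([d, 0.0, 0.0]))
        err = translation_error(T_m_cj, dT_map, T_cj_ci, np.eye(4))
        print(f"lever arm {d:4.1f} m -> position error {err:.3f} m")
```

With the yaw error alone, the position error is $2d\sin(\theta/2)$, about 0.017 m, 0.087 m and 0.175 m for $d$ = 1, 5 and 10 m: a rotational error in the map pose is amplified by the lever arm between $C_j$ and $C_i$, while a purely translational map error reaches the final position with unchanged magnitude.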