
Drift-free Visual SLAM using Digital Twins

Roxane Merat, Giovanni Cioffi, Leonard Bauersfeld, Davide Scaramuzza

TL;DR

This work tackles drift in visual-inertial SLAM for urban operation by registering the sparse 3D point cloud produced by VIO/VSLAM to a city digital twin with point-to-plane ICP. This yields a global 6-DoF measurement that is integrated into the SLAM back-end. It introduces an adaptive weighting scheme that stabilizes the map-registration residuals in geometrically degenerate scenes. It also provides an initial frame alignment that hands over to map-based alignment once the registration has converged. The approach, implemented as SVO-Digital Twin, is validated in a high-fidelity GPS simulation and in real-world drone flights. In both, it reduces drift more, and is more robust to viewpoint changes, than state-of-the-art VIO-GPS and visual localization methods. The results indicate that using digital-twin geometry for global localization can substantially improve long-term pose reliability in urban environments, enabling more robust autonomous operation.
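
The registration step described above can be illustrated with a minimal point-to-plane ICP sketch in Python with NumPy. It is not the SVO-Digital Twin implementation. It assumes the digital twin is available as points sampled on its planar surfaces, each with a unit normal. It uses brute-force nearest neighbours and solves one Gauss-Newton step with a small-angle rotation update per iteration. The helper names (`skew`, `so3_exp`, `point_to_plane_icp`, `between`) and the three-plane synthetic map are hypothetical.

```python
import numpy as np

# Hypothetical sketch of point-to-plane ICP; not the SVO-Digital Twin code.


def skew(v):
    """3x3 matrix [v]_x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: rotation vector w (rad) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def point_to_plane_icp(src, dst, dst_normals, iters=20):
    """Estimate R, t so that R @ src_i + t lies on the map planes.

    src: (N, 3) sparse SLAM landmarks; dst: (M, 3) points sampled on the
    digital-twin surfaces; dst_normals: (M, 3) unit normals of those surfaces.
    """
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        p = src @ R.T + t                        # landmarks in the map frame
        # Brute-force nearest neighbours; a k-d tree would be used at scale.
        d2 = ((p[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        q, n = dst[idx], dst_normals[idx]
        r = np.einsum("ij,ij->i", n, p - q)      # signed point-to-plane distances
        J = np.hstack([np.cross(p, n), n])       # dr / d[omega, delta_t]
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # Gauss-Newton step
        R_inc = so3_exp(dx[:3])
        R, t = R_inc @ R, R_inc @ t + dx[3:]     # apply increment on the left
    return R, t


def between(u, lo, hi):
    """Boolean mask for lo <= u <= hi."""
    return (u >= lo) & (u <= hi)


# Synthetic "digital twin": ground plane z = 0 and two walls x = 0, y = 0.
a = np.arange(0.0, 5.0 + 1e-9, 0.25)
b = np.arange(0.0, 3.0 + 1e-9, 0.25)
U, V = np.meshgrid(a, a)
ground = np.column_stack([U.ravel(), V.ravel(), np.zeros(U.size)])
U, Z = np.meshgrid(a, b)
wall_x = np.column_stack([np.zeros(U.size), U.ravel(), Z.ravel()])
wall_y = np.column_stack([U.ravel(), np.zeros(U.size), Z.ravel()])
dst = np.vstack([ground, wall_x, wall_y])
normals = np.vstack([np.tile([0.0, 0.0, 1.0], (len(ground), 1)),
                     np.tile([1.0, 0.0, 0.0], (len(wall_x), 1)),
                     np.tile([0.0, 1.0, 0.0], (len(wall_y), 1))])

# Interior points stand in for the sparse SLAM cloud, expressed in a drifted frame.
keep = np.concatenate([
    between(ground[:, 0], 1.0, 4.0) & between(ground[:, 1], 1.0, 4.0),
    between(wall_x[:, 1], 1.0, 4.0) & between(wall_x[:, 2], 0.5, 2.5),
    between(wall_y[:, 0], 1.0, 4.0) & between(wall_y[:, 2], 0.5, 2.5),
])
axis = np.array([0.3, -0.5, 0.8])
R_true = so3_exp(np.deg2rad(1.0) * axis / np.linalg.norm(axis))
t_true = np.array([0.05, -0.04, 0.03])
src = (dst[keep] - t_true) @ R_true  # src_i = R_true^T (q_i - t_true)

R_est, t_est = point_to_plane_icp(src, dst, normals)
```

Each iteration minimizes the sum of squared distances n_i^T(R p_i + t - q_i). A landmark can therefore slide freely within its matched plane. This is why the map needs planes with diverse orientations (here a ground plane and two walls) to constrain all six degrees of freedom.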

Abstract

Globally consistent localization in urban environments is crucial for autonomous systems such as self-driving vehicles and drones, as well as for assistive technologies for visually impaired people. Traditional Visual-Inertial Odometry (VIO) and Visual Simultaneous Localization and Mapping (VSLAM) methods, though adequate for local pose estimation, accumulate drift over long trajectories because they rely only on local sensor data. GPS counteracts this drift, but it is unavailable indoors and often unreliable in urban areas. An alternative is to localize the camera in an existing 3D map using visual-feature matching. This can provide centimeter-level accuracy, but only when the current view is visually similar to the map. This paper introduces a novel approach that achieves accurate and globally consistent localization by aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching, so no visual data association is needed. The resulting 6-DoF global measurement is tightly integrated into the VIO/VSLAM system. Experiments on a high-fidelity GPS simulator and on real-world data collected with a drone demonstrate two results. Our approach outperforms state-of-the-art VIO-GPS systems, and it is more robust to viewpoint changes than state-of-the-art visual SLAM systems.
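
The abstract does not spell out how the registration becomes a weighted factor, so the following Python sketch is only one plausible reading, with hypothetical names and thresholds. It builds the 6x6 information matrix of the point-to-plane residuals. As a swapped-in stand-in for the paper's adaptive weighting, it then zeroes out poorly constrained eigen-directions, in the spirit of degeneracy-aware ICP. A scene containing only flat ground leaves x, y and yaw unobserved. The resulting factor should therefore penalize a vertical offset but not a horizontal slide.

```python
import numpy as np


def point_to_plane_information(points, normals, sigma=0.05):
    """6x6 information matrix of the stacked point-to-plane residuals.

    Tangent ordering: [omega_x, omega_y, omega_z, t_x, t_y, t_z].
    points: (N, 3) registered landmarks; normals: (N, 3) matched plane normals;
    sigma: assumed standard deviation of a point-to-plane distance in metres.
    """
    J = np.hstack([np.cross(points, normals), normals])
    return J.T @ J / sigma**2


def degeneracy_aware_weighting(info, rel_threshold=1e-3):
    """Hypothetical adaptive weighting: zero out weakly constrained directions.

    Eigen-directions whose eigenvalue falls below rel_threshold times the
    largest one are treated as degenerate, so the global factor does not
    pull the estimate along them.
    """
    eigval, eigvec = np.linalg.eigh(info)
    keep = eigval > rel_threshold * eigval.max()
    weighted = np.where(keep, eigval, 0.0)
    return eigvec @ np.diag(weighted) @ eigvec.T, keep


def factor_cost(delta, info):
    """Squared Mahalanobis norm of a small 6-DoF pose error (tangent vector)."""
    return float(delta @ info @ delta)


# Degenerate scene: the registration only sees flat ground (plane z = 0).
g = np.arange(-5.0, 5.0 + 1e-9, 1.0)
X, Y = np.meshgrid(g, g)
pts = np.column_stack([X.ravel(), Y.ravel(), np.zeros(X.size)])
nrm = np.tile([0.0, 0.0, 1.0], (len(pts), 1))

info, keep = degeneracy_aware_weighting(point_to_plane_information(pts, nrm))
cost_x = factor_cost(np.array([0, 0, 0, 1.0, 0, 0]), info)   # slide 1 m along x
cost_z = factor_cost(np.array([0, 0, 0, 0, 0, 0.01]), info)  # lift by 1 cm
```

Zeroing the degenerate eigenvalues leaves VIO alone to determine the pose along those directions. Otherwise an ill-conditioned registration would pull the estimate along them. A softer alternative would scale those eigenvalues down rather than discard them.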

Paper Structure

This paper contains 19 sections, 8 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: We propose an approach to achieve drift-free visual SLAM by aligning the local visual SLAM point cloud to a digital twin using point-to-plane matching. The resulting relative transformation between the point cloud and the digital twin provides a global measurement that is then tightly integrated into the SLAM system to obtain global consistency and reduce drift.
  • Figure 2: Reference frames used in this work.
  • Figure 3: Factor graph representation of the proposed visual SLAM system with visual, inertial, and city digital twin registration factors. The system relies on GPS data to initialize.
  • Figure 4: Simulated data. Trajectory estimated by SVO-Digital Twin and the baselines together with the ground truth trajectory. Left: top-down view; right: side view.
  • Figure 5: View of the simulated city. The simulated trajectory is depicted in red.
  • ...and 4 more figures
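
Figure 3 notes that the system relies on GPS data to initialize. The TL;DR describes an initial frame alignment that later hands over to map-based alignment. The following minimal Python sketch shows such an initialization under three assumptions:

  • GPS fixes are already converted to a local metric frame and time-associated with VIO positions.
  • The alignment is a rigid Kabsch fit without scale, since VIO is metric thanks to the IMU.
  • The hand-over rule `map_alignment_converged`, with its `tol` and `window` values, is hypothetical.

```python
import numpy as np


def align_rigid(src, dst):
    """Kabsch: least-squares R, t such that R @ src_i + t ~= dst_i (no scale)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s


def map_alignment_converged(rms_history, tol=0.2, window=5):
    """Hypothetical hand-over rule: the last `window` map-registration RMS
    residuals (metres) are all below `tol`."""
    return len(rms_history) >= window and max(rms_history[-window:]) < tol


# VIO positions (odometry frame) and the same positions in a local metric GPS frame.
rng = np.random.default_rng(0)
p_vio = rng.uniform([-50.0, -50.0, 0.0], [50.0, 50.0, 10.0], size=(30, 3))
yaw = np.deg2rad(37.0)
R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw), np.cos(yaw), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([12.0, -3.0, 1.5])
p_gps = p_vio @ R_true.T + t_true

R_init, t_init = align_rigid(p_vio, p_gps)
use_map = map_alignment_converged([0.9, 0.4, 0.15, 0.12, 0.1, 0.1, 0.09])
```

The IMU makes roll and pitch observable, so a 4-DoF alignment (translation plus yaw) is a common alternative to the full rotation fitted here. The synthetic example uses a yaw-only offset, which both variants recover.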