DVI-SLAM: A Dual Visual Inertial SLAM Network
Xiongfeng Peng, Zhihua Liu, Weiming Li, Ping Tan, SoonYong Cho, Qiang Wang
TL;DR
DVI-SLAM addresses the challenge of effectively fusing multiple visual cues with IMU data in SLAM by introducing a differentiable, end-to-end framework that dynamically weighs re-projection, feature-metric, and inertial residuals through learned confidence maps. It extends the DROID-SLAM paradigm with a dual-visual-factor design and a multi-factor DBA layer, enabling tightly coupled optimization over pose, depth, and IMU motion. The approach achieves state-of-the-art results on TartanAir, EuRoC, and ETH3D-SLAM; when all three factors are fused, absolute trajectory error on EuRoC drops by 45.3% (monocular) and 36.2% (stereo). This work demonstrates the practical value of dynamic, reliability-weighted factor fusion for robust visual-inertial navigation and mapping, with potential extensions to deeper integration of IMU factors and richer 3D scene representations.
Abstract
Recent deep-learning-based visual simultaneous localization and mapping (SLAM) methods have made significant progress. However, how to make full use of visual information and how to better integrate inertial measurement unit (IMU) data into visual SLAM remain open research questions. This paper proposes a novel deep SLAM network with dual visual factors. The basic idea is to integrate both a photometric factor and a re-projection factor into an end-to-end differentiable structure through a multi-factor data association module. We show that the proposed network dynamically learns and adjusts the confidence maps of both visual factors, and that it can be further extended to include IMU factors as well. Extensive experiments validate that our proposed method significantly outperforms state-of-the-art methods on several public datasets, including TartanAir, EuRoC, and ETH3D-SLAM. Specifically, when the three factors are dynamically fused, the absolute trajectory error on the EuRoC dataset is reduced by 45.3% and 36.2% for the monocular and stereo configurations, respectively.
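To make the idea of confidence-weighted factor fusion concrete, the following is a minimal sketch of one Gauss-Newton step in which several factors contribute to shared normal equations, each residual scaled by its own confidence. It is not the authors' DBA layer: the function name `fused_gauss_newton_step`, the two-parameter state, and the hand-written Jacobians are all hypothetical stand-ins for pose, depth, and IMU states, and the confidences are fixed numbers here rather than maps predicted by a network.

```python
from typing import List, Sequence, Tuple

# One factor = (Jacobian rows, residuals, per-residual confidences).
# All names here are illustrative assumptions, not taken from the paper.
Factor = Tuple[Sequence[Sequence[float]], Sequence[float], Sequence[float]]


def fused_gauss_newton_step(factors: List[Factor]) -> Tuple[float, float]:
    """Solve (sum_k J_k^T W_k J_k) dx = -(sum_k J_k^T W_k r_k) for a 2-D state."""
    H = [[0.0, 0.0], [0.0, 0.0]]  # accumulated approximate Hessian
    b = [0.0, 0.0]                # accumulated gradient term J^T W r
    for J, r, w in factors:
        for row, r_i, w_i in zip(J, r, w):
            for a in range(2):
                b[a] += w_i * row[a] * r_i
                for c in range(2):
                    H[a][c] += w_i * row[a] * row[c]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if abs(det) < 1e-12:
        raise ValueError("normal equations are singular; state is unobservable")
    # Closed-form 2x2 solve of H dx = -b.
    dx0 = -(H[1][1] * b[0] - H[0][1] * b[1]) / det
    dx1 = -(-H[1][0] * b[0] + H[0][0] * b[1]) / det
    return dx0, dx1


# Toy linear problem with true state (1, 2), linearized at (0, 0),
# so each residual is r = J @ x - z = -z.
factor_a = ([[1.0, 0.0], [0.0, 1.0]], [-1.0, -2.0], [0.9, 0.9])  # e.g. re-projection
factor_b = ([[1.0, 1.0]], [-3.0], [0.5])                          # e.g. photometric
factor_c = ([[1.0, -1.0]], [1.0], [2.0])                          # e.g. inertial

step = fused_gauss_newton_step([factor_a, factor_b, factor_c])

# A corrupted second factor is ignored once its confidence drops to zero.
outlier_b = ([[1.0, 1.0]], [-13.0], [0.0])
robust_step = fused_gauss_newton_step([factor_a, outlier_b, factor_c])
```

Because the three toy factors agree, the step recovers the true state for any positive confidences. The second call shows why per-residual confidence matters: the corrupted factor would bias the estimate at full weight, but with zero confidence it contributes nothing to the normal equations.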
