
Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM

Weijian Xie, Guanyi Chu, Quanhao Qian, Yihao Yu, Hai Li, Danpeng Chen, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang

TL;DR

The paper tackles the challenge of dense monocular SLAM on mobile devices by integrating a lightweight depth completion network into a sparse SLAM system. The network, BBC-Net, outputs $N$ balanced depth bases $B_i$ and a per-pixel confidence map. The final depth is their linear combination $D = \sum_{i=1}^{N} w_i B_i$, whose weights $w_i$ are optimized in the SLAM backend. The paper introduces training losses for depth consistency, confidence, and bases balance, and couples the network with a set of depth weight factors and a marginalization scheme for robustness and real-time performance. On EuRoC, ICL-NUIM, and TUM, the two BBC-Net-based systems, BBC-VINS and BBC-ORBSLAM, improve dense mapping quality while running in real time on mobile hardware, and an online demo on a mobile phone validates practical deployment.
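The linear depth model in the TL;DR can be illustrated with a short NumPy sketch. It composes $D$ from $N$ bases and, as a simple stand-in for the paper's backend optimization, fits the weights $w_i$ to sparse SLAM depths by confidence-weighted linear least squares. The function names, array layouts, and the use of `np.linalg.lstsq` are assumptions for illustration; the paper itself optimizes the weights inside the SLAM backend via depth weight factors, not with this standalone solve.

```python
import numpy as np


def compose_depth(bases: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Final depth D = sum_i w_i * B_i for bases of shape (N, H, W) and weights (N,)."""
    return np.tensordot(weights, bases, axes=1)


def fit_weights(bases, sparse_uv, sparse_depth, confidence):
    """Hypothetical confidence-weighted least-squares fit of w so D matches sparse depths.

    sparse_uv: (M, 2) integer pixel coordinates as (u, v) = (column, row).
    sparse_depth: (M,) depths of the sparse SLAM points at those pixels.
    confidence: (H, W) map; its value at each sparse pixel weights that point.
    """
    u, v = sparse_uv[:, 0], sparse_uv[:, 1]
    A = bases[:, v, u].T              # (M, N): basis values at each sparse pixel
    sw = np.sqrt(confidence[v, u])    # square root of the per-point weights
    # Minimize sum_j c_j * (A_j . w - z_j)^2 by scaling rows with sqrt(c_j)
    w, *_ = np.linalg.lstsq(A * sw[:, None], sparse_depth * sw, rcond=None)
    return w
```

Because $D$ is linear in $w$, fitting the weights for one frame reduces to a small $N$-dimensional linear problem, regardless of image resolution; this is what makes a multi-basis representation cheap to adjust once the bases are predicted.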

Abstract

Dense SLAM based on monocular cameras has immense application value in AR/VR, especially when it runs on a mobile device. In this paper, we propose a novel method that integrates a lightweight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifically, we present BBC-Net, a multi-basis depth completion network tailored to the characteristics of traditional sparse SLAM systems. From a monocular image and the sparse points generated by an off-the-shelf keypoint-based SLAM system, BBC-Net predicts multiple balanced depth bases and a confidence map. The final depth is a linear combination of the predicted bases and can be optimized by tuning the corresponding weights. To incorporate these weights seamlessly into traditional SLAM optimization while ensuring efficiency and robustness, we design a set of depth weight factors. These factors make our network a versatile plug-in module that integrates easily into various existing sparse SLAM systems and significantly improves global depth consistency through bundle adjustment. To verify the portability of our method, we integrate BBC-Net into two representative SLAM systems. Experimental results on various datasets show that the proposed method outperforms state-of-the-art methods in monocular dense mapping. We also provide an online demo running on a mobile phone, which verifies the efficiency and mapping quality of the proposed method in real-world scenarios.
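The depth weight factors described in the abstract can be pictured as residuals that tie a sparse map point's depth to the combined depth $D$ at that point's pixel. The sketch below is a hypothetical formulation, not the paper's exact factor: the class name `DepthWeightFactor`, the bilinear sampling of the bases, and the square-root-confidence weighting are all assumptions. It shows the property that makes such factors easy to plug into an existing optimizer: because $D$ is linear in the weights, the Jacobian with respect to $w$ is just the confidence-scaled vector of basis values at the pixel.

```python
import numpy as np


def sample_bases(bases: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly sample all N bases (shape (N, H, W)) at sub-pixel (u, v).

    u is the column and v the row; requires 0 <= u < W - 1 and 0 <= v < H - 1.
    Returns an (N,) vector of basis values.
    """
    _, h, w = bases.shape
    if not (0 <= u < w - 1 and 0 <= v < h - 1):
        raise ValueError("(u, v) must lie inside the image with a 1-pixel margin")
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * bases[:, v0, u0]
            + du * (1 - dv) * bases[:, v0, u0 + 1]
            + (1 - du) * dv * bases[:, v0 + 1, u0]
            + du * dv * bases[:, v0 + 1, u0 + 1])


class DepthWeightFactor:
    """Hypothetical factor linking one map point's depth z to D at its pixel.

    r(w, z) = sqrt(c) * (b . w - z), where b holds the basis values at the
    point's pixel, w the keyframe's basis weights, and c the confidence.
    """

    def __init__(self, bases, u, v, confidence):
        self.b = sample_bases(bases, u, v)  # fixed while the bases are fixed
        self.sqrt_c = float(np.sqrt(confidence))

    def residual(self, w: np.ndarray, z: float) -> float:
        return self.sqrt_c * (float(self.b @ w) - z)

    def jacobian_weights(self) -> np.ndarray:
        # r is linear in w, so dr/dw = sqrt(c) * b and does not depend on w
        return self.sqrt_c * self.b

    def jacobian_depth(self) -> float:
        # dr/dz = -sqrt(c); in a full system z would depend on pose and landmark states
        return -self.sqrt_c
```

In a real backend, $z$ would itself be a function of the camera pose and the landmark, so the chain rule would propagate `jacobian_depth` into those states; here it is kept as a plain scalar to isolate the weight part.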

Paper Structure

This paper contains 22 sections, 12 equations, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: (a) The real-time dense mapping process of BBC-ORBSLAM. BBC-Net completes the sparse depth generated by traditional sparse SLAM (top left of (a)) into dense depth (bottom left of (a)). By incorporating the predicted depth into SLAM optimization, a globally consistent mesh is obtained (right of (a)). (b) The dense mapping result of BBC-VINS. Our method can also recover a globally consistent dense mesh (right of (b)) from the poor-quality point clouds generated by VINS (left of (b)).
  • Figure 2: General framework. Integrating our method into a traditional SLAM system only requires passing the sparse depths generated by the tracking module to BBC-Net and then incorporating the predicted bases and confidence, represented as depth weight factors, into the SLAM system's optimizer.
  • Figure 3: (a) The depth bases predicted by our network have a well-balanced information distribution, with each base learning the relative depth of a region. (b) The depth bases generated by qu2020depth are clearly imbalanced: typically, a single base contains almost all the information.
  • Figure 4: Outliers are highlighted with blue boxes. Although outliers constitute a small proportion of the predicted depths, they are clearly visible as prominent protrusions in the corresponding 3D mesh.
  • Figure 5: BBC-Net architecture. The network takes a grayscale image and sparse depth as inputs and outputs a set of depth bases and a confidence map. The pink arrow is activated only in the training process.
  • ...and 8 more figures
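Figures 4 and 5 suggest one downstream use of the confidence map: a few outlier depths can show up as prominent protrusions in the mesh, so low-confidence pixels are natural candidates to discard before fusion. The following is a minimal sketch under stated assumptions, not necessarily how the paper uses its confidence map: it assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and a hypothetical threshold `min_conf`, and back-projects only the confident pixels into a camera-frame point cloud.

```python
import numpy as np


def backproject_confident(depth, confidence, fx, fy, cx, cy, min_conf=0.5):
    """Back-project pixels with confidence >= min_conf into camera-frame points.

    depth, confidence: (H, W) arrays. min_conf is a hypothetical threshold.
    Returns a (K, 3) array of (x, y, z) points in row-major pixel order.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                      # row and column index grids
    mask = (confidence >= min_conf) & (depth > 0)  # drop unreliable or empty pixels
    z = depth[mask]
    x = (u[mask] - cx) * z / fx                    # pinhole model: x = (u - cx) z / fx
    y = (v[mask] - cy) * z / fy                    # pinhole model: y = (v - cy) z / fy
    return np.stack([x, y, z], axis=1)
```

A hard threshold is the simplest choice; a softer alternative would keep all points and pass the confidence along as a per-point weight to the fusion step.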