MonoSLAM: Robust Monocular SLAM with Global Structure Optimization

Bingzheng Jiang, Jiayuan Wang, Han Ding, Lijun Zhu

TL;DR

MonoSLAM introduces a robust monocular SLAM framework that fuses point, line, and vanishing-point features via Global Primitives to achieve accurate pose estimation and mapping in texture-poor environments. The system employs a two-stage approach with Local Primitives for immediate scene geometry and Global Primitives aggregated across non-overlapping frames, integrated through a robust factor-graph optimization that jointly enforces local and global geometric cues. Key contributions include a multi-frame non-overlapping association strategy, optical-flow-assisted line fusion, vanishing-direction fusion, and a novel global-primitive factor in the optimization, leading to improved trajectory accuracy on challenging benchmarks like ICL-NUIM and EuRoC. The findings demonstrate the practical potential of leveraging structural regularities without environmental priors, though future work could incorporate IMU data to enhance stability under dynamic motions.

Abstract

This paper presents a robust monocular visual SLAM system that jointly uses point, line, and vanishing-point features for accurate camera pose estimation and mapping. Traditional point-based systems often fail in low-texture environments because too few visual features are available. To achieve reliable localization in such scenes, we introduce an approach that leverages the structural information carried by Global Primitives to improve the system's robustness and accuracy. Our key innovation lies in constructing vanishing points from line features and proposing a weighted fusion strategy to build Global Primitives in the world coordinate system. This strategy associates multiple frames with non-overlapping regions and formulates a multi-frame reprojection error optimization, significantly improving tracking accuracy in texture-scarce scenarios. Evaluations on various datasets show that our system outperforms state-of-the-art methods in trajectory precision, particularly in challenging environments.
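
As a rough illustration of the two geometric steps named in the abstract (estimating a vanishing point from line features, then fusing vanishing directions with weights), the following NumPy sketch may help. It is not the authors' implementation: the function names, the segment-length weights, and the sign-alignment step are assumptions made for illustration, and the abstract does not specify the paper's actual weighting scheme.

```python
import numpy as np

def vanishing_point(segments, weights=None):
    """Least-squares vanishing point of 2D line segments.

    segments: (N, 4) array of endpoints x1, y1, x2, y2 in pixels.
    Returns a unit homogeneous 3-vector v minimizing sum_i w_i^2 (l_i . v)^2.
    """
    segs = np.asarray(segments, dtype=float)
    ones = np.ones(len(segs))
    p1 = np.column_stack([segs[:, 0], segs[:, 1], ones])
    p2 = np.column_stack([segs[:, 2], segs[:, 3], ones])
    # The homogeneous line through two points is their cross product.
    lines = np.cross(p1, p2)
    lines /= np.linalg.norm(lines, axis=1, keepdims=True)
    if weights is None:
        # Assumption: longer segments are trusted more.
        weights = np.linalg.norm(segs[:, 2:] - segs[:, :2], axis=1)
    A = lines * np.asarray(weights, dtype=float)[:, None]
    # The right singular vector with the smallest singular value
    # is the point closest to lying on every line.
    _, _, vt = np.linalg.svd(A)
    return vt[-1]

def vanishing_direction(vp, K):
    """Back-project a vanishing point to a unit 3D direction (camera frame)."""
    d = np.linalg.solve(np.asarray(K, dtype=float), vp)
    return d / np.linalg.norm(d)

def fuse_directions(dirs_world, weights):
    """Weighted mean of unit directions already expressed in the world frame.

    Directions are sign-ambiguous, so each one is flipped to agree with the
    first before averaging (an illustrative choice, not taken from the paper).
    """
    dirs = np.asarray(dirs_world, dtype=float)
    w = np.asarray(weights, dtype=float)
    signs = np.where(dirs @ dirs[0] < 0, -1.0, 1.0)
    mean = (w[:, None] * signs[:, None] * dirs).sum(axis=0)
    return mean / np.linalg.norm(mean)
```

For segments whose supporting lines all pass through one pixel, dividing the result of `vanishing_point` by its third component recovers that pixel. A per-frame direction would need to be rotated into the world frame with the estimated camera rotation before being passed to `fuse_directions`.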

Paper Structure

This paper contains 27 sections, 18 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Example results of the proposed method. (a) Point and line features extracted from a single RGB image; (b) Segmented lines that are associated with Global Primitives; (c) 3D sparse map established by point and line landmarks; (d) and (e) Connected frames linked via green and red connections, respectively, to build up the proposed covisibility graph and Global Primitive association graph.
  • Figure 2: Architecture of the Proposed System. The MonoSLAM framework comprises a robust front-end and a multi-level back-end. In the front-end, point, line, and vanishing point features are extracted from the RGB image to establish a rich representation of the scene. In the back-end, the framework first leverages the scene structure, represented by points and lines, to estimate the camera pose and update the map. Subsequently, Global Primitive constraints are integrated to further refine the camera pose estimation and enhance the map accuracy. This dual-stage optimization approach ensures precise and reliable performance in complex environments.
  • Figure 3: 3D Trajectory Comparison on ICL-NUIM