MLINE-VINS: Robust Monocular Visual-Inertial SLAM With Flow Manhattan and Line Features

Chao Ye, Haoyuan Li, Weiyang Lin, Xianqiang Yang

TL;DR

MLINE-VINS addresses the fragility of monocular VIO in indoor, texture-poor environments by fusing line features with the Manhattan World (MW) assumption. It introduces a fast line optical flow that handles varying line lengths, a tracking-by-detection mechanism for Manhattan frames, and a back-end optimization that enforces both local and global MW and structural constraints, with VIO–MW frame alignment to simplify coordinate transforms. The approach yields improved accuracy and long-range robustness on EuRoC, KAIST-VIO, and real-world indoor datasets, while maintaining real-time operation. The work demonstrates that integrating structural regularities with lines can significantly reduce drift and improve reliability in challenging scenarios, offering a practical path for robust monocular VIO in man-made environments.

Abstract

In this paper, we introduce MLINE-VINS, a novel monocular visual-inertial odometry (VIO) system that leverages line features and the Manhattan World assumption. Specifically, for the line matching process, we propose a novel geometric line optical flow algorithm that efficiently tracks line features of varying lengths and does not require detections and descriptors in every frame. To address the instability of Manhattan estimation from line features, we propose a tracking-by-detection module that consistently tracks and optimizes Manhattan frames in consecutive images. By aligning the Manhattan World with the VIO world frame, tracking can restart using the latest pose from the back-end, which simplifies the coordinate transformations within the system. Furthermore, we implement a mechanism to validate Manhattan frames and a novel back-end optimization with global structural constraints. Extensive experimental results on various datasets, including benchmark and self-collected datasets, show that the proposed approach outperforms existing methods in terms of accuracy and long-range robustness. The source code of our method is available at: https://github.com/LiHaoy-ux/MLINE-VINS.

Paper Structure

This paper contains 36 sections, 33 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Results of the proposed VIO system on EuRoC MH05. The blue, yellow, and red lines represent 3D structural lines along the X, Y, and Z directions, respectively. The green trajectory indicates the camera's historical poses. The black points represent 3D points, and the 2D structural lines extracted from the RGB image are also shown at the top.
  • Figure 2: Overview of MLINE-VINS. The orange boxes highlight the new components introduced in this paper. Upon receiving the RGB input, points and lines are extracted and tracked in parallel. The Manhattan tracking-by-detection module is then executed to estimate Manhattan frames (MFs) in consecutive images, with each line being clustered according to a principal axis. If the MF is lost, the last camera state from the back-end is used as the initial value to restart tracking. After system initialization, the VIO world frame is aligned with the MW, and optimization is performed using the local and global constraints of Manhattan and structural lines.
  • Figure 3: Rotation estimation between the camera frame and the MW. The extrinsic matrix $\mathbf{R}^M_W$ represents the rotation between the VIO world frame and the MW. The rotation changes between MFs in camera coordinates are represented by $\mathbf{R}^{c_i}_M$.
  • Figure 4: Factor graph for MLINE-VINS. The 'Map points' and 'Map lines' nodes represent all point and line features within the sliding window. 'Map lines' include both structural and non-structural lines. Structural lines provide both reprojection and structural constraints, while non-structural lines contribute only reprojection constraints.
  • Figure 5: Diagram of the line feature tracking model. $l_i$ represents a line feature in frame $F_t$, and $l_i'$ is the tracked line feature in frame $F_{t+\delta t}$. $g1$, $g2$, $g3$, $g4$ denote the changes in the horizontal coordinate, vertical coordinate, angle, and length of the line feature, respectively.
  • ...and 10 more figures
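
The four-parameter line motion described in the Figure 5 caption can be illustrated with a small sketch. This is not the authors' implementation: the midpoint/angle/length parameterization, the `LineSegment` class, and the `apply_line_motion` function are all assumptions made for illustration, and the sketch only shows how the four changes $g1$ to $g4$ could move a segment from $F_t$ to $F_{t+\delta t}$, not how they are estimated.

```python
import math
from dataclasses import dataclass


@dataclass
class LineSegment:
    """2D segment stored as midpoint, orientation, and length.

    This parameterization is an assumption; the caption only names
    the four quantities that change between frames.
    """
    cx: float      # midpoint x in pixels
    cy: float      # midpoint y in pixels
    theta: float   # orientation in radians
    length: float  # segment length in pixels

    def endpoints(self):
        """Return the two endpoints implied by midpoint, angle, and length."""
        dx = 0.5 * self.length * math.cos(self.theta)
        dy = 0.5 * self.length * math.sin(self.theta)
        return (self.cx - dx, self.cy - dy), (self.cx + dx, self.cy + dy)


def apply_line_motion(line, g1, g2, g3, g4):
    """Apply hypothetical motion parameters to a segment.

    g1: change in horizontal coordinate, g2: change in vertical coordinate,
    g3: change in angle, g4: change in length (clamped to stay non-negative).
    """
    return LineSegment(
        cx=line.cx + g1,
        cy=line.cy + g2,
        theta=line.theta + g3,
        length=max(line.length + g4, 0.0),
    )


# Usage: a horizontal 40-pixel segment shifts by (2, -1) and grows by 10 pixels.
line = LineSegment(cx=100.0, cy=50.0, theta=0.0, length=40.0)
tracked = apply_line_motion(line, g1=2.0, g2=-1.0, g3=0.0, g4=10.0)
print(tracked.endpoints())
```

Treating length as its own parameter is what lets a tracked segment grow or shrink between frames, which matches the abstract's emphasis on line features of varying lengths.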