PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes

Xinggang Hu, Yanmin Wu, Mingyuan Zhao, Linghao Yang, Xiangkui Zhang, Xiangyang Ji

TL;DR

The paper addresses degraded SLAM performance in planar ambiguous scenes, which it attributes to unreliable plane extraction and weak data association. It introduces a planar-feature SLAM pipeline that fuses semantic information with plane edges and vertices, associates planes using plane parameters, semantic information, projection IoU, and non-parametric tests, and optimizes camera poses with a multi-constraint factor graph. Planes are represented in Hesse normal form $\pi=(n^T,d)^T$ with the minimal parameterization $q(\pi)=(\phi,\varphi,d)$. The main contributions are (1) a plane processing strategy that includes re-fitting and object-box associations, (2) a data association framework that combines multiple cues, and (3) a multi-constraint pose-optimization strategy with dedicated factors for points, planes, boxes, and plane–plane relations, followed by plane fusion and map update. Results on public indoor datasets show accuracy and robustness competitive with state-of-the-art methods in map construction and camera localization, highlighting the method’s potential for environment perception and AR applications.
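The relation between the four-parameter plane $\pi=(n^T,d)^T$ and the three-parameter $q(\pi)=(\phi,\varphi,d)$ can be illustrated with a minimal Python sketch. It assumes the common convention in which $\phi$ is the azimuth and $\varphi$ the elevation of the unit normal; the summary does not spell out the angle convention, so this choice and the function names `to_minimal` and `from_minimal` are illustrative assumptions, not the authors' implementation.

```python
import math

def normalize_plane(pi):
    """Scale (nx, ny, nz, d) so that the normal has unit length."""
    nx, ny, nz, d = pi
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("plane normal must be non-zero")
    return (nx / norm, ny / norm, nz / norm, d / norm)

def to_minimal(pi):
    """Map pi = (n, d) to q = (phi, varphi, d).

    Assumed convention: phi is the azimuth of the normal,
    varphi is its elevation above the x-y plane.
    """
    nx, ny, nz, d = normalize_plane(pi)
    phi = math.atan2(ny, nx)
    varphi = math.asin(max(-1.0, min(1.0, nz)))  # clamp against rounding
    return (phi, varphi, d)

def from_minimal(q):
    """Inverse map: rebuild the unit normal and distance from q."""
    phi, varphi, d = q
    nx = math.cos(varphi) * math.cos(phi)
    ny = math.cos(varphi) * math.sin(phi)
    nz = math.sin(varphi)
    return (nx, ny, nz, d)

# Round trip on an unnormalized plane.
plane = (1.0, 2.0, 2.0, 3.0)
recovered = from_minimal(to_minimal(plane))
```

A three-parameter form is attractive in optimization because it removes the unit-norm constraint on $n$, so an optimizer can update the plane freely without re-normalizing after each step.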

Abstract

Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread applications in fields such as environmental structure perception and augmented reality. However, current research faces challenges in accurately localizing and mapping in planar ambiguous scenes, primarily due to the poor accuracy of the employed planar features and data association methods. In this paper, we propose a visual SLAM system based on planar features designed for planar ambiguous scenes, encompassing planar processing, data association, and multi-constraint factor graph optimization. We introduce a planar processing strategy that integrates semantic information with planar features, extracting the edges and vertices of planes to be utilized in tasks such as plane selection, data association, and pose optimization. Next, we present an integrated data association strategy that combines plane parameters, semantic information, projection IoU (Intersection over Union), and non-parametric tests, achieving accurate and robust plane data association in planar ambiguous scenes. Finally, we design a set of multi-constraint factor graphs for camera pose optimization. Qualitative and quantitative experiments conducted on publicly available datasets demonstrate that our proposed system competes effectively in both accuracy and robustness in terms of map construction and camera localization compared to state-of-the-art methods.
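How several cues might be combined when associating an observed plane with a map plane can be sketched as follows. This is a simplified illustration, not the system's actual procedure: it assumes unit normals expressed in a common frame, axis-aligned image boxes for the projections, and hypothetical thresholds (`max_angle_deg`, `max_dist`, `min_iou`). The non-parametric test named in the abstract is omitted.

```python
import math

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0

def normal_angle_deg(n1, n2):
    """Angle in degrees between two unit normals."""
    dot = sum(x * y for x, y in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def is_candidate_match(obs, landmark,
                       max_angle_deg=10.0, max_dist=0.1, min_iou=0.3):
    """Gate a plane pair on semantics, plane parameters, and projection IoU.

    obs and landmark are dicts with keys "label", "normal", "d", "box".
    All thresholds are hypothetical values chosen for illustration.
    """
    if obs["label"] != landmark["label"]:
        return False
    if normal_angle_deg(obs["normal"], landmark["normal"]) > max_angle_deg:
        return False
    if abs(obs["d"] - landmark["d"]) > max_dist:
        return False
    return box_iou(obs["box"], landmark["box"]) >= min_iou

observed = {"label": "wall", "normal": (0.0, 0.0, 1.0), "d": 2.00,
            "box": (0.0, 0.0, 2.0, 2.0)}
mapped = {"label": "wall", "normal": (0.0, 0.0, 1.0), "d": 2.05,
          "box": (0.5, 0.0, 2.5, 2.0)}
matched = is_candidate_match(observed, mapped)
```

Checking the projection overlap in addition to the plane parameters helps separate planes that are nearly parallel and close in distance, which is the ambiguity the abstract describes.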

Paper Structure (29 sections, 18 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Overview: (a) and (b) show scenes with plane ambiguity; (c) current related research struggles to achieve accurate mapping; (d) mapping results of our approach.
  • Figure 2: Our system framework. The system takes RGB images and depth images as input (A). First, information extraction is performed (B), followed by plane processing (C). After that, data association of multiple information is conducted (D). Camera pose optimization is then executed using a factor graph with multiple constraints (E). Finally, plane fusion and updating take place (F). The system outputs camera poses and a plane map (G).
  • Figure 3: The plane mapping results for several algorithms. From top to bottom, they are SP-SLAM, PlanarSLAM, ManhattanSLAM, w/o-(C)(D), w/o-(D) and ours.
  • Figure 4: The results of plane vertex extraction. Each pair of images consists of a right-side image showing the vertices detected based on plane edge points and a left-side image displaying the projection of plane vertices onto the image.