MCGMapper: Light-Weight Incremental Structure from Motion and Visual Localization With Planar Markers and Camera Groups

Yusen Xie, Zhenmin Huang, Kai Chen, Lei Zhu, Jun Ma

TL;DR

The paper tackles structure-from-motion and indoor visual localization in texture-less/industrial settings where traditional feature-based methods struggle. It introduces MCGMapper, an incremental framework that combines front-end PnP initialization with back-end marker- and camera-group bundle adjustment to build a global, multi-size marker map using camera groups. Core contributions include marker BA for multiple marker sizes, camera-group BA with extrinsics as priors, a weighted information model for marker observations, and a synthetic dataset with ground-truth poses for quantitative benchmarking. Empirical results on public and proposed datasets show improved accuracy and speed, with robust performance in large-scale, multi-camera setups, and practical utility for labs, warehouses, and industrial environments. The work also provides open-source code, enabling broader adoption and benchmarking in marker-based SfM research.

Abstract

Structure from Motion (SfM) and visual localization in indoor texture-less scenes and industrial scenarios are prevalent yet challenging research topics. Existing SfM methods designed for natural scenes typically yield low accuracy or fail to build a map because robust features are hard to extract in such settings. Visual markers, with their artificially designed features, can effectively address these issues. Nonetheless, existing marker-assisted SfM methods suffer from slow running speed and convergence difficulties, and they rely on the strong assumption that all markers share a single size. In this paper, we propose a novel SfM framework that utilizes planar markers and multiple cameras with known extrinsics to capture the surrounding environment and reconstruct the marker map. In our algorithm, the initial poses of markers and cameras are calculated with Perspective-n-Points (PnP) in the front-end, while bundle adjustment methods customized for markers and camera groups are designed in the back-end to optimize the 6-DOF pose directly. Our algorithm facilitates the reconstruction of large scenes with different marker sizes, and its accuracy and speed of map building are shown to surpass existing methods. Our approach is suitable for a wide range of scenarios, including laboratories, basements, warehouses, and other industrial settings. Furthermore, we incorporate representative scenarios into simulations and also supply our datasets with pose labels to address the scarcity of quantitative ground-truth datasets in this research field. The datasets and source code are available on GitHub.
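
To make the role of marker size concrete, the following is a minimal sketch of a corner reprojection residual for a single square marker, the kind of quantity a marker bundle adjustment could minimize over the 6-DOF pose. It is not the authors' implementation: the function names, the corner ordering, the pinhole model without distortion, and the unweighted residual are all assumptions made for illustration.

```python
def marker_corners(side):
    """Four corners of a square marker in its own frame (z = 0).

    Assumed convention: origin at the marker centre, corners ordered
    top-left, top-right, bottom-right, bottom-left.
    """
    h = side / 2.0
    return [(-h, h, 0.0), (h, h, 0.0), (h, -h, 0.0), (-h, -h, 0.0)]


def project(fx, fy, cx, cy, R, t, point):
    """Pinhole projection of one marker-frame point.

    R (3x3 nested sequence) and t (length 3) map marker coordinates
    into the camera frame; lens distortion is ignored in this sketch.
    """
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i]
               for i in range(3))
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_residual(intrinsics, R, t, side, observed):
    """Stacked pixel residuals (u, v per corner) for one marker.

    `side` is a per-marker parameter, so markers of different sizes
    can share the same cost function.
    """
    residual = []
    for corner, (u_obs, v_obs) in zip(marker_corners(side), observed):
        u, v = project(*intrinsics, R, t, corner)
        residual += [u - u_obs, v - v_obs]
    return residual
```

Because the side length enters only through `marker_corners`, assuming the wrong size for a marker produces a systematic residual even at the correct pose, which is one way to see why a single-size assumption is restrictive.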

Paper Structure (19 sections, 11 equations, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: The scenes simulated using Blender. The red curve represents the movement trajectory of the camera or camera groups in each scene.
  • Figure 2: Camera group setup in our simulated datasets. The FoV of each camera (39.56° $\times$ 33.49°) is calculated from the calibrated camera intrinsics. In this paper, we use three cameras to evaluate our algorithm.
  • Figure 3: Overview of our incremental marker-CG SfM framework. The framework is divided into three parts: Preprocessing, Global Localization, and Incremental Map Update. Marker recognition and information confidence calculation are integrated in the preprocessing part. For clarity, map initialization is not included in this overview.
  • Figure 4: 3D reconstruction results on public datasets [pmslameccv2018]. We select three challenging sequences for this comparison experiment. The results show that our algorithm reconstructs all sequences successfully. The number in the upper left corner of each image is the time consumed by the algorithm (in seconds). The green boxes indicate complete results and the red boxes indicate failure cases.
  • Figure 5: (a) A real scene equipped with markers of multiple sizes. (b) Two cameras are coupled back to back at a 180° angle. (c) Three cameras are coupled at a 120° angle.
  • ...and 3 more figures
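
The field-of-view and camera-spacing figures quoted in the captions above can be related with a short calculation. This is a sketch under stated assumptions: an ideal pinhole camera, cameras spaced evenly in yaw on a common plane, and 39.56° taken as the horizontal FoV. The focal length and image width in the usage note are hypothetical and are not the paper's calibration.

```python
import math


def fov_deg(focal_px, size_px):
    """Field of view along one image axis for an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


def camera_yaws_deg(n_cameras):
    """Yaw headings for cameras spaced evenly around a full circle."""
    return [i * 360.0 / n_cameras for i in range(n_cameras)]


def uncovered_gap_deg(n_cameras, hfov_deg):
    """Angular gap between the view cones of two neighbouring cameras.

    Returns 0.0 when neighbouring views touch or overlap.
    """
    return max(0.0, 360.0 / n_cameras - hfov_deg)
```

For example, `fov_deg(500.0, 1000.0)` gives 90° for a hypothetical 1000-pixel-wide image with a 500-pixel focal length. Under the assumptions above, `uncovered_gap_deg(3, 39.56)` is about 80.44°, so three such cameras at 120° spacing would look in clearly separated directions rather than overlapping.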