
Representing 3D sparse map points and lines for camera relocalization

Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee

TL;DR

This study shows how a lightweight neural network can learn to represent both 3D point and line features, and exhibit leading pose accuracy by harnessing the power of multiple learned mappings.

Abstract

Recent advancements in visual localization and mapping have demonstrated considerable success in integrating point and line features. However, expanding the localization framework to include additional mapping components frequently results in increased demand for memory and computational resources dedicated to matching tasks. In this study, we show how a lightweight neural network can learn to represent both 3D point and line features, and exhibit leading pose accuracy by harnessing the power of multiple learned mappings. Specifically, we utilize a single transformer block to encode line features, effectively transforming them into distinctive point-like descriptors. Subsequently, we treat these point and line descriptor sets as distinct yet interconnected feature sets. Through the integration of self- and cross-attention within several graph layers, our method effectively refines each feature before regressing 3D maps using two simple MLPs. In comprehensive experiments, our indoor localization findings surpass those of Hloc and Limap across both point-based and line-assisted configurations. Moreover, in outdoor scenarios, our method secures a significant lead, marking the most considerable enhancement over state-of-the-art learning-based methodologies. The source code and demo videos of this work are publicly available at: https://thpjp.github.io/pl2map/

Paper Structure (18 sections, 12 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Representing 3D point-line maps by PL2Map. We show an example of the results of the proposed learning method for representing 3D point-line features. The red camera poses in both the predicted line map (a) and the predicted point map (b) are the ground-truth poses of the input image on the left, and the blue ones are the camera poses estimated from the predicted line or point map.
  • Figure 2: PL2Map pipeline. We illustrate the architecture of PL2Map, which consists of three main components: Front-End, Attentional Refinement, and Mapping Regressors.
  • Figure 3: Line transformer encoder. We represent a 2D local line by uniformly sampling $T-2$ points inside the line segment with endpoints $p$ and $q$. A transformer-based model then transforms all point descriptors into a single feature of the same dimension, which can be considered a line descriptor.
  • Figure 4: Line reprojection loss. Given two 2D endpoints $p$ and $q$ and the predicted 3D endpoints $P$ and $Q$, we minimize the reprojection distance of $\pi(P)$ and $\pi(Q)$ to the 2D segment $pq$ on the image plane. This makes the length of $PQ$ in 3D space independent of the length of the 2D segment $pq$, which also helps handle occlusion in the camera view.
  • Figure 5: Reliable Line-Map Prediction Results. We show predicted line-map filtering with different thresholds $\hat{r}$ in the RedKitchen scene from 7Scenes (Shotton et al., 2013).
  • ...and 2 more figures
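
The geometry described in the Figure 3 and Figure 4 captions can be illustrated with a short sketch. This is not the authors' implementation: the function names, the single-focal-length pinhole model, the assumption that $P$ and $Q$ are already expressed in the camera frame, and the choice to measure the perpendicular distance to the infinite line through $p$ and $q$ (rather than to the finite segment) are all assumptions made here for illustration.

```python
import math

def sample_line_points(p, q, T):
    """Return T points on segment pq: the two endpoints plus
    T-2 uniformly spaced interior points (as in the Figure 3 caption)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [
        (p[0] + (q[0] - p[0]) * i / (T - 1),
         p[1] + (q[1] - p[1]) * i / (T - 1))
        for i in range(T)
    ]

def project(X, f, cx, cy):
    """Hypothetical pinhole projection pi(X) of a 3D point in the
    camera frame; assumes positive depth X[2]."""
    x, y, z = X
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (f * x / z + cx, f * y / z + cy)

def point_to_line_distance(x, p, q):
    """Perpendicular distance from 2D point x to the infinite line
    through p and q (an assumption; the caption says 'segment')."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p and q must be distinct")
    cross = dx * (x[1] - p[1]) - dy * (x[0] - p[0])
    return abs(cross) / length

def line_reprojection_error(P, Q, p, q, f, cx, cy):
    """Sum of the distances of pi(P) and pi(Q) to the 2D line pq."""
    return (point_to_line_distance(project(P, f, cx, cy), p, q)
            + point_to_line_distance(project(Q, f, cx, cy), p, q))

# Usage: the projected 3D endpoint Q lands beyond q but still on the
# line through p and q, so the error stays zero.
p, q = (50.0, 50.0), (150.0, 50.0)
on_line = line_reprojection_error((0.0, 0.0, 2.0), (4.0, 0.0, 2.0),
                                  p, q, f=100.0, cx=50.0, cy=50.0)
off_line = line_reprojection_error((0.0, 0.0, 2.0), (4.0, 0.5, 2.0),
                                   p, q, f=100.0, cx=50.0, cy=50.0)
```

Under these assumptions, the error depends only on how far the projected endpoints fall from the 2D line, not on where along it they land, which is one way to read the caption's remark that the 3D length of $PQ$ is independent of the 2D length of $pq$.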