Improved 3D Point-Line Mapping Regression for Camera Relocalization

Bach-Thuan Bui, Huy-Hoang Bui, Yasuyuki Fujii, Dinh-Tuan Tran, Joo-Ho Lee

TL;DR

This work tackles camera relocalization by separating the learning of 3D coordinates for points and lines into two dedicated regression branches, mitigating bias from feature imbalance. It introduces a focus-mode architecture with an early learnable pruning layer and self-attention modules to robustly refine descriptors before regression, plus a line transformer encoder for line features. The approach demonstrates consistent improvements over prior regression-based methods on 7Scenes and Indoor-6, achieving competitive performance relative to FM-based systems while reducing storage and computational demands. The method is validated with thorough ablations and practical considerations, and code is released to enable public use and benchmarking.

Abstract

In this paper, we present a new approach for improving 3D point and line mapping regression for camera re-localization. Previous methods typically rely on feature matching (FM) with stored descriptors or use a single network to encode both points and lines. While FM-based methods perform well in large-scale environments, they become computationally expensive with a growing number of mapping points and lines. Conversely, approaches that learn to encode mapping features within a single network reduce memory footprint but are prone to overfitting, as they may capture unnecessary correlations between points and lines. We propose that these features should be learned independently, each with a distinct focus, to achieve optimal accuracy. To this end, we introduce a new architecture that learns to prioritize each feature independently before combining them for localization. Experimental results demonstrate that our approach significantly enhances the 3D map point and line regression performance for camera re-localization. The implementation of our method will be publicly available at: https://github.com/ais-lab/pl2map/.
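The separation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the released implementation: the class `Branch`, the functions `self_attention` and `encode_line`, the descriptor size of 32, the 0.5 keep threshold, the mean pooling of sampled descriptors, and the 6-value line output (two 3D endpoints) are all hypothetical choices made for the example, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # assumed descriptor size, chosen only for this example


def self_attention(x):
    """Single-head scaled dot-product self-attention with a residual connection.

    Query/key/value projections are left as identity to keep the sketch short.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return x + weights @ x


def encode_line(samples):
    """Collapse C point descriptors sampled along a line into one line descriptor.

    Mean pooling after attention is an assumption of this sketch.
    """
    return self_attention(samples).mean(axis=0)


class Branch:
    """One regression branch: prune, refine with self-attention, regress coordinates."""

    def __init__(self, dim, out_dim, keep_threshold=0.5):
        self.w_prune = rng.normal(scale=0.1, size=dim)          # pruning scorer
        self.w_out = rng.normal(scale=0.1, size=(dim, out_dim))  # linear regressor
        self.keep_threshold = keep_threshold

    def __call__(self, desc):
        keep_prob = 1.0 / (1.0 + np.exp(-(desc @ self.w_prune)))
        keep = keep_prob > self.keep_threshold
        coords = np.zeros((desc.shape[0], self.w_out.shape[1]))
        if keep.any():
            # Only the descriptors that survive pruning attend to each other.
            coords[keep] = self_attention(desc[keep]) @ self.w_out
        return coords, keep


# Two independent branches: points and lines share no regression weights.
point_branch = Branch(DIM, out_dim=3)  # one 3D coordinate per point
line_branch = Branch(DIM, out_dim=6)   # two 3D endpoints per line (assumed)

point_desc = rng.normal(size=(100, DIM))
line_samples = rng.normal(size=(20, 8, DIM))  # 20 lines, C = 8 samples each
line_desc = np.stack([encode_line(s) for s in line_samples])

points_3d, point_keep = point_branch(point_desc)
lines_3d, line_keep = line_branch(line_desc)
```

Because each branch owns its pruning scorer and regressor, a scene with many more points than lines cannot dominate the line branch's weights, which is the imbalance the text argues against. The predicted coordinates would then feed a pose solver, which is outside the scope of this sketch.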

Paper Structure

This paper contains 17 sections, 12 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Camera localization results and predicted line maps in Scene-1 of Indoor-6 [do2022learning] for PL2Map [bui2024representing] (left) and the proposed method (right). The proposed method gives accurate camera pose estimates by addressing the feature imbalance and noisy features that arise in joint training.
  • Figure 2: Proposed architecture. The architecture focuses on the distinct regression of 3D points and lines, consisting of two main components: (1) the Front-End, which preprocesses and jointly extracts point and line descriptors via a shared feature extractor, and (2) the Mapping Regressors, which include separate regression branches dedicated to point and line maps.
  • Figure 3: Line transformer encoder. Left: a transformer-based model encodes a sample of $C$ point descriptors into a single line descriptor. Right: the transformer architecture in detail.
  • Figure 4: Indoor-6 train-test images. We show an example of training and testing images from the Indoor-6 dataset [do2022learning], where variations in capture time present a challenge for regression-based methods.
  • Figure 5: Qualitative results on Indoor-6. We display a random sample of 50 test images with the poses estimated by the proposed method using both predicted 3D points and lines. Ground-truth poses are shown in red. The background shows the 3D lines predicted from those images.
  • ...and 1 more figure