Reprojection Errors as Prompts for Efficient Scene Coordinate Regression

Ting-Ru Liu; Hsuan-Kung Yang; Jou-Min Liu; Chun-Wei Huang; Tsung-Chih Chiang; Quan Kong; Norimasa Kobori; Chun-Yi Lee

TL;DR

The paper addresses SCR-based visual localization, showing how dynamic objects and textureless regions hinder training stability and accuracy. It introduces Error-Guided Feature Selection (EGFS), coupled with the Segment Anything Model (SAM) and a confidence refinement mechanism. EGFS seeds point prompts at pixels with low reprojection error, expands them into masks with SAM, and iteratively samples robust training regions without relying on fixed semantic labels, while weighting losses by a per-pixel confidence $c_i$. Empirically, EGFS achieves state-of-the-art or competitive results on Cambridge Landmarks and Indoor6 with smaller model sizes and reduced training time, and ablations confirm the positive contribution of both EGFS and confidence refinement. The method enables efficient, robust SCR-based localization in diverse environments by drawing on semantic context through SAM and a data-driven focus on reliable regions. Overall, the work advances SCR by integrating error-driven, semantics-aware sampling with confidence-aware optimization, offering a scalable approach to accurate 6-DoF pose estimation in real-world scenarios.
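To make the idea of weighting losses by a per-pixel confidence $c_i$ concrete, here is a minimal sketch. The normalized weighted-mean form and the name `confidence_weighted_loss` are assumptions for illustration only; the summary does not give the paper's exact loss.

```python
import numpy as np

def confidence_weighted_loss(errors, confidence, eps=1e-8):
    """Confidence-weighted mean of per-pixel errors.

    errors:     per-pixel reprojection errors e_i (any shape).
    confidence: per-pixel confidences c_i in [0, 1], same shape as errors.
    NOTE: the normalized form sum(c_i * e_i) / sum(c_i) is an assumed,
    illustrative choice, not the loss defined in the paper.
    """
    errors = np.asarray(errors, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    # eps keeps the division well defined when every confidence is zero.
    return float(np.sum(confidence * errors) / (np.sum(confidence) + eps))
```

With uniform confidence this reduces to a plain mean, while a pixel with zero confidence (for example, on a moving object) contributes nothing to the loss.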

Abstract

Scene coordinate regression (SCR) methods have emerged as a promising area of research due to their potential for accurate visual localization. However, many existing SCR approaches train on samples from all image regions, including dynamic objects and texture-less areas. Using these areas for optimization during training can hamper the overall performance and efficiency of the model. In this study, we first perform an in-depth analysis to validate the adverse impacts of these areas. Drawing on this analysis, we then introduce an error-guided feature selection (EGFS) mechanism that works in tandem with the Segment Anything Model (SAM). This mechanism seeds areas with low reprojection error as prompts, expands them into error-guided masks, and then uses these masks to sample points and filter out problematic areas in an iterative manner. The experiments demonstrate that our method outperforms existing SCR approaches that do not rely on 3D information on the Cambridge Landmarks and Indoor6 datasets.
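The prompt-seeding step described in the abstract can be sketched as follows: compute a reprojection error per predicted scene coordinate, then keep the lowest-error pixels as point prompts. This is a rough sketch under stated assumptions, not the authors' implementation. The function names, the world-to-camera pose convention, and the `num_prompts` and `max_error` parameters are all hypothetical. Expanding the prompts into masks with SAM is left out.

```python
import numpy as np

def reprojection_errors(scene_coords, pixels, R, t, K):
    """Per-pixel reprojection error, in pixels.

    scene_coords: (N, 3) predicted 3D points in world coordinates.
    pixels:       (N, 2) image location each prediction belongs to.
    R, t:         assumed world-to-camera rotation (3, 3) and translation (3,).
    K:            (3, 3) pinhole intrinsics.
    """
    cam = scene_coords @ R.T + t                 # world -> camera frame
    z = cam[:, 2]
    valid = z > 1e-6                             # points in front of the camera
    uv = (cam @ K.T)[:, :2] / np.where(valid, z, 1.0)[:, None]
    err = np.linalg.norm(uv - pixels, axis=1)
    return np.where(valid, err, np.inf)          # invalid points never win

def select_point_prompts(errors, pixels, num_prompts=5, max_error=2.0):
    """Return up to num_prompts pixel locations with the lowest error.

    num_prompts and max_error are hypothetical knobs for illustration.
    """
    order = np.argsort(errors)[:num_prompts]
    keep = order[errors[order] <= max_error]
    return pixels[keep]
```

The returned pixel locations would then be passed to a promptable segmenter as foreground points. Recomputing them as training progresses gives the iterative behaviour the abstract describes.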

Paper Structure (33 sections, 2 equations, 8 figures, 5 tables)

Introduction
Related Work
Scene Coordinate Regression.
Emphasis on Robust Features for Localization.
Preliminary of Scene Coordinate Regression (SCR)
DSAC Variants.
ACE.
In-Depth Evaluation of Scene Coordinate Regression
Challenges in Scene Coordinate Regression
Analysis between Reprojection Error and Semantic Meaning
Methodology
Problem Definition and Framework Overview
Error-Guided Feature Selection (EGFS) with SAM
Scene Coordinate and EGFS Refinement with Confidence
Experimental Results
...and 18 more sections

Figures (8)

Figure 1: Visualization of the primary components (i.e., (d)-(h)) introduced in the proposed visual localization scheme. (d) illustrates the point prompts selected from (b) with low reprojection errors, while (e) presents an error-guided mask expanded from the prompted points in (d) using SAM. (f) displays the proposed error-guided feature selection (EGFS), which refines the mask from (e) with the predicted confidence map (c) to ensure high-quality scene coordinates are sampled for estimating the final camera pose. The point cloud constructed from the predicted scene coordinates is shown on the right-hand side (i.e., (g)-(h)), with the confidence (yellow parts) and the refined EGFS mask (green for selected areas; red for rejected areas).
Figure 2: Analysis of the relationship between reprojection error and semantic meaning. The results indicate that regions with low reprojection errors tend to have higher inlier ratios, while the errors do not always align with specific semantic categories, e.g., "tree" and "rug".
Figure 3: An overview of the training framework.
Figure 4: An overview of the inference procedure.
Figure 5: Visualization of the EGFS mask refinement process at every five epochs, depicting the reprojection errors at the beginning (epoch 5) and the end (epoch 20), as well as the refined error-guided masks used throughout training. The red dots represent low reprojection errors that serve as prompts, while the light green overlay denotes the refined EGFS masks. It can be observed that the EGFS masks improve over epochs.
...and 3 more figures
