From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang

TL;DR

STDLoc is a camera relocalization method built on a scene-specific Feature Gaussian representation and a sparse-to-dense localization pipeline. Instead of starting from image retrieval, it samples landmarks with a matching-oriented strategy and uses a scene-specific detector to obtain a reliable initial pose, then refines that pose by dense feature-map alignment. The key contributions are the matching-oriented sampling strategy, a self-supervised scene-specific detector, and a full sparse-to-dense localization framework that uses a learned feature field for accurate 6DoF pose estimation in both indoor and outdoor environments. On 7-Scenes and Cambridge Landmarks, STDLoc achieves state-of-the-art localization accuracy and recall, is robust to illumination changes and weak textures, and runs at several frames per second on modern GPUs.

Abstract

This paper presents a novel camera relocalization method, STDLoc, which leverages Feature Gaussian as its scene representation. STDLoc is a full relocalization pipeline that achieves accurate relocalization without relying on any pose prior. Unlike previous coarse-to-fine localization methods, which require image retrieval first and feature matching second, we propose a novel sparse-to-dense localization paradigm. Based on this scene representation, we introduce a novel matching-oriented Gaussian sampling strategy and a scene-specific detector to achieve efficient and robust initial pose estimation. Then, starting from the initial localization result, we align the query feature map to the Gaussian feature field by dense feature matching to obtain an accurate pose. Experiments on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art localization methods in terms of localization accuracy and recall.
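
To make the sparse stage more concrete, the following is a minimal sketch of 2D-3D matching between query keypoint descriptors and landmark descriptors. The abstract does not specify the matching rule, so cosine similarity with a mutual-nearest-neighbor check is an assumption here, and the names `cosine`, `mutual_nearest_matches`, and `min_sim` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def mutual_nearest_matches(query_desc, landmark_desc, min_sim=0.0):
    """Return (query_idx, landmark_idx) pairs that are each other's best match.

    Assumption: descriptors are compared by cosine similarity; the paper's
    actual matching rule may differ.
    """
    if not query_desc or not landmark_desc:
        return []
    sim = [[cosine(q, l) for l in landmark_desc] for q in query_desc]
    # Best landmark for every query keypoint.
    best_l = [max(range(len(landmark_desc)), key=lambda j: row[j]) for row in sim]
    # Best query keypoint for every landmark.
    best_q = [max(range(len(query_desc)), key=lambda i: sim[i][j])
              for j in range(len(landmark_desc))]
    # Keep a pair only if the choice agrees in both directions.
    return [(i, j) for i, j in enumerate(best_l)
            if best_q[j] == i and sim[i][j] >= min_sim]

# Toy 2-D descriptors: three query keypoints, two landmarks.
query = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
landmarks = [(0.0, 2.0), (3.0, 0.1)]
matches = mutual_nearest_matches(query, landmarks)
print(matches)  # [(0, 1), (1, 0)]
```

Each surviving pair links a 2D keypoint to the 3D center of a landmark. Such correspondences could then be passed to a pose solver, for example PnP inside a RANSAC loop, to obtain the initial pose that the dense stage refines.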

Paper Structure

This paper contains 18 sections, 7 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: STDLoc: Sparse-to-Dense Localization. We leverage Feature Gaussian as the scene representation, which supports direct 2D-3D sparse matching on landmarks and enables the alignment of the query feature map to the feature field through dense matching.
  • Figure 2: Feature Gaussian is trained by optimizing the radiance field loss $\mathcal{L}_{rgb}$ and feature field loss $\mathcal{L}_{f}$ jointly.
  • Figure 3: Matching-Oriented Sampling. Each Gaussian is assigned a matching score, followed by anchor sampling. For each anchor, the k nearest Gaussians are identified based on spatial distance, from which the highest-scoring Gaussian is selected.
  • Figure 4: Scene-Specific Detector Training. The centers of sampled landmarks are projected onto 2D images to guide the training of the scene-specific detector.
  • Figure 5: Overview of the sparse-to-dense localization pipeline based on Feature Gaussian.
  • ...and 7 more figures
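
The matching-oriented sampling described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: anchors are drawn uniformly at random here (the caption does not say how anchors are sampled), the nearest-neighbor search is brute force, and the function name and parameters are hypothetical.

```python
import math
import random

def matching_oriented_sampling(centers, scores, num_anchors, k, seed=0):
    """Pick one landmark per anchor: the best-scoring of its k nearest Gaussians.

    centers: list of (x, y, z) Gaussian centers.
    scores:  list of per-Gaussian matching scores (higher is better).
    Returns the sorted, de-duplicated indices of the selected Gaussians.
    """
    if len(centers) != len(scores):
        raise ValueError("centers and scores must have the same length")
    rng = random.Random(seed)
    num_anchors = min(num_anchors, len(centers))
    # Assumption: anchors are sampled uniformly at random from the Gaussians.
    anchor_ids = rng.sample(range(len(centers)), num_anchors)
    selected = set()
    for a in anchor_ids:
        # k nearest Gaussians to the anchor by Euclidean distance (brute force).
        nearest = sorted(range(len(centers)),
                         key=lambda i: math.dist(centers[a], centers[i]))[:k]
        # Keep the highest-scoring Gaussian in that neighborhood.
        selected.add(max(nearest, key=lambda i: scores[i]))
    return sorted(selected)

# Two well-separated clusters of three Gaussians each.
centers = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
           (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
scores = [0.2, 0.9, 0.1, 0.3, 0.4, 0.8]
picked = matching_oriented_sampling(centers, scores, num_anchors=6, k=3)
print(picked)  # [1, 5]
```

With every Gaussian used as an anchor and k = 3, each anchor's neighborhood is its own cluster, so only the top-scoring Gaussian of each cluster survives. This shows how the strategy keeps landmarks spatially spread out while favoring those that are easier to match.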