SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality

Hongjia Zhai; Xiyu Zhang; Boming Zhao; Hai Li; Yijia He; Zhaopeng Cui; Hujun Bao; Guofeng Zhang

TL;DR

This work develops an unbiased 3D scene-specific descriptor decoder for Gaussian primitives, distilled from a constructed feature volume, and introduces a salient 3D landmark selection algorithm that selects a suitable primitive subset based on the saliency score for localization.

Abstract

Visual localization plays an important role in Augmented Reality (AR) applications: it enables AR devices to obtain their 6-DoF pose in a pre-built map in order to render virtual content in real scenes. However, most existing approaches cannot perform novel view rendering and require large storage capacities for maps. To overcome these limitations, we propose an efficient visual localization method capable of high-quality rendering with fewer parameters. Specifically, our approach leverages 3D Gaussian primitives as the scene representation. To ensure precise 2D-3D correspondences for pose estimation, we develop an unbiased 3D scene-specific descriptor decoder for Gaussian primitives, distilled from a constructed feature volume. Additionally, we introduce a salient 3D landmark selection algorithm that selects a suitable primitive subset based on the saliency score for localization. We further regularize key Gaussian primitives to prevent anisotropic effects, which also improves localization performance. Extensive experiments on two widely used datasets demonstrate that our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches. Project page: https://zju3dv.github.io/splatloc
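As a rough illustration of what selecting a primitive subset by saliency score could look like, the sketch below keeps the highest-scoring primitives above a threshold. It is not the paper's selection algorithm, which is not described on this page: the `Primitive` class, the `select_landmarks` function, the score range, and the plain top-k rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """Hypothetical, stripped-down Gaussian primitive: a 3D center and a landmark score."""
    position: tuple[float, float, float]
    score: float  # saliency / 3D landmark score, assumed here to lie in [0, 1]


def select_landmarks(primitives, max_count, min_score=0.0):
    """Return up to max_count primitives with the highest scores at or above min_score."""
    candidates = [p for p in primitives if p.score >= min_score]
    candidates.sort(key=lambda p: p.score, reverse=True)
    return candidates[:max_count]


# Toy scene with made-up positions and scores.
scene = [
    Primitive((0.0, 0.0, 1.0), 0.91),
    Primitive((0.5, 0.1, 2.0), 0.12),
    Primitive((1.0, 0.4, 1.5), 0.67),
    Primitive((0.2, 0.9, 3.0), 0.45),
]
landmarks = select_landmarks(scene, max_count=2, min_score=0.3)
```

Only the selected subset would then take part in 2D-3D matching, which is one way a smaller map can still support pose estimation.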

Paper Structure (21 sections, 20 equations, 8 figures, 5 tables, 1 algorithm)

This paper contains 21 sections, 20 equations, 8 figures, 5 tables, 1 algorithm.

Introduction
Related Works
Visual Localization
3D Gaussian Splatting
3D Feature Field
Method
3D Gaussian Scene Representation
Unbiased 3D Descriptor Learning
Salient 3D Landmark Selection
Key Gaussian Primitive Regularization
Objective Functions
Localization
Experiments
Datasets
Implementation Details
...and 6 more sections

Figures (8)

Figure 1: Reconstruction processes. We incrementally initialize the Gaussian primitives, and each primitive is associated with position $\mu$, rotation $q$, scale $s$, opacity $\sigma$, color $c$, and 3D landmark score $a$. For key Gaussian primitives, we perform soft isotropy and scale regularization to mitigate the anisotropic results. The color loss $\mathcal{L}_{c}$, depth loss $\mathcal{L}_d$, 3D landmark loss $\mathcal{L}_m$, and regularization loss $\mathcal{L}_{reg}$ are used to optimize the properties of each primitive via differentiable rasterization.
Figure 2: Illustration of biased and unbiased 3D descriptor field learning. (a) The biased 3D feature optimization of previous works [qin2024langsplat, shi2024_gs_language_embed], which use alpha-blending to obtain the 2D blended feature. (b) Our unbiased 3D feature learning scheme, which directly learns the 3D feature decoder from the constructed feature volume of multi-view feature maps.
Figure 3: The pipeline of our unbiased 3D primitive descriptor learning. We first encode images with the 2D CNN model SuperPoint [superpoint] to obtain the multi-view feature maps, then construct the 3D scene feature volume according to the depth and pose information. To enhance the representation ability of the 3D feature decoder, we use multi-resolution parametric encoding to aid the 3D scene-specific descriptor learning. In addition, we only sample descriptors on the scene surface for effective distillation.
Figure 4: Visualization of novel view synthesis. We show novel view rendering results from different scenes. From top to bottom: results of PNeRFLoc [pnerfloc], ours, and ground truth. Our rendering results are clearer and contain less noise.
Figure 5: Visual localization performance of using different resolutions of parametric encodings. We report median translation and rotation errors (cm, degree) on two selected scenes.
...and 3 more figures
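
The alpha-blending mentioned in the Figure 2 caption can be sketched with the standard front-to-back compositing rule used in Gaussian splatting, where each primitive's feature is weighted by its opacity times the transmittance left over from the primitives in front of it. The function name `alpha_blend` and the toy inputs are assumptions for illustration; this is not code from the paper.

```python
def alpha_blend(features, alphas):
    """Front-to-back alpha compositing of per-primitive feature vectors along one ray.

    features: list of equal-length feature vectors, sorted from near to far.
    alphas:   per-primitive opacity contribution at this pixel, each in [0, 1].
    Returns the blended feature and the transmittance remaining behind the last primitive.
    """
    dim = len(features[0])
    blended = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for feat, alpha in zip(features, alphas):
        weight = alpha * transmittance
        for k in range(dim):
            blended[k] += weight * feat[k]
        transmittance *= 1.0 - alpha
    return blended, transmittance


# Two primitives with orthogonal toy descriptors and equal opacity.
features = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.5, 0.5]
blended, remaining = alpha_blend(features, alphas)
```

In this toy case the blended 2D feature is a weighted mix of both descriptors and matches neither of them. That mixing is one plausible reading of why the caption calls supervision through blended features biased.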
