GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Changkun Liu; Shuai Chen; Yash Bhalgat; Siyan Hu; Ming Cheng; Zirui Wang; Victor Adrian Prisacariu; Tristan Braud

TL;DR

GS-CPR tackles the gap in pose refinement accuracy and efficiency by using 3D Gaussian Splatting as a scene representation to render high-quality views and depth, enabling robust 2D-3D correspondences from RGB images. A key component is exposure-adaptive rendering (ACT) that aligns synthetic views with query lighting, while MASt3R provides dense 2D-2D matching to generate 2D-3D correspondences for PnP+RANSAC refinement, all in a one-shot process. A faster variant, GS-CPR_rel, leverages MASt3R’s relative pose and depth to recover scale without 2D-3D matching. Across indoor and outdoor benchmarks, GS-CPR yields state-of-the-art indoor accuracy and substantial runtime advantages over NeRF-based methods, demonstrating a practical, descriptor-free approach to camera relocalization that can plug into existing APR/SCR pipelines.

Abstract

We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement (CPR) framework, GS-CPR. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GS-CPR obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GS-CPR enables efficient one-shot pose refinement given a single RGB query and a coarse initial pose estimation. Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving new state-of-the-art accuracy on two indoor datasets. The project page is available at https://xrim-lab.github.io/GS-CPR/.
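
The abstract describes using rendered depth maps to establish 2D-3D correspondences. As a minimal sketch of that lifting step (not the authors' implementation), the snippet below back-projects matched pixels of the rendered view into world coordinates. The function name `backproject_matches`, the pinhole intrinsics `K`, the camera-to-world convention for `T_world_cam`, and nearest-pixel z-depth lookup are all assumptions made for illustration.

```python
import numpy as np

def backproject_matches(pix_render, depth, K, T_world_cam):
    """Lift matched pixels of a rendered view to 3D world points (illustrative).

    pix_render  : (N, 2) array of (u, v) pixel coordinates in the rendered image
    depth       : (H, W) rendered depth map, assumed to store z-depth
    K           : (3, 3) pinhole intrinsics (assumed camera model)
    T_world_cam : (4, 4) camera-to-world pose used for rendering (assumed convention)
    Returns (N, 3) world points and an (N,) boolean mask of pixels with valid depth.
    """
    pix_render = np.asarray(pix_render, dtype=float)
    u, v = pix_render[:, 0], pix_render[:, 1]
    # Nearest-pixel depth lookup, clamped to the image bounds.
    ui = np.clip(np.rint(u).astype(int), 0, depth.shape[1] - 1)
    vi = np.clip(np.rint(v).astype(int), 0, depth.shape[0] - 1)
    z = depth[vi, ui]
    valid = z > 0
    # Invert the pinhole projection to get camera-frame points.
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    pts_cam = np.stack([x, y, z], axis=1)
    # Move the points into the world frame with the rendering pose.
    pts_world = pts_cam @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]
    return pts_world, valid
```

Pairing each valid world point with the query pixel it was matched to gives the 2D-3D correspondences; these could then be passed to a PnP+RANSAC solver such as OpenCV's `cv2.solvePnPRansac` to estimate the refined pose.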

Paper Structure (20 sections, 3 equations, 12 figures, 13 tables)

Introduction
Related Work
Proposed Method
3DGS Test-time Exposure Adaptation
Pose Refinement with 2D-3D Correspondences
Faster Alternative with Relative Pose Estimation
Experiments
Evaluation Setup
Localization Accuracy
Runtime Analysis
Ablation study
Discussion
Conclusion
Appendix
GT Poses Details
...and 5 more sections

Figures (12)

Figure 1: GS-CPR refines pose predictions of state-of-the-art APR and SCR models in a one-shot manner, achieving greater accuracy than iterative neural refinement methods such as NeFeS (Chen et al., 2024). Each subfigure is divided by a diagonal line, with the bottom left part rendered using the estimated/refined pose and the top right part displaying the ground truth image.
Figure 2: Overview of GS-CPR. We assume the availability of a pre-trained pose estimator $\mathcal{F}$ and a pre-trained 3DGS model $\mathcal{H}$ of the scene. For a query image $I_q$, we first obtain an initial estimated pose $\hat{p}$ from the pose estimator $\mathcal{F}$. Our goal is to output a refined pose $\hat{p}'$.
Figure 3: Overview of GS-CPR$_{\text{rel}}$. Unlike GS-CPR in Figure 2, GS-CPR$_{\text{rel}}$ (highlighted with the red box) uses $\hat{I_d}$ to recover the scale $s$ of $\mathbf{t}_\text{rel}$. The refined pose $\hat{p}'$ is then computed from $\mathbf{R}_\text{rel}$ and $s\mathbf{t}_\text{rel}$ without matching.
Figure 4: Our GS-CPR enhances pose predictions for Marepo, DFNet, and ACE. Each subfigure is divided by a diagonal line, with the bottom left part rendered using the estimated/refined pose and the top right part displaying the ground truth image. Patches highlighting visual differences are emphasized with green insets for enhanced visibility.
Figure 5: Benefit of the ACT module. A regular 3DGS model tends to render images based on the lighting conditions and the appearance of its training frames, as demonstrated by the synthetic view of Scaffold-GS in (b). However, in challenging visual localization datasets, such as ShopFacade in the Cambridge Landmarks, some query frames may have different exposures compared to the training frames. (c) Our proposed Scaffold-GS + ACT can adaptively adjust the exposure based on the query's histogram.
...and 7 more figures
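
The Figure 3 caption describes recovering the scale $s$ of $\mathbf{t}_\text{rel}$ from the rendered depth and then composing the refined pose from $\mathbf{R}_\text{rel}$ and $s\mathbf{t}_\text{rel}$. The sketch below shows one plausible way to do this and should not be read as the authors' procedure. The median depth-ratio estimator, the function names `recover_scale` and `refine_pose`, and the convention that the relative pose maps query-camera coordinates into the rendered camera's frame are all assumptions.

```python
import numpy as np

def recover_scale(depth_rendered, depth_predicted):
    """Estimate a scale as the median ratio of rendered to predicted depth.

    Assumption: depth_rendered is metric (from the scene model) and
    depth_predicted is the same view's depth up to an unknown global scale.
    """
    valid = (depth_rendered > 0) & (depth_predicted > 0)
    if not np.any(valid):
        raise ValueError("no overlapping valid depth pixels")
    return float(np.median(depth_rendered[valid] / depth_predicted[valid]))

def refine_pose(T_world_render, R_rel, t_rel, s):
    """Compose the rendered-view pose with the scaled relative pose.

    Assumed conventions: T_world_render is the 4x4 camera-to-world pose of the
    rendered view (the initial estimate), and (R_rel, t_rel) maps points from
    the query camera frame into the rendered camera frame.
    """
    T_rel = np.eye(4)
    T_rel[:3, :3] = R_rel
    T_rel[:3, 3] = s * np.asarray(t_rel, dtype=float)
    # Camera-to-world pose of the query view.
    return T_world_render @ T_rel
```

The median is used here only because it tolerates outlier depth pixels; a least-squares fit over the valid pixels would be another reasonable choice.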
