LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images

Yuzhou Cheng, Jianhao Jiao, Yue Wang, Dimitrios Kanoulas

TL;DR

LoGS is a vision-based localization pipeline that uses 3D Gaussian Splatting as its scene representation, which allows high-quality novel view synthesis. It achieves SoTA accuracy in estimating camera poses and remains robust under challenging few-shot conditions.

Abstract

Visual localization involves estimating a query image's 6-DoF (degrees of freedom) camera pose, which is a fundamental component in various computer vision and robotic tasks. This paper presents LoGS, a vision-based localization pipeline that uses the 3D Gaussian Splatting (GS) technique as its scene representation. This novel representation allows high-quality novel view synthesis. During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map. During localization, an initial pose is obtained through image retrieval and local feature matching coupled with a PnP solver, and a high-precision pose is then obtained by analysis-by-synthesis on the GS map. Experimental results on four large-scale datasets demonstrate the proposed approach's SoTA accuracy in estimating camera poses and its robustness under challenging few-shot conditions.
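
The coarse-to-fine idea in the abstract (a rough pose first, then refinement by rendering the map and comparing against the query image) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the paper's implementation: the "scene" is three hypothetical 1D Gaussian blobs, the "pose" is a single scalar shift, the coarse estimate is hard-coded in place of retrieval plus PnP, and a derivative-free ternary search stands in for whatever optimizer a real GS-based system would use. All names (`render`, `photometric_loss`, `refine_pose`) are invented for illustration.

```python
import numpy as np

# Hypothetical toy scene: three 1D Gaussian "splats" (centre, amplitude, width).
CENTRES = np.array([-2.0, 0.5, 3.0])
AMPLITUDES = np.array([1.0, 0.8, 0.6])
WIDTHS = np.array([0.5, 0.5, 0.5])
PIXELS = np.linspace(-6.0, 6.0, 241)  # 1D "image plane"


def render(shift):
    """Render the toy scene as seen by a camera translated by `shift`."""
    offsets = PIXELS[:, None] - (CENTRES[None, :] - shift)
    return (AMPLITUDES * np.exp(-0.5 * (offsets / WIDTHS) ** 2)).sum(axis=1)


def photometric_loss(shift, query):
    """Mean squared difference between the rendered view and the query image."""
    return float(np.mean((render(shift) - query) ** 2))


def refine_pose(coarse_shift, query, radius=0.5, iterations=60):
    """Shrink a bracket around the coarse pose by ternary search on the loss.

    Assumes the loss has a single minimum inside the bracket, which holds
    for this toy scene but is not guaranteed in general.
    """
    lo, hi = coarse_shift - radius, coarse_shift + radius
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if photometric_loss(a, query) < photometric_loss(b, query):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return 0.5 * (lo + hi)


true_shift = 0.70                 # unknown camera pose to recover
query_image = render(true_shift)  # stands in for the query photo
coarse_shift = 0.45               # stands in for the retrieval + PnP estimate
refined_shift = refine_pose(coarse_shift, query_image)

print(f"coarse error:  {abs(coarse_shift - true_shift):.4f}")
print(f"refined error: {abs(refined_shift - true_shift):.6f}")
```

The point of the toy is the division of labour: the coarse stage only needs to land inside the basin where the render-and-compare loss is well behaved, and the refinement stage then recovers the remaining error from image evidence alone.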

Paper Structure

This paper contains 17 sections, 18 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: An illustration of the LoGS pipeline where the localization process aligns with the mapping.
  • Figure 2: Median error pose illustration (full training on SfM ground truth). The bottom-left region of each plot is the original image. The upper-right region shows the image rendered from Gaussian Splatting at the estimated pose. The first seven plots are from the 7-Scenes dataset and the last two are from the Cambridge Landmarks dataset.
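
For readers unfamiliar with the "median error" mentioned in the Figure 2 caption, the following is a minimal sketch of how median pose error is commonly computed on benchmarks such as 7-Scenes and Cambridge Landmarks. It assumes the usual convention (Euclidean distance between camera positions, and the angle of the relative rotation); this section does not state the paper's exact definition. The helper names and sample poses are hypothetical.

```python
import numpy as np


def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error, rotation error in degrees) for one query."""
    translation_error = float(np.linalg.norm(t_est - t_gt))
    # Angle of the relative rotation R_gt^T R_est, recovered from its trace.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rotation_error = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return translation_error, rotation_error


def rot_z(angle_deg):
    """Rotation matrix for a rotation of `angle_deg` degrees about the z axis."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


# Hypothetical estimates for three queries; ground truth is the identity pose.
R_gt, t_gt = np.eye(3), np.zeros(3)
estimates = [
    (rot_z(1.0), np.array([0.01, 0.0, 0.0])),
    (rot_z(2.0), np.array([0.0, 0.02, 0.0])),
    (rot_z(10.0), np.array([0.0, 0.0, 0.50])),  # one outlier
]

errors = np.array([pose_errors(R, t, R_gt, t_gt) for R, t in estimates])
median_translation, median_rotation = np.median(errors, axis=0)
print(f"median translation error: {median_translation:.3f}")
print(f"median rotation error:    {median_rotation:.3f} deg")
```

Reporting the median rather than the mean keeps a single badly localized query, like the third one here, from dominating the summary.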