Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization

Christian Schmidt, Jens Piekenbrinck, Bastian Leibe

TL;DR

This work extends the 3D Gaussian Splatting framework to optimize the extrinsic camera parameters with respect to photometric residuals. It derives the analytical gradients and integrates their computation into the existing high-performance CUDA implementation, enabling fast reconstruction of 3D scenes without accurate pose information.

Abstract

3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by a factor of two to four compared to the most efficient competing method. Source code will be available at https://github.com/Schmiddo/noposegs.
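
To make the idea of optimizing a camera parameter against photometric residuals concrete, the following is a minimal toy sketch and not the authors' CUDA implementation. It assumes a 1D "scene" of three Gaussians, a camera with a single translation parameter `t`, and plain gradient descent; all names (`render`, `loss_and_grad`, `refine_pose`) and all constants are hypothetical and chosen only for illustration. The gradient of the loss with respect to `t` is written out analytically via the chain rule rather than obtained by automatic differentiation.

```python
import math

# Toy 1D "scene" (illustrative values): Gaussian means, opacity-like weights, shared width.
MEANS = [-2.0, 0.0, 2.5]
WEIGHTS = [1.0, 0.7, 0.5]
SIGMA = 0.5
PIXELS = [-5.0 + 0.1 * i for i in range(101)]  # sample positions on a 1D "image plane"


def render(t):
    """Render the 1D image for camera translation t (camera-frame mean = mean + t)."""
    return [
        sum(w * math.exp(-((x - (m + t)) ** 2) / (2 * SIGMA ** 2))
            for m, w in zip(MEANS, WEIGHTS))
        for x in PIXELS
    ]


def loss_and_grad(t, observed):
    """Return 0.5 * sum of squared photometric residuals and its analytical dL/dt."""
    loss, grad = 0.0, 0.0
    for x, obs in zip(PIXELS, observed):
        value, dvalue_dt = 0.0, 0.0
        for m, w in zip(MEANS, WEIGHTS):
            u = x - (m + t)
            g = w * math.exp(-(u ** 2) / (2 * SIGMA ** 2))
            value += g
            # d/dt of w * exp(-u^2 / (2 sigma^2)) with du/dt = -1
            dvalue_dt += g * u / SIGMA ** 2
        residual = value - obs
        loss += 0.5 * residual ** 2
        grad += residual * dvalue_dt  # chain rule: dL/dt = sum_x r(x) * dI(x)/dt
    return loss, grad


def refine_pose(observed, t_init=0.0, lr=0.02, steps=200):
    """Gradient descent on the camera translation only; the scene stays fixed."""
    t = t_init
    for _ in range(steps):
        _, grad = loss_and_grad(t, observed)
        t -= lr * grad
    return t


T_TRUE = 0.3                      # pose used to generate the "observed" image
observed = render(T_TRUE)
t_est = refine_pose(observed)     # start from t = 0.0, i.e. an inaccurate pose
print(f"estimated translation: {t_est:.4f}")
```

In this toy setting the scene is held fixed and only the pose is refined; the abstract's joint reconstruction and camera refinement would additionally update the Gaussian parameters from the same residuals.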

Paper Structure

This paper contains 14 sections, 19 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Our proposed approach enables fast, Gaussian-Splatting-based 3D reconstruction and photo-realistic novel-view synthesis while simultaneously estimating and refining the camera poses, as visualized on this reconstruction of the horns scene from LLFF [mildenhall2019llff]. This reconstruction was performed without camera pose information.
  • Figure 2: Qualitative results for pose-free reconstruction and NVS on the LLFF dataset [mildenhall2019llff]. Our method achieves superior reconstruction quality compared to previous techniques. Notably, BARF, GARF and MRHE struggle to capture fine details, as exemplified by the banister in the trex scene. Additionally, their rendered images often exhibit blurriness. While JRTF appears sharper at first glance, it suffers from significant pixelation artifacts, particularly evident in the leaves of the flower scene. In contrast, our method produces crisp and artifact-free renderings.
  • Figure 3: Convergence behavior of our method compared to piNeRF [lin2023icra:pnerf] on a subset of the LLFF dataset, namely fern, flower, fortress, horns and room. The dashed black line marks the thresholds of 5° for rotation and 0.05 for translation. The bold colored line is the mean error over all scenes, and the shaded area around it shows ± one standard deviation. Our method uses only a single pose hypothesis, but converges much faster than even the multi-hypothesis baseline.
  • Figure 4: Qualitative results on office0 from Replica [replica19arxiv]. Left: trajectory at the beginning of optimization. Right: trajectory after optimization. Processing takes around 30 minutes on a single RTX 3090.
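
The thresholds named in the Figure 3 caption (5° for rotation, 0.05 for translation) can be checked with a short sketch like the one below. It is an assumption that rotation error is measured as the geodesic angle between estimated and ground-truth rotation matrices and translation error as a Euclidean distance; the caption does not state the exact definitions or the translation units, and the helper names (`rotation_error_deg`, `translation_error`, `pose_converged`, `rot_z`) are hypothetical.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_est^T R_gt) equals the sum of elementwise products of the two matrices.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between two translation vectors."""
    return math.dist(t_est, t_gt)


def pose_converged(R_est, t_est, R_gt, t_gt, rot_thresh_deg=5.0, trans_thresh=0.05):
    """True if both errors fall below the thresholds quoted in the caption."""
    return (rotation_error_deg(R_est, R_gt) < rot_thresh_deg
            and translation_error(t_est, t_gt) < trans_thresh)


def rot_z(deg):
    """Rotation about the z-axis, used here only to build example poses."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


R_gt, t_gt = rot_z(0.0), [0.0, 0.0, 0.0]
close = pose_converged(rot_z(3.0), [0.01, 0.02, 0.0], R_gt, t_gt)   # within both thresholds
far = pose_converged(rot_z(10.0), [0.01, 0.02, 0.0], R_gt, t_gt)    # rotation error too large
print(close, far)
```

The clamp before `acos` guards against values slightly outside [-1, 1] caused by floating-point round-off when the two rotations are nearly identical.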