COLMAP-Free 3D Gaussian Splatting

Yang Fu; Sifei Liu; Amey Kulkarni; Jan Kautz; Alexei A. Efros; Xiaolong Wang

TL;DR

COLMAP-Free 3D Gaussian Splatting (CF-3DGS) removes the need for pre-computed camera poses by jointly estimating poses and reconstructing scenes from unposed video using an explicit 3D Gaussian Splatting representation. It combines a local 3DGS, used to estimate the relative pose between nearby frames, with a global, progressively growing 3DGS that aggregates frames over time. Both are rendered via a differentiable splatting pipeline in which each Gaussian contributes $G(x)=e^{-\frac{1}{2}(x-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(x-\boldsymbol{\mu})}$. The method achieves robust pose estimation and superior novel-view synthesis on challenging sequences (Tanks & Temples, CO3D-V2), with substantially shorter training times than pose-unknown baselines and performance competitive with COLMAP-guided 3DGS. By exploiting the explicit geometry of Gaussians and temporal continuity, CF-3DGS enables fast, COLMAP-free scene reconstruction from unposed videos, including those with large camera motions.
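To make the Gaussian term in the TL;DR concrete, the following is a minimal sketch that evaluates $G(x)$ at a single 3D point. It assumes the covariance is factored as $\boldsymbol{\Sigma} = R S S^{\top} R^{\top}$ (a rotation $R$ and per-axis scales $S$), which is not stated on this page, and the function name `gaussian_weight` is hypothetical. Under that assumption the quadratic form reduces to a scaled distance in the Gaussian's local frame, so no matrix inversion is needed.

```python
import math

def gaussian_weight(x, mu, rotation, scales):
    """Evaluate G(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu)).

    Assumption: Sigma = R S S^T R^T, with R a 3x3 rotation given as nested
    lists (row-major) and S = diag(scales), so Sigma^{-1} = R S^{-2} R^T.
    """
    # Offset of the query point from the Gaussian mean.
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Rotate the offset into the Gaussian's local frame: local = R^T d.
    local = [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]
    # Squared Mahalanobis distance in the axis-aligned local frame.
    q = sum((v / s) ** 2 for v, s in zip(local, scales))
    return math.exp(-0.5 * q)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# 90-degree rotation about z: the Gaussian's local x axis points along world y.
rot_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

at_mean = gaussian_weight([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], identity, [1.0, 2.0, 3.0])
one_sigma_y = gaussian_weight([0.0, 2.0, 0.0], [0.0, 0.0, 0.0], identity, [1.0, 2.0, 3.0])
rotated = gaussian_weight([0.0, 1.0, 0.0], [0.0, 0.0, 0.0], rot_z90, [1.0, 2.0, 3.0])
```

The weight is 1 at the mean and falls off with the squared Mahalanobis distance; a point one standard deviation away along any principal axis gets the same weight regardless of how the Gaussian is oriented.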

Abstract

While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. Our project page is https://oasisyang.github.io/colmap-free-3dgs
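The sequential processing described in the abstract implies that per-frame camera poses can be built up from frame-to-frame estimates. The sketch below shows only that bookkeeping step: composing relative 4x4 rigid transforms into a trajectory. The relative transforms here are placeholder translations, not outputs of the method, and the composition order (new pose = relative transform applied after the previous pose) is an assumed convention that this page does not specify. The helper names `matmul4`, `translation`, and `chain_poses` are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """Build a 4x4 homogeneous transform with identity rotation."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def chain_poses(relative_poses):
    """Compose frame-to-frame transforms into one pose per frame.

    The first frame is fixed at the identity; each later pose is the
    relative transform multiplied onto the previous pose (assumed order).
    """
    poses = [translation(0.0, 0.0, 0.0)]
    for rel in relative_poses:
        poses.append(matmul4(rel, poses[-1]))
    return poses

# Placeholder relative motions between four consecutive frames.
steps = [translation(0.1, 0.0, 0.0),
         translation(0.0, 0.0, 0.2),
         translation(0.1, 0.0, 0.0)]
poses = chain_poses(steps)
```

Because each pose depends on all earlier estimates, errors in the relative transforms accumulate along the sequence, which is one reason a globally optimized scene model is useful alongside the local estimates.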

Paper Structure (21 sections, 8 equations, 9 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Method
Preliminary: 3D Gaussian Splatting
Local 3DGS for Relative Pose Estimation
Global 3DGS with Progressively Growing
Experiments
Experimental Setup
Comparing with Pose-Unknown Methods
Results on Scenes with Large Camera Motions
Ablation Study
Conclusion
Limitations.
Acknowledgements.
Implementation Details
...and 6 more sections

Figures (9)

Figure 1: Novel View Synthesis and Camera Pose Estimation Comparison. We propose COLMAP-Free 3D Gaussian Splatting (CF-3DGS) for novel view synthesis without known camera parameters. Our method estimates poses more robustly and synthesizes higher-quality novel views than previous state-of-the-art methods.
Figure 2: Overview of the proposed CF-3DGS. Our method takes a sequence of images as input, learns a set of 3D Gaussians that represents the input scene, and jointly estimates the camera poses of the frames. We first introduce a local 3DGS to estimate the relative pose of two nearby frames by approximating the Gaussian transformation. Then, a global 3DGS models the scene by progressively growing the set of 3D Gaussians as the camera moves.
Figure 3: Qualitative comparison for novel view synthesis on Tanks and Temples. Our approach produces more realistic rendering results than other baselines. Better viewed when zoomed in.
Figure 4: Qualitative comparison for novel view synthesis and camera pose estimation on CO3D-V2. Our approach estimates camera poses much more robustly than Nope-NeRF and thus generates higher-quality renderings. Better viewed when zoomed in.
Figure 5: Qualitative comparison for Camera Pose Estimation on CO3D-V2. The ground-truth trajectory and the estimated one are shown in blue and red, respectively.
...and 4 more figures
