
JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting

Yuxuan Li, Tao Wang, Xianben Yang

TL;DR

The paper introduces JOGS, a COLMAP-free framework that jointly optimizes camera poses and 3D Gaussian splats for novel view synthesis. It uses a dual-phase alternating scheme. Differentiable 3D Gaussian rendering drives the Gaussian updates, and an LK3D-based 3D optical flow refines the camera poses, which enables end-to-end gradient flow. The key contributions are a unified joint optimization approach that needs no external pose priors, the LK3D pose refinement algorithm, and improved rendering quality and pose accuracy on challenging datasets. The method remains robust under large viewpoint changes and sparse feature distributions, which makes it practical for efficient, accurate 3D reconstruction and rendering of real-world scenes.
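The alternating scheme can be illustrated with a deliberately tiny stand-in problem. The sketch below is not JOGS. It replaces the 3D Gaussians with plain 3D points and the camera poses with translation-only offsets. It also replaces differentiable rendering with a direct least-squares fit to simulated observations. All function names (`update_scene`, `update_poses`, `joint_loss`, `alternate`) are hypothetical. What the sketch does show is the control flow. Each round first refits the scene with the poses frozen, then refits the poses with the scene frozen, so the joint loss never increases. Camera 0 is pinned to the origin. This removes the global-translation ambiguity that any pose-free reconstruction has to resolve in some way.

```python
import numpy as np

# Toy stand-in for the two alternating phases (hypothetical, not the paper's code):
# "scene" = N plain 3D points, "poses" = one 3D translation per camera,
# and camera j observes point i at obs[j, i] = points[i] + poses[j].

def update_scene(obs, poses):
    """Phase 1, poses frozen: least-squares refit of every scene point."""
    return (obs - poses[:, None, :]).mean(axis=0)

def update_poses(obs, points):
    """Phase 2, scene frozen: least-squares refit of every camera offset."""
    poses = (obs - points[None, :, :]).mean(axis=1)
    poses[0] = 0.0  # gauge fix: camera 0 defines the world origin
    return poses

def joint_loss(obs, points, poses):
    """Mean squared residual over all (camera, point) observations."""
    residual = obs - points[None, :, :] - poses[:, None, :]
    return float(np.mean(residual ** 2))

def alternate(obs, num_rounds=100):
    poses = np.zeros((obs.shape[0], 3))  # pose-free start: every camera at the origin
    history = []
    for _ in range(num_rounds):
        points = update_scene(obs, poses)  # phase 1
        poses = update_poses(obs, points)  # phase 2
        history.append(joint_loss(obs, points, poses))
    return points, poses, history

# Simulated, noise-free data: 4 cameras observing 50 points.
rng = np.random.default_rng(0)
true_points = rng.normal(size=(50, 3))
true_poses = rng.normal(size=(4, 3))
true_poses[0] = 0.0
obs = true_points[None, :, :] + true_poses[:, None, :]

points, poses, history = alternate(obs)
print(f"loss after round 1: {history[0]:.3e}, after round {len(history)}: {history[-1]:.3e}")
print("max pose error:", np.abs(poses - true_poses).max())
```

Each phase here exactly minimizes the loss over its own block of variables, so this is plain block-coordinate descent. In a method like JOGS, each phase would instead run iterative optimizer or flow-based updates rather than a closed-form solve.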

Abstract

Traditional novel view synthesis methods rely heavily on external camera pose estimation tools such as COLMAP. These tools can become a computational bottleneck and can propagate pose errors into the reconstruction. To address these challenges, we propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines the 3D Gaussian parameters and updates the camera poses through a novel co-optimization strategy, so that scene reconstruction fidelity and pose accuracy improve simultaneously. The key innovation is to decouple the joint optimization into two interleaved phases. The first phase updates the 3D Gaussian parameters via differentiable rendering with the poses held fixed. The second phase refines the camera poses using a customized 3D optical flow algorithm that incorporates geometric and photometric constraints. This formulation progressively reduces projection errors, particularly in challenging scenarios with large viewpoint variations and sparse feature distributions, where traditional methods struggle. Extensive evaluations on multiple datasets demonstrate that our approach significantly outperforms existing COLMAP-free techniques in reconstruction quality. It also generally surpasses the standard COLMAP-based baseline.
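The abstract does not give the equations of the customized 3D optical flow algorithm. The name LK3D suggests a Lucas-Kanade-style formulation. As a rough, assumption-laden illustration of that style of pose refinement, the sketch below runs Gauss-Newton on a purely geometric residual. It takes 3D points `src` and corresponding 3D points `dst` and assumes the correspondences are already known, which in practice is the hard part. On each iteration it linearizes the residual with respect to a small rotation and translation and solves the normal equations for the update. The photometric constraint is omitted. The helper names (`skew`, `so3_exp`, `refine_pose`) are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical sketch: Lucas-Kanade-style (Gauss-Newton) rigid pose refinement
# with a geometric residual only. This is NOT the paper's LK3D algorithm.

def skew(v):
    """Cross-product matrix: skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(omega):
    """Rodrigues' formula: rotation matrix for the axis-angle vector omega."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3) + skew(omega)  # first-order approximation near zero
    k = skew(omega / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def refine_pose(src, dst, R, t, iters=20):
    """Refine x -> R @ x + t so that R @ src[i] + t matches dst[i] for all i."""
    for _ in range(iters):
        moved = src @ R.T              # R @ p_i for every point, shape (N, 3)
        residual = moved + t - dst     # shape (N, 3)
        # Jacobian of each 3D residual w.r.t. (omega, delta_t) is [-[R p_i]_x, I].
        J = np.zeros((len(src), 3, 6))
        for i, p in enumerate(moved):
            J[i, :, :3] = -skew(p)
            J[i, :, 3:] = np.eye(3)
        J = J.reshape(-1, 6)
        r = residual.reshape(-1)
        delta = np.linalg.solve(J.T @ J, -J.T @ r)  # Gauss-Newton normal equations
        R = so3_exp(delta[:3]) @ R     # left-multiplicative rotation update
        t = t + delta[3:]
    return R, t

# Noise-free example: recover a known rigid transform starting from the identity.
rng = np.random.default_rng(1)
src = rng.normal(size=(100, 3))
R_true = so3_exp(np.array([0.1, -0.2, 0.15]))
t_true = np.array([0.5, -0.3, 0.2])
dst = src @ R_true.T + t_true

R_est, t_est = refine_pose(src, dst, np.eye(3), np.zeros(3))
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

In this noise-free example the estimate converges to the ground-truth transform. Like any Gauss-Newton scheme, it still relies on a reasonable starting pose.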

Paper Structure

This paper contains 15 sections, 13 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Method Overview. Our JOGS framework jointly optimizes pose estimation and 3D Gaussian Splatting. It starts from a simple SfM initialization. It then iteratively updates the 3D Gaussian Splatting parameters $\mathcal{G}$ and refines the camera poses $\mathcal{P}$, so that scene reconstruction fidelity and pose accuracy improve together. Gaussian points are updated with a standard 3DGS pipeline, while camera poses are refined by the proposed LK3D algorithm.
  • Figure 2: Qualitative results on representative samples from the LLFF-NeRF, Tanks and Temples, and Shiny datasets. Our method achieves consistently high rendering quality across all scenes.
  • Figure 3: Comparison of all methods on scene details. Our method renders finer detail and texture in novel views, which we attribute to optimizing camera poses during training.
  • Figure 4: Trajectory comparison of different methods across several scenes.
  • Figure 5: Trajectory comparison on Ballroom, Barn, Church, and Family from the Tanks and Temples dataset.
  • ...and 2 more figures