Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting

Jiahui Lu, Haihong Xiao, Xueyan Zhao, Wenxiong Kang

TL;DR

Gesplat addresses robust sparse-view pose-free 3D reconstruction and novel view synthesis by combining VGGT-based initialization with a hybrid Gaussian representation bound to matching rays. The method introduces graph-guided optimization and flow-based depth regularization to enforce multi-view consistency and improve depth supervision during training, enabling accurate geometry and rendering from limited unposed inputs. Across LLFF and Tanks & Temples, Gesplat achieves state-of-the-art performance against pose-free baselines, with clear gains in PSNR, SSIM, and LPIPS and improved geometric detail. This approach expands practical applicability of fast 3D reconstruction and NVS in real-world sparse-view scenarios by delivering stable optimization, finer details, and robust pose estimation.
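For readers unfamiliar with the metrics named above, PSNR is the simplest of the three to state: it is a log-scaled inverse of the mean squared error between a rendered view and its ground-truth image, so higher is better. The snippet below is a minimal sketch of the standard definition, not code from the paper. The function name `psnr`, the flat list-of-floats image layout, and the `max_value=1.0` intensity range are assumptions made for illustration. SSIM and LPIPS require windowed statistics and a learned network respectively, and are not sketched here.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `rendered` and `reference` are flat sequences of intensities in
    [0, max_value]; this layout is an illustrative assumption.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.10, 0.50, 0.90, 0.30]
    novel_view = [0.12, 0.48, 0.88, 0.33]
    print(f"PSNR: {psnr(novel_view, ground_truth):.2f} dB")
```

Because PSNR is logarithmic, halving the root-mean-square error raises it by about 6 dB, which is worth keeping in mind when comparing reported gains between methods.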

Abstract

Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have advanced 3D reconstruction and novel view synthesis, but remain heavily dependent on accurate camera poses and dense viewpoint coverage. These requirements limit their applicability in sparse-view settings, where pose estimation becomes unreliable and supervision is insufficient. To overcome these challenges, we introduce Gesplat, a 3DGS-based framework that enables robust novel view synthesis and geometrically consistent reconstruction from unposed sparse images. Unlike prior works that rely on COLMAP for sparse point cloud initialization, we leverage the VGGT foundation model to obtain more reliable initial poses and dense point clouds. Our approach integrates several key innovations: 1) a hybrid Gaussian representation with dual position-shape optimization enhanced by inter-view matching consistency; 2) a graph-guided attribute refinement module to enhance scene details; and 3) flow-based depth regularization that improves depth estimation accuracy for more effective supervision. Comprehensive quantitative and qualitative experiments demonstrate that our approach achieves more robust performance on both forward-facing and large-scale complex datasets compared to other pose-free methods.

Paper Structure

This paper contains 38 sections, 26 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Novel View Synthesis Comparisons. We introduce Gesplat, an efficient framework for novel view synthesis and 3D reconstruction from sparse-view unposed inputs. Compared with other pose-free methods, our method achieves higher PSNR and reconstructs more consistent geometry with finer details.
  • Figure 2: Overall framework of Gesplat. Given a few input images, we first generate dense point clouds and camera poses from VGGT, extract matching priors, and predict optical flow. Subsequently, we randomly initialize a hybrid Gaussian representation, using ray-based Gaussians to optimize the Gaussian positions. Graph-guided optimization and joint optimization are then applied to refine the Gaussians and camera poses. Finally, we employ flow-estimated depth and matching depth as geometry regularization for rendering.
  • Figure 3: Qualitative comparison between Gesplat and various pose-free methods on the LLFF dataset (6 views). The reconstruction of our method is more accurate and exhibits finer details than those of competing methods.
  • Figure 4: Qualitative comparison on TNT (3, 6, and 9 views). The reconstruction of our method is more accurate and exhibits finer details.
  • Figure 5: Ablation study on the key modules of our model.