SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images

Yanyan Li, Yixin Fang, Federico Tombari, Gim Hee Lee

TL;DR

A novel generalizable Gaussian Splatting method that reconstructs pixel-aligned Gaussian surfels for diverse scenes from unconstrained sparse multi-view images alone, achieving state-of-the-art performance in various 3D vision tasks.

Abstract

Generalizable Gaussian Splatting approaches can learn to predict explicit radiance fields from sparse multi-view images, and they have wider real-world application prospects when ground-truth camera parameters are not required as inputs. In this paper, a novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios, requiring only unconstrained sparse multi-view images. First, Gaussian surfels are predicted by a multi-head Gaussian regression decoder; they are represented with fewer degrees of freedom but have better multi-view consistency. Furthermore, the normal vectors of the Gaussian surfels are enhanced using high-quality normal priors. Second, the Gaussians and camera parameters (both extrinsic and intrinsic) are optimized by the proposed Bundle-Adjusting Gaussian Splatting module to obtain high-quality Gaussian radiance fields for novel view synthesis. Extensive experiments on novel view rendering and depth map prediction are conducted on public datasets, demonstrating that the proposed method achieves state-of-the-art performance in various 3D vision tasks. More information can be found on our project page (https://yanyan-li.github.io/project/gs/smilesplat).
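To make the role of the camera parameters concrete, the sketch below shows how an intrinsic matrix $\mathbf{K}$ and an extrinsic matrix $\mathbf{T}$ map a 3D point (for example, a Gaussian center) to a pixel under the standard pinhole model. It is an illustration only, not the authors' implementation: the function name `project_point`, the world-to-camera convention for $\mathbf{T}$, and the example values are all assumptions made here.

```python
def project_point(K, T, p_world):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    K: 3x3 intrinsic matrix (nested lists).
    T: 4x4 extrinsic matrix, assumed here to map world to camera coordinates.
    p_world: (x, y, z) point in world coordinates.
    Returns (u, v, depth), where depth is the z coordinate in the camera frame.
    """
    x, y, z = p_world
    # Apply the extrinsics: rotate and translate the point into the camera frame.
    xc, yc, zc = (
        T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Apply the intrinsics, then divide by depth (perspective division).
    u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
    v = (K[1][1] * yc + K[1][2] * zc) / zc
    return u, v, zc


# Hypothetical example values: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
# Identity rotation, camera frame shifted by 2 units along the optical axis.
T = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 2.0],
     [0.0, 0.0, 0.0, 1.0]]

print(project_point(K, T, (0.4, -0.2, 2.0)))
```

In a pose-free setting such as the one the abstract describes, neither `K` nor `T` is given as input, so both would be among the quantities being optimized rather than fixed constants as in this sketch.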

Paper Structure

This paper contains 30 sections, 28 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: An example result of SmileSplat. It aims to render novel views and estimate camera parameters (intrinsic $\mathbf{K}$ and extrinsic $\mathbf{T}$) by using sparse views.
  • Figure 2: Architecture of SmileSplat. With sparse but overlapping views as input, the system consists of two main modules, Multi-Head Gaussian Regression and Bundle-Adjusting Gaussian Splatting, for achieving scaled Gaussian radiance fields.
  • Figure 3: Comparison of novel view rendering and depth prediction results on Re10k [zhou2018stereo] and ACID [liu2021infinite] datasets with different settings. The rendering results include the RGB in the left column and the depth in the right column. $\pm$ K and $\pm$ Pose denote whether intrinsics and extrinsics are free or not.
  • Figure 4: Comparisons of novel view rendering on Replica [straub2019replica] dataset. The rendering results include the RGB in the left column and the difference with the ground truth in the right column. $\pm$ K and $\pm$ Pose denote whether intrinsics and extrinsics are free or not.
  • Figure 5: 3D Perspective frustum and its different views in 2D space.
  • ...and 6 more figures