MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views

Wangze Xu, Huachen Gao, Shihe Shen, Rui Peng, Jianbo Jiao, Ronggang Wang

TL;DR

This paper proposes MVPGS, a few-shot NVS method built on 3D Gaussian Splatting that excavates multi-view priors. It introduces a view-consistent geometry constraint on the Gaussian parameters to help optimization converge properly, and uses monocular depth regularization as compensation.

Abstract

Recent advances in Neural Radiance Fields (NeRF) have facilitated few-shot Novel View Synthesis (NVS), a significant challenge in 3D vision applications. Despite numerous attempts to reduce NeRF's requirement for dense input views, it still suffers from time-consuming training and rendering. More recently, 3D Gaussian Splatting (3DGS) has achieved real-time, high-quality rendering with an explicit point-based representation. However, like NeRF, it tends to overfit the training views for lack of constraints. In this paper, we propose MVPGS, a few-shot NVS method that excavates multi-view priors based on 3D Gaussian Splatting. We leverage recent learning-based Multi-view Stereo (MVS) to improve the quality of the geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method that provides additional appearance constraints conforming to the scene, based on the computed geometry. Furthermore, we introduce a view-consistent geometry constraint for the Gaussian parameters to facilitate proper optimization convergence, and use monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed. Project page: https://zezeaaa.github.io/projects/MVPGS/
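
The geometric initialization mentioned in the abstract starts from MVS depth maps that are lifted into a 3D point cloud. The following is a minimal sketch of that back-projection step, not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, a 4x4 camera-to-world matrix, and depth measured along the camera z-axis; the function name `backproject_depth` and its parameters are hypothetical.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a depth map to world-space 3D points (illustrative sketch).

    depth:        (H, W) depth along the camera z-axis; values <= 0 are skipped.
    K:            (3, 3) pinhole intrinsics (assumed camera model).
    cam_to_world: (4, 4) camera-to-world transform (assumed pose convention).
    Returns an (N, 3) array with one point per valid pixel.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    valid = depth > 0
    # Homogeneous pixel coordinates of the valid pixels, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    # Camera-space rays with z = 1, scaled by depth to get camera-space points.
    pts_cam = (np.linalg.inv(K) @ pix) * depth[valid]
    # Rigid transform into world space.
    pts_world = cam_to_world[:3, :3] @ pts_cam + cam_to_world[:3, 3:4]
    return pts_world.T
```

Points produced this way for every input view could be merged into one cloud and used as initial Gaussian centers; any filtering of inconsistent depths before merging is left out of this sketch.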

Paper Structure (29 sections, 14 equations, 15 figures, 6 tables, 1 algorithm)

Figures (15)

  • Figure 1: Qualitative Results on LLFF and DTU Datasets under High-Resolution Setting with 3-view Inputs. Compared with NeRF, the proposed method maintains high-fidelity quality in high-frequency regions and meanwhile achieves competitive training and rendering speed for few-shot NVS.
  • Figure 2: Framework Overview. MVPGS leverages learning-based MVS to estimate dense view-consistent depth $D^{mvs}$ and construct a point cloud $\mathcal{P}$ for the initialization of Gaussians $\mathcal{G}$. We excavate the computed geometry from MVS through forward warping to generate appearance priors for the supervision of unseen views. To regularize the geometry update during optimization, we introduce $L_{CS}$ from MVS depth and $L_{mono}$ from monocular depth priors to guide Gaussians to converge to proper positions.
  • Figure 3: Forward Warping and Backward Warping. Forward warping (a) takes $(I_{src},D_{src}^{mvs},P_{src},P_{tgt})$ as input and outputs the appearance $I_{tgt}$ at pose $P_{tgt}$ based on the known geometry $D_{src}^{mvs}$. Because the warped location $p_{t}$ does not necessarily lie at the center of a grid cell, we adopt reversed bilinear sampling [forward_backward_warp_optical_flow] to determine the color of each pixel. This differs from backward warping (b), which uses bilinear sampling to sample color from the source view according to the target view's geometry.
  • Figure 4: Qualitative Results on DTU with 3 Input Views.
  • Figure 5: Qualitative Results on LLFF with 3 Input Views.
  • ...and 10 more figures
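
The forward warping described in the Figure 3 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: it assumes a shared pinhole intrinsic matrix `K` for both views and a single 4x4 relative transform `src_to_tgt` in place of the separate poses $P_{src}$ and $P_{tgt}$, and it ignores occlusion (no depth ordering among source pixels that land on the same target pixel). The name `forward_warp` is hypothetical. Reversed bilinear sampling is implemented here as splatting: each warped source pixel distributes its color to the four nearest target pixels with bilinear weights, and the accumulated colors are normalized by the accumulated weights.

```python
import numpy as np

def forward_warp(img_src, depth_src, K, src_to_tgt):
    """Forward-warp a source image into a target view (illustrative sketch).

    img_src:    (H, W, 3) float source colors.
    depth_src:  (H, W) source depth along the camera z-axis; <= 0 is invalid.
    K:          (3, 3) pinhole intrinsics, assumed shared by both views.
    src_to_tgt: (4, 4) transform from source to target camera coordinates.
    Returns (warped image of shape (H, W, 3), boolean mask of covered pixels).
    """
    h, w = depth_src.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    valid = depth_src > 0
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    # Back-project source pixels, then move them into the target camera frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth_src[valid]
    pts_tgt = src_to_tgt[:3, :3] @ pts_src + src_to_tgt[:3, 3:4]
    colors = img_src[valid]
    # Keep only points in front of the target camera.
    front = pts_tgt[2] > 1e-8
    pts_tgt, colors = pts_tgt[:, front], colors[front]
    proj = K @ pts_tgt
    x, y = proj[0] / proj[2], proj[1] / proj[2]

    acc = np.zeros((h, w, 3))
    wsum = np.zeros((h, w))
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    # Splat each warped point onto its four neighbouring target pixels.
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wgt = (1 - np.abs(x - xi)) * (1 - np.abs(y - yi))
            inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h) & (wgt > 0)
            # np.add.at accumulates correctly when indices repeat.
            np.add.at(acc, (yi[inside], xi[inside]),
                      colors[inside] * wgt[inside, None])
            np.add.at(wsum, (yi[inside], xi[inside]), wgt[inside])
    mask = wsum > 0
    out = np.zeros_like(acc)
    out[mask] = acc[mask] / wsum[mask, None]
    return out, mask
```

The returned mask marks target pixels that received at least one contribution; pixels outside it carry no warped appearance and would presumably be excluded from any supervision built on the warped image.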