EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis

Jiahe Li, Feiyu Wang, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Ting Liu

TL;DR

This work tackles Extrapolated View Synthesis (EVS) for Gaussian Splatting (GS) with EVPGS, a coarse-to-fine framework. In the coarse stage, Appearance and Geometry Regularization (AGR) reduces artifacts at augmented views; in the fine stage, Occlusion-Aware Reprojection and Refinement (OARR) generates Enhanced View Priors that guide fine-tuning. By leveraging a pre-trained diffusion prior and mesh-based depth guidance, EVPGS produces extrapolated views with fewer artifacts, realistic appearance, and fine detail across real and synthetic datasets, including the new Merchandise3D EVS dataset. The approach achieves state-of-the-art quantitative results (PSNR, SSIM, LPIPS) and clear qualitative gains while remaining compatible with multiple GS backbones. The authors position it as a practical solution for EVS with real-world applications such as merchandise visualization, and plan to release code, dataset, and models publicly.
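The mesh-based depth guidance mentioned above means, per the Figure 3 caption, that depth maps rasterized from the reconstructed mesh supervise the depth maps rendered by the GS model. This summary does not state the exact loss, so the following is a minimal sketch that assumes a masked L1 penalty. The function name `masked_depth_l1` and the convention that a non-positive mesh depth marks background pixels are illustrative assumptions, not the authors' code.

```python
from typing import List, Optional


def masked_depth_l1(rendered: List[List[float]],
                    mesh_depth: List[List[float]],
                    valid: Optional[List[List[bool]]] = None) -> float:
    """Mean absolute error between a GS-rendered depth map and a mesh depth map.

    Assumed (hypothetical) form of the geometry regularizer: pixels where the
    mesh rasterizer wrote no depth (value <= 0) or where `valid` is False are
    ignored. Returns 0.0 when no pixel is usable.
    """
    total, count = 0.0, 0
    for i, (row_r, row_m) in enumerate(zip(rendered, mesh_depth)):
        for j, (d_r, d_m) in enumerate(zip(row_r, row_m)):
            if d_m <= 0.0:  # background: no mesh surface at this pixel
                continue
            if valid is not None and not valid[i][j]:
                continue
            total += abs(d_r - d_m)
            count += 1
    return total / count if count else 0.0
```

For example, `masked_depth_l1([[1.0, 2.0], [3.0, 0.5]], [[1.5, 2.0], [0.0, 1.0]])` averages the errors 0.5, 0.0 and 0.5 over the three pixels the mesh covers. A real training loop would compute this on GPU tensors so gradients flow back to the Gaussians; plain lists are used here only to keep the sketch self-contained.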

Abstract

Gaussian Splatting (GS)-based methods rely on sufficient training view coverage and perform synthesis on interpolated views. In this work, we tackle the more challenging and underexplored Extrapolated View Synthesis (EVS) task: enabling GS-based models trained with limited view coverage to generalize well to extrapolated views. To this end, we propose a view augmentation framework that guides training through a coarse-to-fine process. At the coarse stage, we reduce rendering artifacts caused by insufficient view coverage by introducing a regularization strategy at both the appearance and geometry levels. At the fine stage, we generate reliable view priors to provide further training guidance. To do so, we incorporate occlusion awareness into the view prior generation process and refine the view priors with the aid of the coarse-stage output. We call our framework Enhanced View Prior Guidance for Splatting (EVPGS). To comprehensively evaluate EVPGS on the EVS task, we collect a real-world dataset, Merchandise3D, dedicated to the EVS scenario. Experiments on three datasets, both real and synthetic, demonstrate that EVPGS achieves state-of-the-art performance and improves synthesis quality at extrapolated views for GS-based methods both qualitatively and quantitatively. We will make our code, dataset, and models public.
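The Figure 3 caption gives one concrete way to obtain augmented views for this framework: elevating the training views. Below is a minimal sketch of that idea under stated assumptions: an object-centric, z-up scene with the object at the origin, cameras that look at the origin, and fixed elevation offsets in degrees. The helper names (`elevate_camera`, `look_at`, `augment_views`) are hypothetical and not from the paper.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def elevate_camera(center: Vec3, delta_deg: float) -> Vec3:
    """Move a camera centre along the sphere around the scene origin.

    Assumption: z-up world, object at the origin. Radius and azimuth are kept;
    `delta_deg` is added to the elevation, clamped short of the poles.
    """
    x, y, z = center
    r = math.sqrt(x * x + y * y + z * z)
    azim = math.atan2(y, x)
    elev = math.asin(max(-1.0, min(1.0, z / r)))
    limit = math.radians(89.0)  # avoid the look-at singularity at the poles
    elev = max(-limit, min(limit, elev + math.radians(delta_deg)))
    return (r * math.cos(elev) * math.cos(azim),
            r * math.cos(elev) * math.sin(azim),
            r * math.sin(elev))


def look_at(center: Vec3, target: Vec3 = (0.0, 0.0, 0.0),
            up: Vec3 = (0.0, 0.0, 1.0)) -> List[Vec3]:
    """World-to-camera rotation rows (right, down, forward), OpenCV convention."""
    def normalize(a):
        n = math.sqrt(sum(c * c for c in a))
        return tuple(c / n for c in a)

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    forward = normalize(tuple(t - c for t, c in zip(target, center)))
    right = normalize(cross(forward, up))
    down = cross(forward, right)  # unit length: forward and right are orthonormal
    return [right, down, forward]


def augment_views(train_centers: List[Vec3], deltas_deg: List[float]):
    """Hypothetical helper: one elevated camera per (training view, offset) pair."""
    views = []
    for c in train_centers:
        for d in deltas_deg:
            new_c = elevate_camera(c, d)
            views.append((new_c, look_at(new_c)))
    return views
```

For instance, `elevate_camera((2.0, 0.0, 0.0), 30.0)` moves a horizontal camera at radius 2 to roughly `(1.732, 0.0, 1.0)`. Keeping the radius and azimuth fixed means each augmented view differs from its source training view only in elevation, which matches the "horizontal training views, elevated test views" setting described in Figure 1.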

Paper Structure

This paper contains 26 sections, 10 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Illustration of the EVS problem. We visualize the camera pose at each view as a rectangle warped by the pose. Left: In conventional novel view synthesis, the training views sufficiently cover the scene. Right: In EVS, the training views have limited scene coverage (in this example, only horizontal and near-horizontal views) and overlap poorly with the testing views. Example scene from DTU jensen2014large.
  • Figure 2: Illustration of the standard reprojection process and its challenges. Left: In the standard reprojection process, each pixel in the view prior of an augmented view retrieves its color from the corresponding pixel in the nearest training view, using reprojection via the reconstructed mesh. Right: This technique risks losing view-dependent color information for the augmented views and may introduce occlusion artifacts, leading to corrupted view priors.
  • Figure 3: Framework Overview. In EVPGS, we first pre-train a GS model using the training set with limited view coverage (e.g., only horizontal views), then fine-tune the GS model at augmented views (e.g., obtained by elevating the training views) through a coarse-to-fine process. At the coarse stage, we propose the Appearance and Geometry Regularization (AGR) strategy: we use the pre-trained GS model to render the augmented views and reduce the artifacts at these views by leveraging a denoising diffusion model rombach2022highresolutionimagesynthesislatent. We additionally rasterize depth maps from the reconstructed mesh to supervise the depth maps rendered directly by the GS model. At the fine stage, we produce Enhanced View Priors at these augmented views as pseudo-labels via our Occlusion-Aware Reprojection and Refinement (OARR) strategy. OARR comprises an occlusion-aware reprojection technique that eliminates occlusion-induced corruption and a view prior refinement strategy that incorporates the view-dependent colors obtained from the coarse stage.
  • Figure 4: Qualitative results on the real-world datasets DTU jensen2014large (top two rows) and our Merchandise3D (bottom two rows). On both datasets, the baseline methods (the first and third rows) exhibit rendering artifacts and lack detail due to limited training view coverage, while all five variants of EVPGS effectively address these issues.
  • Figure 5: Qualitative results on the synthetic dataset Synthetic-NeRF mildenhall2020nerfrepresentingscenesneural. With each backbone, EVPGS produces renderings closer to the ground truth than the baselines at extrapolated views.
  • ...and 11 more figures
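Figure 2 describes standard reprojection (each augmented-view pixel fetches its color from the nearest training view through the reconstructed mesh) and its occlusion problem, and Figure 3 names occlusion-aware reprojection as part of OARR. This summary does not say how occlusions are detected, so the sketch below substitutes a common depth-consistency (z-buffer style) test as an assumption: a pixel keeps the training-view color only if its reprojected depth matches the training view's own mesh depth within a tolerance `tau`. Shared pinhole intrinsics `(fx, fy, cx, cy)`, world-to-camera poses `(R, t)`, nearest-neighbor lookup, and all function names are hypothetical choices; the refinement step with coarse-stage colors is not modeled.

```python
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Tuple[float, float, float]


def matvec(m: Mat3, v: Sequence[float]) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def transpose(m: Mat3) -> List[List[float]]:
    return [[m[k][i] for k in range(3)] for i in range(3)]


def reproject_pixel(u, v, depth, intr, pose_src, pose_dst):
    """Lift pixel (u, v) at `depth` from the source camera into the destination camera.

    intr = (fx, fy, cx, cy) is shared by both views; poses are world-to-camera
    (R, t), i.e. x_cam = R @ x_world + t. Returns (u', v', z') in the destination.
    """
    fx, fy, cx, cy = intr
    x_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    R_s, t_s = pose_src
    x_world = matvec(transpose(R_s), [x_cam[i] - t_s[i] for i in range(3)])
    R_d, t_d = pose_dst
    x_dst = [a + b for a, b in zip(matvec(R_d, x_world), t_d)]
    z = x_dst[2]
    return fx * x_dst[0] / z + cx, fy * x_dst[1] / z + cy, z


def occlusion_aware_reproject(aug_depth, train_depth, train_rgb, intr,
                              pose_aug, pose_train, tau=0.01):
    """Fill a view prior for the augmented view from one training view.

    Assumed depth-consistency test: if the reprojected depth disagrees with the
    training view's mesh depth at the landing pixel, a different surface is
    visible there (occlusion), so the pixel is left as None.
    """
    H, W = len(aug_depth), len(aug_depth[0])
    prior = [[None] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            d = aug_depth[v][u]
            if d <= 0:  # no mesh surface at this augmented-view pixel
                continue
            u2, v2, z = reproject_pixel(u, v, d, intr, pose_aug, pose_train)
            if z <= 0:  # point lies behind the training camera
                continue
            ui, vi = int(round(u2)), int(round(v2))  # nearest-neighbour lookup
            if not (0 <= vi < len(train_depth) and 0 <= ui < len(train_depth[0])):
                continue
            if abs(train_depth[vi][ui] - z) > tau:  # depth mismatch: occluded
                continue
            prior[v][u] = train_rgb[vi][ui]
    return prior
```

Pixels left as `None` mark where this naive fetch would have copied the color of an occluding surface, which is the corruption Figure 2 illustrates. A real pipeline would vectorize this on GPU and use bilinear sampling rather than per-pixel loops.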