Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion

Yan Xu; Yixing Wang; Stella X. Yu

TL;DR

This work tackles sparse-input novel view synthesis by reframing it as test-time natural video completion that leverages pretrained video diffusion priors. It introduces a zero-shot, generation-guided pipeline that synthesizes uncertainty-aware pseudo-views between the sparse inputs. These views supervise and densify a 3D Gaussian Splatting (3D-GS) representation, whose geometry and appearance are refined iteratively. A Gaussian primitive densification step and an uncertainty-guided diffusion modulation enable robust reconstruction in under-observed regions without scene-specific training. Experiments on LLFF, DTU, DL3DV, and MipNeRF-360 show strong performance under extreme sparsity, pointing to practical value for fast, high-fidelity view synthesis along unconstrained camera paths.
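As a minimal sketch of how camera poses for in-between pseudo-views might be placed between two sparse inputs, the code below uses spherical linear interpolation (slerp) for orientation and linear interpolation for the camera center. This is an assumption for illustration, not the trajectory design described by the authors; the helper names `slerp` and `interpolate_poses` and the (w, x, y, z) quaternion convention are hypothetical.

```python
import numpy as np


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0 = np.asarray(q0, dtype=float) / np.linalg.norm(q0)
    q1 = np.asarray(q1, dtype=float) / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1, dot = -q1, -dot
    if dot > 0.9995:
        # Nearly identical orientations: normalized linear interpolation is stable.
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)


def interpolate_poses(q0, c0, q1, c1, num_pseudo):
    """Return `num_pseudo` hypothetical pseudo-view poses (quaternion, camera
    center) strictly between two input views; the endpoints are excluded."""
    poses = []
    for k in range(1, num_pseudo + 1):
        t = k / (num_pseudo + 1)
        q = slerp(q0, q1, t)
        c = (1.0 - t) * np.asarray(c0, dtype=float) + t * np.asarray(c1, dtype=float)
        poses.append((q, c))
    return poses
```

For example, interpolating between the identity rotation at the origin and a 90° yaw at (2, 0, 0) with one pseudo-view yields a 45° yaw at (1, 0, 0). Slerp advances the orientation at a constant angular rate, which matches the smooth camera motion a video prior is likely to expect.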

Abstract

Given just a few glimpses of a scene, can you imagine the movie playing out as the camera glides through it? That is the lens through which we view sparse-input novel view synthesis: not only as filling spatial gaps between widely spaced views, but also as completing a natural video unfolding through space. We recast the task as test-time natural video completion, using powerful priors from pretrained video diffusion models to hallucinate plausible in-between views. Our zero-shot, generation-guided framework produces pseudo views at novel camera poses, modulated by an uncertainty-aware mechanism for spatial coherence. These synthesized frames densify the supervision for 3D Gaussian Splatting (3D-GS) scene reconstruction, especially in under-observed regions. An iterative feedback loop lets 3D geometry and 2D view synthesis inform each other, improving both the scene reconstruction and the generated views. The result is coherent, high-fidelity rendering from sparse inputs without any scene-specific training or fine-tuning. On LLFF, DTU, DL3DV, and MipNeRF-360, our method significantly outperforms strong 3D-GS baselines under extreme sparsity.
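To make uncertainty-aware supervision concrete, the sketch below down-weights a generated pseudo view wherever an uncertainty map is high, so that hallucinated content in poorly constrained regions pulls less on the 3D-GS optimization. It is a generic confidence-weighted L1 loss, not the paper's modulation mechanism. The uncertainty map in [0, 1], the balancing weight `lam`, and the helper names are hypothetical.

```python
import numpy as np


def uncertainty_weighted_l1(rendered, pseudo, uncertainty, eps=1e-6):
    """Confidence-weighted mean absolute error between a rendered view and a
    generated pseudo view. `uncertainty` is a hypothetical (H, W) or (H, W, C)
    map in [0, 1]; pixels with uncertainty 1 contribute nothing."""
    rendered = np.asarray(rendered, dtype=float)
    pseudo = np.asarray(pseudo, dtype=float)
    confidence = 1.0 - np.clip(np.asarray(uncertainty, dtype=float), 0.0, 1.0)
    if confidence.ndim == rendered.ndim - 1:
        # Broadcast an (H, W) map over the colour channels of an (H, W, C) image.
        confidence = confidence[..., None]
    confidence = np.broadcast_to(confidence, rendered.shape)
    weighted_error = confidence * np.abs(rendered - pseudo)
    # Normalize by total confidence so the term is a weighted mean, not a sum.
    return float(weighted_error.sum() / (confidence.sum() + eps))


def total_loss(real_pairs, pseudo_triples, lam=0.5):
    """Plain L1 on (rendered, real) input-view pairs plus a `lam`-scaled term
    over (rendered, pseudo, uncertainty) triples. `lam` is a hypothetical
    balancing weight, not a value from the paper."""
    real_term = np.mean([np.mean(np.abs(np.asarray(r, float) - np.asarray(g, float)))
                         for r, g in real_pairs])
    pseudo_term = (np.mean([uncertainty_weighted_l1(r, p, u) for r, p, u in pseudo_triples])
                   if pseudo_triples else 0.0)
    return float(real_term + lam * pseudo_term)
```

Normalizing by the summed confidence keeps the pseudo-view term on a comparable scale across views with very different uncertainty coverage; an unnormalized sum would be an equally plausible design choice.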


Figures (8)