NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
Jiatao Gu, Alex Trevithick, Kai-En Lin, Josh Susskind, Christian Theobalt, Lingjie Liu, Ravi Ramamoorthi
TL;DR
NerfDiff tackles single-image novel view synthesis by combining a camera-space NeRF with a 3D-aware conditional diffusion model (CDM). The two components are trained jointly. At test time, NeRF-guided distillation (NGD) generates and refines a set of virtual views, alternating NeRF updates with diffusion steps so that the NeRF agrees with the multi-view denoised targets and the sampled views stay 3D consistent. The NeRF is conditioned on a local triplane representation for efficient rendering, and the CDM refines its renderings to recover detail in occluded regions. On ShapeNet, ABO, and Clevr3D, the method reports state-of-the-art quantitative and qualitative results, with sharper renderings of occluded regions and better multi-view consistency.
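The alternation between scene updates and diffusion steps can be illustrated with a deliberately small toy. The sketch below is not the paper's implementation. It assumes a linear "renderer" per virtual view in place of the NeRF, an exact Gaussian-prior denoiser in place of the CDM, a least-squares fit in place of gradient-based NeRF finetuning, and a deterministic DDIM-style update. All function and parameter names (`toy_denoise`, `alternating_distillation`, `renderers`, `prior_means`) are hypothetical.

```python
import numpy as np

def toy_denoise(z, sigma, mu, prior_std=1.0):
    """Stand-in for the CDM: posterior mean E[x | z] when
    x ~ N(mu, prior_std^2 I) and z = x + sigma * eps."""
    w = prior_std**2 / (prior_std**2 + sigma**2)
    return w * z + (1.0 - w) * mu

def alternating_distillation(renderers, prior_means, sigmas, seed=0):
    """Toy alternation of scene updates and guided diffusion steps.

    renderers:   (V, P, D) linear map per virtual view (stand-in for NeRF rendering)
    prior_means: (V, P) per-view targets the toy denoiser pulls towards
    sigmas:      decreasing noise levels, positive except for a final 0
    """
    rng = np.random.default_rng(seed)
    num_views, num_pixels, _ = renderers.shape
    # Start every virtual view from noise at the largest noise level.
    z = sigmas[0] * rng.standard_normal((num_views, num_pixels))
    stacked = renderers.reshape(num_views * num_pixels, -1)
    theta = None
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # (1) Denoise each virtual view independently.
        x0_hat = toy_denoise(z, sigma, prior_means)
        # (2) Scene update: fit ONE shared scene to all denoised targets.
        theta = np.linalg.lstsq(stacked, x0_hat.reshape(-1), rcond=None)[0]
        renders = renderers @ theta  # (V, P), consistent across views by construction
        # (3) Guided diffusion step: use the shared-scene render as the
        #     clean-image estimate, then move to the next noise level.
        eps_hat = (z - renders) / sigma
        z = renders + sigma_next * eps_hat
    return theta, z
```

Because step (3) replaces each per-view estimate with a render of the shared scene, the final views in this toy are exactly the renders of a single `theta`. In the toy, that is the sense in which the rendering model supplies the multi-view consistency the per-view denoiser lacks.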
Abstract
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets, including ShapeNet, ABO, and Clevr3D.
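The image-conditioned pipeline that the abstract attributes to existing approaches (project 3D points onto the input image plane, gather 2D features there, then volume-render) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not code from the paper: points are assumed to be already expressed in the input camera's frame, the camera is an ideal pinhole with intrinsics `K`, features are read with bilinear interpolation from a map at least 2 x 2 in size, and all function names are hypothetical.

```python
import numpy as np

def project_to_image(points_cam, K):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixel coords (u, v)."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feature_map, uv):
    """Read an (H, W, C) feature map at (N, 2) continuous (u, v) = (column, row) positions.
    Positions outside the image are clamped to the border."""
    h, w, _ = feature_map.shape
    u = np.clip(uv[:, 0], 0.0, w - 1.0)
    v = np.clip(uv[:, 1], 0.0, h - 1.0)
    # Top-left corner of the interpolation cell, kept inside the map.
    u0 = np.minimum(np.floor(u).astype(int), w - 2)
    v0 = np.minimum(np.floor(v).astype(int), h - 2)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = (1 - du) * feature_map[v0, u0] + du * feature_map[v0, u0 + 1]
    bottom = (1 - du) * feature_map[v0 + 1, u0] + du * feature_map[v0 + 1, u0 + 1]
    return (1 - dv) * top + dv * bottom

def composite(densities, colors, deltas):
    """Standard volume-rendering quadrature along one ray.
    densities: (S,), colors: (S, 3), deltas: (S,) sample spacings."""
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas
    return weights @ colors, weights
```

In a full model, the sampled features would be fed to a network that predicts `densities` and `colors` for each point along a ray. A point hidden from the input view still projects to some pixel, so it inherits features of whatever surface occludes it, which is one way to see the ambiguity the abstract describes.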
