Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang
TL;DR
Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require dense input views to avoid artifacts in novel-view synthesis. The authors introduce Deceptive-NeRF/3DGS, which densifies sparse inputs by generating photorealistic pseudo-observations with a deceptive diffusion model conditioned on coarse renders, depth, and view-consistency uncertainty, and then trains the 3D representation on both real and pseudo views. A key component is uncertainty-guided diffusion generation, which enables five- to tenfold densification of the observations and super-resolution of novel views. Across diverse datasets, the approach surpasses state-of-the-art few-view methods and reduces training cost by avoiding per-step diffusion inference, making high-quality sparse-view synthesis more practical.
Abstract
Novel view synthesis via Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) typically requires dense observations, often hundreds of input images, to avoid artifacts. We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction from only a limited set of input images by leveraging a diffusion model pre-trained on multiview datasets. Unlike approaches that use diffusion priors to regularize the optimization of the representation, our method directly uses diffusion-generated images to train NeRF/3DGS as if they were real input views. Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality, photorealistic pseudo-observations. To maintain consistency between pseudo-observations and real input views, we develop an uncertainty measure that guides the diffusion model's generation. Our system progressively incorporates diffusion-generated pseudo-observations into the training image set, ultimately densifying the sparse input observations by 5 to 10 times. Extensive experiments across diverse and challenging datasets show that our approach outperforms existing state-of-the-art methods and can synthesize novel views with super-resolution in the few-view setting.
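The progressive densification described above can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name (`progressive_densify`, `fit`, `render`, `refine`, and the toy stand-ins) is hypothetical, the even split of novel poses across rounds is an assumption, and the reconstruction and diffusion steps are replaced by placeholder callables.

```python
from typing import Callable, List, Sequence, Tuple

# All names here are hypothetical placeholders, not the authors' API.
Image = List[float]  # stand-in for an image tensor
Pose = int           # stand-in for a camera pose
View = Tuple[Pose, Image]


def progressive_densify(
    real_views: Sequence[View],
    novel_poses: Sequence[Pose],
    rounds: int,
    fit: Callable[[Sequence[View]], object],
    render: Callable[[object, Pose], Tuple[Image, Image, Image]],
    refine: Callable[[Image, Image, Image], Image],
) -> List[View]:
    """Grow the training set with pseudo-observations over several rounds."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    train_set = list(real_views)
    # Assumption: novel poses are split into roughly equal chunks per round.
    chunk = -(-len(novel_poses) // rounds)  # ceiling division
    for r in range(rounds):
        # Fit the representation on the real and pseudo views gathered so far.
        model = fit(train_set)
        for pose in novel_poses[r * chunk:(r + 1) * chunk]:
            # Coarse render, depth, and uncertainty condition the refinement.
            coarse, depth, uncertainty = render(model, pose)
            pseudo = refine(coarse, depth, uncertainty)
            train_set.append((pose, pseudo))
    return train_set


# Toy stand-ins so the loop can be run without any 3D or diffusion code.
def toy_fit(views: Sequence[View]) -> object:
    return len(views)  # the "model" is just the number of training views


def toy_render(model: object, pose: Pose) -> Tuple[Image, Image, Image]:
    return [float(pose)], [1.0], [0.0]  # coarse render, depth, uncertainty


def toy_refine(coarse: Image, depth: Image, uncertainty: Image) -> Image:
    return list(coarse)  # a real system would run the diffusion model here


real = [(0, [0.0]), (1, [0.0]), (2, [0.0])]
views = progressive_densify(
    real, list(range(3, 15)), 4, toy_fit, toy_render, toy_refine
)
densification = len(views) / len(real)
```

With three real views and twelve novel poses spread over four rounds, the toy run ends with fifteen training views, a fivefold densification at the low end of the 5 to 10 times range stated above. In this sketch the refinement runs once per pseudo-view rather than at every optimization step of the representation; a final call to `fit` on the returned set is left to the caller.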
