Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction

Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang

TL;DR

Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require dense views to avoid artifacts in novel-view synthesis. The authors introduce Deceptive-NeRF/3DGS, which densifies sparse inputs by generating photorealistic pseudo-observations with a deceptive diffusion model conditioned on coarse renders, depth, and a view-consistency uncertainty measure, and then trains the 3D representations on both real and pseudo views. A key component is uncertainty-guided diffusion generation, which enables five- to tenfold observation densification as well as super-resolution of novel views. Across diverse datasets, the approach surpasses state-of-the-art few-view methods and reduces training time by avoiding diffusion inference at every training step (Figure 1 reports nearly tenfold faster training), making high-quality sparse-view synthesis more practical.
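This page does not spell out how the view-consistency uncertainty is computed. As a rough illustration only, the sketch below computes one simple proxy under stated assumptions: a pixel of a coarse novel-view depth render is treated as more uncertain when fewer real input cameras can see its back-projected 3D point. The `Camera` class, the pinhole model with a camera-to-world pose, and `coverage_uncertainty` are hypothetical names for this example; the proxy ignores occlusion and is not the paper's measure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Hypothetical pinhole camera; the pose maps camera coordinates to world."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int
    R: Mat3  # camera-to-world rotation, row-major
    t: Vec3  # camera centre in world coordinates


def _matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def _transpose(m: Mat3) -> Mat3:
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


def pixel_to_world(cam: Camera, u: int, v: int, depth: float) -> Vec3:
    # Back-project the centre of pixel (u, v) at z-depth `depth`, then move to world space.
    p_cam = ((u + 0.5 - cam.cx) / cam.fx * depth,
             (v + 0.5 - cam.cy) / cam.fy * depth,
             depth)
    p_rot = _matvec(cam.R, p_cam)
    return tuple(p_rot[i] + cam.t[i] for i in range(3))


def sees(cam: Camera, x_world: Vec3) -> bool:
    # World -> camera is R^T (X - t); keep points in front of the camera and inside the image.
    d = tuple(x_world[i] - cam.t[i] for i in range(3))
    x, y, z = _matvec(_transpose(cam.R), d)
    if z <= 0.0:
        return False
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    return 0.0 <= u < cam.width and 0.0 <= v < cam.height


def coverage_uncertainty(novel: Camera, depth: Sequence[Sequence[float]],
                         inputs: Sequence[Camera]) -> List[List[float]]:
    """Per-pixel value in [0, 1]: 1 - (fraction of input views that see the point).
    Occlusion is ignored, so this is only a coverage proxy (an assumption)."""
    n = len(inputs)
    out = []
    for v in range(novel.height):
        row = []
        for u in range(novel.width):
            x_world = pixel_to_world(novel, u, v, depth[v][u])
            seen = sum(1 for cam in inputs if sees(cam, x_world))
            row.append(1.0 - seen / n)
        out.append(row)
    return out
```

For instance, if the novel camera coincides with one input camera and a second input camera is moved far to the side, every pixel receives an uncertainty of 0.5. A map like this could be passed to a diffusion model as an extra conditioning channel, but that wiring is likewise an assumption here.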

Abstract

Novel view synthesis via Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) typically requires dense observations, often hundreds of input images, to avoid artifacts. We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction from only a limited set of input images by leveraging a diffusion model pre-trained on multiview datasets. Rather than using diffusion priors to regularize representation optimization, our method directly uses diffusion-generated images to train NeRF/3DGS as if they were real input views. Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality, photorealistic pseudo-observations. To enforce consistency between pseudo-observations and real input views, we develop an uncertainty measure that guides the diffusion model's generation. Our system progressively incorporates diffusion-generated pseudo-observations into the training image set, ultimately densifying the sparse input observations by a factor of 5 to 10. Extensive experiments across diverse and challenging datasets show that our approach outperforms existing state-of-the-art methods and can synthesize novel views with super-resolution in the few-view setting.
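The progressive scheme described in the abstract amounts to a train, render, refine, retrain loop. Below is a minimal, framework-free sketch under several assumptions not stated on this page: "densify by a factor k" is read as ending with k times as many training views as real inputs, pseudo-views are added in roughly equal batches over a fixed number of rounds, and `train`, `render`, and `refine` are hypothetical callables standing in for NeRF/3DGS optimization, coarse RGB-D rendering, and the uncertainty-conditioned deceptive diffusion model.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

View = Tuple[Any, Any]  # (camera pose, image); both are opaque to the loop


def densify_progressively(
    real_views: Sequence[View],
    candidate_poses: Sequence[Any],
    factor: int,
    rounds: int,
    train: Callable[[List[View]], Any],  # hypothetical: fit NeRF/3DGS on a list of views
    render: Callable[[Any, Any], Any],   # hypothetical: coarse RGB-D render at a pose
    refine: Callable[[Any], Any],        # hypothetical: diffusion refinement of a render
) -> Tuple[Any, List[View]]:
    """Grow the training set to about `factor` x the real views over `rounds` rounds."""
    target = min(len(candidate_poses), (factor - 1) * len(real_views))
    pseudo: List[View] = []
    model = train(list(real_views))  # coarse few-view reconstruction
    for r in range(rounds):
        remaining = target - len(pseudo)
        if remaining <= 0:
            break
        batch = math.ceil(remaining / (rounds - r))  # spread additions across rounds
        start = len(pseudo)
        for pose in candidate_poses[start:start + batch]:
            coarse = render(model, pose)           # render coarse RGB-D at the new pose
            pseudo.append((pose, refine(coarse)))  # turn it into a pseudo-observation
        model = train(list(real_views) + pseudo)   # retrain on real + pseudo views
    return model, pseudo
```

With 3 real views, `factor=5`, and `rounds=3`, the loop adds 4 pseudo-views per round and finishes with 15 training views. Whether `train` restarts from scratch or continues from the previous model is deliberately left to the caller.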

Paper Structure (16 sections, 11 equations, 12 figures, 6 tables)

Figures (12)

  • Figure 1: Different approaches to applying 2D diffusion priors in few-view 3D reconstruction. (a) With only a few input images, an intuitive approach is to use the 2D diffusion model as a "scorer" for synthesized novel views, regularizing NeRF/3DGS training. (b) Instead, our approach densifies the input views by generating dense pseudo-observations that are consistent with the given inputs, progressively enhancing the reconstruction as shown. This avoids running diffusion inference at every training step, yielding nearly tenfold faster training.
  • Figure 2: Overview of Deceptive-NeRF/3DGS. 1) Given a sparse set of input images with camera poses, we train NeRF/3DGS to render coarse novel-view images and depth maps. 2) Our deceptive diffusion model enhances the RGB-D images from the coarse reconstruction, conditioned on a novel uncertainty measure, to generate pseudo-observations from the corresponding viewpoints. 3) We continue training NeRF/3DGS on both the input images (real) and the pseudo-observations (fake), and repeat this process to obtain the final reconstruction.
  • Figure 3: Qualitative comparisons of few-view reconstruction. Scenes are from the Hypersim roberts2021hypersim, ScanNet scannet, and mip-NeRF 360 barron2022mipnerf360 datasets, with 10 input views.
  • Figure 4: Super-Resolution Capabilities. Applied to the Hypersim and LLFF datasets with input images downsampled by a factor of 4, Deceptive-NeRF produces super-resolved novel views that recover intricate details.
  • Figure 5: Evaluation of densification scales. We run our method with observations densified at different scales on Hypersim roberts2021hypersim with 10 input views. Vanilla NeRF and 3DGS produce severe artifacts; fivefold densification reduces them but does not restore fine detail, whereas tenfold densification yields high-quality reconstructions.
  • ...and 7 more figures
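The captions do not say how pseudo-view viewpoints are placed. One simple way to realize a k-fold densification such as the fivefold and tenfold settings in Figure 5 is to interpolate between consecutive input cameras: linear interpolation for camera centres and spherical linear interpolation (slerp) for orientations. The sketch below assumes input poses are ordered along a camera path and given as (position, unit quaternion) pairs; `interpolate_poses` and its `closed` flag are hypothetical, and this is not necessarily how the paper samples viewpoints.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit length
Pose = Tuple[Vec3, Quat]


def slerp(q0: Quat, q1: Quat, s: float) -> Quat:
    """Spherical linear interpolation between unit quaternions, s in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical: normalized linear interpolation is stable
        q = tuple(a + s * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - s) * theta) / sin_theta
    w1 = math.sin(s * theta) / sin_theta
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def interpolate_poses(poses: Sequence[Pose], factor: int, closed: bool = False) -> List[Pose]:
    """Insert (factor - 1) poses between each consecutive pair of input poses.
    With closed=True the last pose is also connected back to the first."""
    pairs = list(zip(poses, poses[1:]))
    if closed and len(poses) > 1:
        pairs.append((poses[-1], poses[0]))
    out: List[Pose] = []
    for (p0, q0), (p1, q1) in pairs:
        for k in range(1, factor):
            s = k / factor
            position = tuple(a + s * (b - a) for a, b in zip(p0, p1))
            out.append((position, slerp(q0, q1, s)))
    return out
```

With 10 input views ordered around a loop, `factor=10` and `closed=True` give 90 interpolated poses, i.e. 100 views in total, which matches a tenfold densification under this reading. These poses could serve as `candidate_poses` for a progressive densification loop.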