G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun, Yifan Yang, Naizhou Wang, Mingkui Tan, Qi Wu
TL;DR
G-NeRF is a geometry-driven pipeline for novel view synthesis from single-view images. It draws geometry priors from an off-the-shelf 3D GAN (EG3D) through Geometry-guided Multi-View Synthesis (GMVS) and enforces depth-consistent learning through Depth-aware Training (DaT), which relies on a depth-aware discriminator. A truncation-based sampling strategy balances identity diversity against geometric fidelity in the synthetic multi-view data, while the depth-guided adversarial objective supplies depth-consistent supervision for real-world single-view inputs. Experiments on FFHQ, AFHQv2-Cats, and CelebAMask-HQ show improved depth accuracy, identity preservation, and view consistency over single-view baselines such as Pix2NeRF, with faster inference than GAN-inversion methods. The approach produces high-fidelity, 3D-consistent renderings without test-time optimization, which suggests that combining 3D GAN priors with NeRF-based rendering is a practical route to scalable single-view 3D synthesis.
Abstract
Novel view synthesis aims to generate images of new views from a given collection of view images. Recent attempts address this problem by relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such methods have the following limitations: 1) they require a set of multi-view images as training data for a specific scene (e.g., face, car, or chair), which is often unavailable in real-world scenarios; 2) they fail to extract geometry priors from single-view images because multi-view supervision is lacking. In this paper, we propose a Geometry-enhanced NeRF (G-NeRF), which enhances the geometry priors through a geometry-guided multi-view synthesis approach, followed by depth-aware training. In the synthesis process, motivated by the observation that existing 3D GAN models can unconditionally synthesize high-fidelity multi-view images, we adopt off-the-shelf 3D GAN models, such as EG3D, as a free source of geometry priors by synthesizing multi-view data. In addition, to further improve the geometry quality of the synthetic data, we introduce a truncation method to effectively sample latent codes within 3D GAN models. To address the absence of multi-view supervision for single-view images, we design a depth-aware training approach that incorporates a depth-aware discriminator to guide the geometry priors through depth maps. Experiments demonstrate the effectiveness of our method in both qualitative and quantitative results.
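The truncation step mentioned above can be illustrated with a minimal sketch. The abstract does not spell out the exact scheme, so this assumes the standard StyleGAN-style truncation trick, in which each sampled latent code is pulled toward the mean latent by a factor psi. The function name truncate_latents, the toy_mapping stand-in for a 3D GAN mapping network, and the value psi = 0.7 are all hypothetical and chosen only for illustration.

```python
import numpy as np


def truncate_latents(w, w_avg, psi):
    """Interpolate latent codes toward the mean latent.

    Computes w' = w_avg + psi * (w - w_avg). With psi = 1 the codes are
    unchanged (most diverse); with psi = 0 every code collapses to w_avg.
    This is the generic truncation trick, assumed here for illustration.
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must lie in [0, 1]")
    return w_avg + psi * (w - w_avg)


def toy_mapping(z):
    """Hypothetical stand-in for a 3D GAN mapping network (z -> w)."""
    return np.tanh(z)


rng = np.random.default_rng(0)
latent_dim = 8

# Estimate the mean latent from many random samples.
w_avg = toy_mapping(rng.standard_normal((10_000, latent_dim))).mean(
    axis=0, keepdims=True
)

# Sample a small batch of latent codes and truncate them.
w = toy_mapping(rng.standard_normal((4, latent_dim)))
w_trunc = truncate_latents(w, w_avg, psi=0.7)

# Truncated codes lie closer to the mean than the originals.
print(np.linalg.norm(w - w_avg, axis=1))
print(np.linalg.norm(w_trunc - w_avg, axis=1))
```

In this sketch a smaller psi keeps samples near the mean latent, where a generator's geometry tends to be more reliable, at the cost of less varied identities. A larger psi does the opposite, which is the diversity versus fidelity trade-off the truncation method is meant to balance.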
