HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs
Zelin Gao, Weichen Dai, Yu Zhang
TL;DR
HG3-NeRF tackles sparse-view novel view synthesis by guiding NeRF with hierarchical geometric, semantic, and photometric cues. It introduces HGG to leverage sparse depth priors from SfM through local-to-global sampling, HSG to supervise coarse-to-fine semantics via CLIP, and HPG to ensure appearance consistency across scales. The approach delivers state-of-the-art results on standard benchmarks with sparse inputs and demonstrates improved geometry and semantics in real-world space without relying on NDC representations. Ablation studies confirm the contributions of HGG and HSG, and the method reduces data requirements compared to traditional NeRF approaches.
Abstract
Neural Radiance Fields (NeRF) have garnered considerable attention as a paradigm for novel view synthesis by learning scene representations from discrete observations. Nevertheless, NeRF exhibits pronounced performance degradation when confronted with sparse view inputs, which limits its applicability. In this work, we introduce Hierarchical Geometric, Semantic, and Photometric Guided NeRF (HG3-NeRF), a novel methodology that addresses this limitation and enhances the consistency of geometry, semantic content, and appearance across different views. We propose Hierarchical Geometric Guidance (HGG) to incorporate the sparse depth prior that accompanies Structure from Motion (SfM) into the scene representations. Unlike direct depth supervision, HGG samples volume points from local-to-global geometric regions, mitigating the misalignment caused by inherent bias in the depth prior. Furthermore, we draw inspiration from the notable variations in semantic consistency observed across images of different resolutions and propose Hierarchical Semantic Guidance (HSG) to learn coarse-to-fine semantic content, which corresponds to the coarse-to-fine scene representations. Experimental results demonstrate that HG3-NeRF outperforms other state-of-the-art methods on different standard benchmarks and achieves high-fidelity synthesis results for sparse view inputs.
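The local-to-global sampling idea behind HGG can be illustrated with a small sketch. This is not the authors' implementation: the function name `local_to_global_samples`, the linear widening schedule, the `progress` argument, and the `min_radius` default are all assumptions made for illustration. The sketch only shows the general pattern of drawing ray samples in a narrow interval around an SfM depth prior early in training and widening that interval to the full near-far range later, so that a biased prior steers sampling without being enforced as direct depth supervision.

```python
import numpy as np


def local_to_global_samples(depth_prior, near, far, progress,
                            n_samples=64, min_radius=0.05, rng=None):
    """Sample depths along one ray around a sparse depth prior.

    Hypothetical schedule (an assumption, not taken from the paper):
    the half-width of the sampling interval grows linearly with
    `progress` in [0, 1], from `min_radius` (local) to the full
    near-far extent of the ray (global).
    """
    rng = np.random.default_rng() if rng is None else rng
    progress = float(np.clip(progress, 0.0, 1.0))

    # Half-width of the interval centred on the depth prior.
    max_radius = far - near
    radius = min_radius + progress * (max_radius - min_radius)

    # Clamp the interval to the valid ray range.
    lo = max(near, depth_prior - radius)
    hi = min(far, depth_prior + radius)

    # Stratified sampling: one uniform draw per equal-width bin.
    edges = np.linspace(lo, hi, n_samples + 1)
    u = rng.random(n_samples)
    return edges[:-1] + u * (edges[1:] - edges[:-1])


# Early in training the samples cluster near the prior; later they
# cover the whole ray.
rng = np.random.default_rng(0)
early = local_to_global_samples(3.0, near=2.0, far=6.0, progress=0.0, rng=rng)
late = local_to_global_samples(3.0, near=2.0, far=6.0, progress=1.0, rng=rng)
```

The linear schedule is only one possible choice; any monotone widening of the interval would follow the same local-to-global pattern.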
