Learning Robust Generalizable Radiance Field with Visibility and Feature Augmented Point Representation
Jiaxu Wang, Ziyi Zhang, Renjing Xu
TL;DR
This work addresses the limitations of existing generalizable NeRFs by introducing the Generalizable neural Point Field (GPF), a point-based representation that explicitly models visibility with geometric priors and augments it with neural features. It introduces a density-guided nonuniform log sampling strategy and a feature-augmented learnable kernel for robust feature aggregation, along with a three-stage hierarchical finetuning procedure that enables generalization without per-scene optimization. On the NeRF Synthetic, DTU, and BlendedMVS datasets, the method achieves better geometry, view consistency, and rendering quality in both generalization and finetuning settings, surpassing image-based baselines and other point-based approaches. The neural point field can also be manipulated interactively, suggesting a practical and flexible direction for generalizable NeRFs.
Abstract
This paper introduces a novel paradigm for the generalizable neural radiance field (NeRF). Previous generic NeRF methods combine multiview stereo techniques with image-based neural rendering for generalization. They yield impressive results but suffer from three issues. First, occlusions often result in inconsistent feature matching. Second, they produce distortions and artifacts at geometric discontinuities and locally sharp shapes, because sampled points are processed independently and features are aggregated coarsely. Third, their image-based representations degrade severely when the source views are not close enough to the target view. To address these challenges, we propose the first paradigm that constructs the generalizable neural field based on point-based rather than image-based rendering, which we call the Generalizable neural Point Field (GPF). Our approach explicitly models visibility using geometric priors and augments it with neural features. We propose a novel nonuniform log sampling strategy to improve both rendering speed and reconstruction quality. Moreover, we present a learnable kernel, spatially augmented with features, for feature aggregation, which mitigates distortions in regions where geometry varies drastically. In addition, our representation can be easily manipulated. Experiments on three datasets show that our model delivers better geometry, view consistency, and rendering quality than all compared methods in both generalization and finetuning settings, providing preliminary evidence for the potential of the new paradigm for generalizable NeRF.
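As a rough illustration of what nonuniform log sampling along a ray can look like, the sketch below places samples at logarithmically spaced offsets around an estimated surface depth, so that samples are dense near the surface and sparse away from it. This is not the paper's formulation: the function names, the symmetric layout, and the parameters `n_per_side`, `min_offset`, and `max_offset` are all assumptions made for the example, and the surface depth is assumed to be available from the point cloud.

```python
import numpy as np


def log_spaced_offsets(n_per_side, max_offset, min_offset=1e-3):
    """Symmetric offsets around 0 whose magnitudes are log-spaced.

    Hypothetical helper: magnitudes grow geometrically from
    min_offset to max_offset on each side of the surface.
    """
    mags = np.geomspace(min_offset, max_offset, n_per_side)
    return np.concatenate([-mags[::-1], [0.0], mags])


def sample_depths(surface_depth, n_per_side=4, max_offset=0.5,
                  near=0.0, far=np.inf):
    """Depths along a ray, concentrated around an estimated surface depth.

    Assumption: surface_depth is a scalar estimate obtained elsewhere
    (for example, from the nearest points of a point cloud).
    """
    depths = surface_depth + log_spaced_offsets(n_per_side, max_offset)
    # Keep samples inside the valid ray segment [near, far].
    return np.clip(depths, near, far)


# Usage: nine samples around a surface estimated at depth 2.0.
depths = sample_depths(2.0)
gaps = np.diff(depths)  # spacing shrinks toward the surface
```

Compared with uniform sampling over the whole ray, a layout like this spends most of the sample budget where the density is expected to be high, which is the intuition behind the speed and quality gains claimed above.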
