GRF: Learning a General Radiance Field for 3D Representation and Rendering
Alex Trevithick, Bo Yang
TL;DR
GRF addresses the challenge of general 3D representation and novel-view synthesis from only 2D observations by learning a general radiance field that generalizes to unseen objects, categories, and scenes. It combines per-pixel 2D features with geometry-aware reprojection and attention-based aggregation, feeding the aggregated features of each 3D point into a NeRF-like renderer to produce high-fidelity views. The approach generalizes well on the ShapeNet and Synthetic-NeRF datasets, and on real-world LLFF/3DScan data it substantially improves single-scene results over NeRF-based methods. The work also offers insights into the role of attention in resolving occlusions and aggregating information across views in neural rendering.
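The TL;DR mentions a NeRF-like renderer. As a hedged sketch, the standard NeRF volume-rendering quadrature that such a renderer typically uses to turn per-point density and color into a pixel color is shown below. It assumes the network has already predicted a density σ_i and color c_i for each sample along a ray, with spacing δ_i between samples; the function name `composite_ray` is hypothetical, and GRF's exact sampling scheme is not described here.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(sigmas: Sequence[float], colors: Sequence[RGB],
                  deltas: Sequence[float]) -> Tuple[RGB, List[float]]:
    """Alpha-composite per-sample densities and colors along one ray.

    Standard NeRF quadrature (hypothetical helper, not GRF's code):
        alpha_i = 1 - exp(-sigma_i * delta_i)
        T_i     = prod_{j<i} (1 - alpha_j)
        C       = sum_i T_i * alpha_i * c_i
    Returns the rendered color and the per-sample weights T_i * alpha_i.
    """
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("sigmas, colors and deltas must have equal length")
    transmittance = 1.0          # T_1 = 1: nothing occludes the first sample
    weights: List[float] = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        w = transmittance * alpha                 # contribution to the pixel
        weights.append(w)
        for k in range(3):
            rgb[k] += w * color[k]
        transmittance *= 1.0 - alpha              # light surviving past it
    return (rgb[0], rgb[1], rgb[2]), weights
```

For example, a ray whose first sample is empty (σ = 0) and whose second sample is very dense and red renders as almost pure red, because the empty sample receives zero weight. The weights T_i α_i are what let samples hidden behind earlier dense ones contribute little, so this step resolves occlusion along a single ray.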
Abstract
We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes only from 2D observations. The network models 3D geometries as a general radiance field, which takes a set of 2D images with camera poses and intrinsics as input, constructs an internal representation for each point of the 3D space, and then renders the corresponding appearance and geometry of that point viewed from an arbitrary position. The key to our approach is to learn local features for each pixel in 2D images and to then project these features to 3D points, thus yielding general and rich point representations. We additionally integrate an attention mechanism to aggregate pixel features from multiple 2D views, such that visual occlusions are implicitly taken into account. Extensive experiments demonstrate that our method can generate high-quality and realistic novel views for novel objects, unseen categories and challenging real-world scenes.
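The abstract describes, for each 3D point, gathering local pixel features from several input views and aggregating them with attention. A common way to realize this, sketched here under stated assumptions and not as GRF's implementation, is to project the point into every view with that view's intrinsics K and pose, read the feature at the pixel it lands on, and combine the visible views with softmax weights. All helper names are hypothetical, as are the world-to-camera convention for `R` and `t`, the nearest-pixel lookup (used in place of interpolation), and the dot-product scoring against a learned `query` vector; the paper's actual attention module is not specified here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]
UV = Optional[Tuple[float, float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(point: Sequence[float], K: Mat, R: Mat, t: Sequence[float]) -> UV:
    """Pinhole projection of a world point to pixel coordinates (u, v).

    Assumes R, t map world -> camera coordinates and the camera looks down +z.
    Returns None if the point lies behind the camera.
    """
    cam = [c + ti for c, ti in zip(matvec(R, point), t)]
    if cam[2] <= 0.0:
        return None
    uvw = matvec(K, cam)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


def gather_feature(feature_map: List[List[Vec]], uv: UV) -> Optional[Vec]:
    """Nearest-pixel lookup in an H x W grid of feature vectors (an assumption;
    bilinear interpolation is a common alternative)."""
    if uv is None:
        return None
    col, row = int(round(uv[0])), int(round(uv[1]))
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None                                   # point falls outside the image


def attention_pool(features: List[Vec], query: Sequence[float]) -> Vec:
    """Softmax-weighted average of per-view features, each scored by its dot
    product with a (hypothetical) learned query vector."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * feat[d] for e, feat in zip(exps, features)) / total
            for d in range(len(features[0]))]


def point_feature(point: Sequence[float],
                  views: Sequence[Tuple[List[List[Vec]], Mat, Mat, Sequence[float]]],
                  query: Sequence[float]) -> Optional[Vec]:
    """Aggregate features for one 3D point from every view that sees it.

    `views` is a list of (feature_map, K, R, t) tuples.
    """
    visible = []
    for feature_map, K, R, t in views:
        feat = gather_feature(feature_map, project(point, K, R, t))
        if feat is not None:
            visible.append(feat)
    if not visible:
        return None
    return attention_pool(visible, query)
```

Scoring each view before the softmax, rather than averaging uniformly, gives the network a way to down-weight a view in which the point is hidden by other geometry. This is one plausible reading of the abstract's statement that occlusions are taken into account implicitly.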
