NeRF in detail: Learning to sample for view synthesis
Relja Arandjelović, Andrew Zisserman
TL;DR
This work replaces NeRF's handcrafted hierarchical sampling with a differentiable proposer that learns where to sample along rays, enabling end-to-end optimization. It compares multiple proposer architectures and introduces a pre-training strategy to stabilize learning, achieving state-of-the-art results on Blender and on-par or better performance on LLFF-NeRF. The proposer also predicts sample importance, which can be used to prune samples and save about 25% of the computation without significantly sacrificing rendering quality.
Abstract
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis performance. The core approach is to render individual rays by querying a neural network at points sampled along the ray to obtain the density and colour of the sampled points, and integrating this information using the rendering equation. Since dense sampling is computationally prohibitive, a common solution is to perform coarse-to-fine sampling. In this work we address a clear limitation of the vanilla coarse-to-fine approach: that it is based on a heuristic and not trained end-to-end for the task at hand. We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture. Training the proposal module from scratch can be unstable due to lack of supervision, so an effective pre-training strategy is also put forward. The approach, named "NeRF in detail" (NeRF-ID), achieves superior view synthesis quality over NeRF and the state-of-the-art on the synthetic Blender benchmark and on par or better performance on the real LLFF-NeRF scenes. Furthermore, by leveraging the predicted sample importance, a 25% saving in computation can be achieved without significantly sacrificing the rendering quality.
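
For orientation, the heuristic coarse-to-fine step that the abstract refers to can be sketched as follows. This is a minimal, single-ray illustration of the vanilla NeRF procedure (quadrature weights from densities, then inverse-transform sampling of fine points from those weights), not the learned proposer introduced in this work. The function names `render_weights` and `sample_fine`, the toy density bump, and the sample counts are all assumptions chosen for illustration.

```python
import bisect
import math
import random


def render_weights(density, t):
    """Quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) along one ray."""
    weights, transmittance = [], 1.0
    for i, sigma in enumerate(density):
        # Distance to the next sample; the last interval is treated as open-ended.
        delta = t[i + 1] - t[i] if i + 1 < len(t) else 1e10
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining after this sample
    return weights


def sample_fine(t, weights, n_fine, rng):
    """Draw fine samples from the piecewise-constant pdf given by coarse weights."""
    edges = [0.5 * (a + b) for a, b in zip(t[:-1], t[1:])]  # bin edges at midpoints
    w = [x + 1e-5 for x in weights[1:-1]]  # one weight per bin, kept strictly positive
    total = sum(w)
    cdf = [0.0]
    for x in w:
        cdf.append(cdf[-1] + x / total)
    samples = []
    for _ in range(n_fine):
        u = rng.random()
        i = min(bisect.bisect_right(cdf, u) - 1, len(w) - 1)  # bin containing u
        frac = min((u - cdf[i]) / (cdf[i + 1] - cdf[i]), 1.0)
        samples.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    return sorted(samples)


# Toy ray (assumed values): 64 coarse samples and a density bump near depth 4.
t = [2.0 + 4.0 * i / 63 for i in range(64)]
density = [50.0 * math.exp(-((x - 4.0) / 0.1) ** 2) for x in t]
weights = render_weights(density, t)
fine = sample_fine(t, weights, 128, random.Random(0))
```

In this sketch the fine samples cluster where the coarse weights are large, which is the handcrafted rule that NeRF-ID replaces with a trainable module that proposes samples and their importance.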
