HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Deva Ramanan, Michael Zollhöfer, Christian Richardt
TL;DR
HybridNeRF tackles the slow rendering of neural radiance fields with a hybrid surface–volume representation that renders most of the scene as surfaces and handles only the challenging regions volumetrically. The method replaces a global surfaceness parameter with a spatially adaptive $\beta(\boldsymbol{x})$, so that most of the scene can be rendered with few samples, and uses a VolSDF-inspired density coupled with Eikonal regularization to maintain surface quality. Finetuning includes proposal-network baking, MLP distillation, and a distance-adjusted Eikonal loss. Real-time rendering optimizations such as texture-based features and sphere tracing bring the model to VR-ready framerates at 2K×2K resolution, with state-of-the-art quality on Eyeful Tower and other benchmarks. The results show a favorable speed–quality trade-off over existing real-time and hybrid methods, with practical implications for immersive applications. The paper also outlines memory and training-time limitations and points to future directions that may combine the advantages of surface–volume representations with fast splatting approaches.
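As a rough illustration of the density model mentioned above, the sketch below implements the standard VolSDF-style mapping from signed distance to density, $\sigma(\boldsymbol{x}) = \frac{1}{\beta}\Psi_\beta(-f(\boldsymbol{x}))$, where $\Psi_\beta$ is the Laplace CDF, and allows $\beta$ to vary per point. It also includes a plain Eikonal penalty on the SDF gradient. The function names, the $1/\beta$ scaling, and the convention that a smaller `beta` means a sharper, more surface-like transition are assumptions taken from VolSDF for illustration. HybridNeRF's own surfaceness parameterization and its distance-adjusted Eikonal loss may differ.

```python
import numpy as np

def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta (elementwise)."""
    s = np.asarray(s, dtype=np.float64)
    beta = np.asarray(beta, dtype=np.float64)
    # exp(-|s| / beta) keeps the exponent non-positive, which avoids overflow.
    e = 0.5 * np.exp(-np.abs(s) / beta)
    return np.where(s <= 0, e, 1.0 - e)

def volsdf_density(sdf, beta):
    """VolSDF-style density sigma = (1 / beta) * Psi_beta(-sdf).

    `sdf` is positive outside the surface and negative inside.
    `beta` may be a scalar (global) or an array matching `sdf`
    (spatially adaptive, one value per query point).
    """
    return laplace_cdf(-np.asarray(sdf, dtype=np.float64), beta) / beta

def eikonal_penalty(grad):
    """Plain (unadjusted) Eikonal term: mean of (||grad f|| - 1)^2."""
    norms = np.linalg.norm(np.asarray(grad, dtype=np.float64), axis=-1)
    return np.mean((norms - 1.0) ** 2)
```

Passing an array for `beta`, for example `volsdf_density(sdf, beta=np.array([0.01, 0.5]))`, gives each query point its own transition width: a small value concentrates density close to the zero level set, while a large value spreads it out into a soft volume.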
Abstract
Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF on the challenging Eyeful Tower dataset along with other commonly used view synthesis datasets. Compared with state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15–30% while achieving real-time framerates (at least 36 FPS) at virtual-reality resolutions (2K×2K).
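To make the sampling cost concrete, the following is a minimal sketch of the standard volume rendering quadrature used by NeRF-style methods, in which each sample along a ray receives a weight $w_i = T_i\,(1 - e^{-\sigma_i \delta_i})$ with transmittance $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$. The function name `render_weights` and its inputs are hypothetical and chosen for illustration; this is not the paper's renderer.

```python
import numpy as np

def render_weights(sigma, delta):
    """Per-sample compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    `sigma` holds the density at each sample along one ray and `delta`
    the length of the interval each sample represents.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    delta = np.asarray(delta, dtype=np.float64)
    tau = sigma * delta                      # optical thickness per interval
    alpha = 1.0 - np.exp(-tau)               # opacity per interval
    # Transmittance reaching sample i: exp(-sum of tau over earlier samples).
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(tau)[:-1]]))
    return trans * alpha
```

When the density is close to a step function, as it is for a sharp surface, almost all of the weight falls on the first sample inside the surface, so the remaining samples on the ray contribute little. That is the intuition behind rendering surface-like regions with far fewer samples per ray.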
