HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

Haithem Turki, Vasu Agrawal, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Deva Ramanan, Michael Zollhöfer, Christian Richardt

TL;DR

HybridNeRF tackles the slow rendering of neural radiance fields with a hybrid surface–volume representation that renders most of the scene as surfaces and handles only the challenging regions volumetrically. The method replaces a global surfaceness parameter with a spatially adaptive $\beta(\boldsymbol{x})$, so that most of the scene can be rendered with few samples per ray, and pairs a VolSDF-inspired density with Eikonal regularization to maintain surface quality. Finetuning involves proposal-network baking, MLP distillation, and a distance-adjusted Eikonal loss. Render-time optimizations such as texture-based features and sphere tracing bring the model to real-time framerates at VR resolution (2K×2K), with state-of-the-art quality on Eyeful Tower and other benchmarks. The results show a favorable speed–quality trade-off over existing real-time and hybrid methods. The paper also notes memory and training-time limitations and points to future work that may combine the surface–volume representation with fast splatting approaches.
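To make the role of $\beta(\boldsymbol{x})$ concrete, here is a minimal sketch of a VolSDF-style density with a per-voxel surfaceness lookup. It assumes the density takes the form $\sigma(\boldsymbol{x}) = \beta(\boldsymbol{x})\,\Psi(\beta(\boldsymbol{x}) f(\boldsymbol{x}))$, where $f$ is the signed distance and $\Psi$ is a flipped Laplace CDF, with $\beta$ acting as an inverse scale. That form is an assumption consistent with the statement that higher surfaceness needs fewer samples; the summary itself does not give the formula. The names `psi`, `density`, and `beta_at`, the nearest-voxel lookup, and the toy grid are all hypothetical.

```python
import math

def psi(s):
    """Flipped Laplace CDF: close to 1 inside the surface (s << 0), close to 0 outside (s >> 0)."""
    if s > 0:
        return 0.5 * math.exp(-s)
    return 1.0 - 0.5 * math.exp(s)

def density(signed_distance, beta):
    """Assumed form: sigma = beta * Psi(beta * f(x)). Larger beta gives a sharper surface."""
    return beta * psi(beta * signed_distance)

def beta_at(grid, x, lo=-1.0, hi=1.0):
    """Nearest-voxel surfaceness lookup in a cubic grid spanning [lo, hi]^3 (hypothetical layout)."""
    n = len(grid)
    idx = []
    for c in x:
        u = (c - lo) / (hi - lo)                     # normalise coordinate to [0, 1]
        idx.append(min(n - 1, max(0, int(u * n))))   # clamp to a valid voxel index
    i, j, k = idx
    return grid[i][j][k]

# Toy 2x2x2 grid: high surfaceness everywhere except one voxel that we
# pretend contains thin wires or glass and therefore stays volumetric.
grid = [[[100.0 for _ in range(2)] for _ in range(2)] for _ in range(2)]
grid[1][1][1] = 10.0

x_wall = (-0.5, -0.5, -0.5)   # falls in a high-surfaceness voxel
x_wire = (0.5, 0.5, 0.5)      # falls in the low-surfaceness voxel

# Density 0.05 units in front of the zero level set, for each region.
sigma_wall = density(0.05, beta_at(grid, x_wall))
sigma_wire = density(0.05, beta_at(grid, x_wire))
print(sigma_wall, sigma_wire)
```

In this toy setup the high-surfaceness voxel has almost no density in front of the surface, so a ray accumulates opacity only in a thin band around the zero level set, while the low-surfaceness voxel keeps a softer, more volumetric falloff.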

Abstract

Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF against the challenging Eyeful Tower dataset along with other commonly used view synthesis datasets. When comparing to state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15-30% while achieving real-time framerates (at least 36 FPS) for virtual-reality resolutions (2Kx2K).
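The claim that surfaces need far fewer samples per ray than volumes can be illustrated with standard volume-rendering quadrature. The sketch below marches a ray toward a flat wall with uniform steps and counts how many samples receive a non-negligible rendering weight, once for a soft density and once for a sharp one. The density form (a Laplace-CDF density scaled by a surfaceness value `beta`), the step size, the wall position, and the threshold `eps` are illustrative assumptions, not values from the paper, and uniform marching stands in for whatever sampler a real system would use.

```python
import math

def density(signed_distance, beta):
    """Assumed SDF-derived density: beta * flipped Laplace CDF of (beta * distance)."""
    s = beta * signed_distance
    psi = 0.5 * math.exp(-s) if s > 0 else 1.0 - 0.5 * math.exp(s)
    return beta * psi

def render_weights(sdf_values, step, beta):
    """Quadrature weights w_i = T_i * (1 - exp(-sigma_i * step)) along one ray."""
    weights = []
    transmittance = 1.0
    for d in sdf_values:
        alpha = 1.0 - math.exp(-density(d, beta) * step)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha   # light remaining after this sample
    return weights

def count_significant(weights, eps=1e-3):
    """Number of samples that contribute visibly to the pixel colour."""
    return sum(1 for w in weights if w > eps)

# A ray that meets a wall at depth 1.0, sampled at 200 uniform midpoints.
step = 0.01
sdf_along_ray = [1.0 - (i + 0.5) * step for i in range(200)]

soft_weights = render_weights(sdf_along_ray, step, beta=10.0)    # volume-like
sharp_weights = render_weights(sdf_along_ray, step, beta=200.0)  # surface-like

soft = count_significant(soft_weights)
sharp = count_significant(sharp_weights)
print(soft, sharp)
```

With the sharp density, nearly all of the rendering weight lands on a handful of samples around the wall, so a renderer that can locate the surface only needs to evaluate those few. The soft density spreads weight over many more samples along the same ray.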

Paper Structure (51 sections, 10 equations, 9 figures, 7 tables)


Figures (9)

  • Figure 1: HybridNeRF. We train a hybrid surface–volume representation via surfaceness parameters that allow us to render most of the scene with few samples. We track Eikonal loss as we increase surfaceness to avoid degrading quality near fine and translucent structures (such as wires). On the bottom, we visualize the number of samples per ray (brighter is higher). Our model renders in high fidelity at 2K$\times$2K resolution at real-time frame rates.
  • Figure 2: Approach. In the first phase of our pipeline (a), we train a VolSDF-like model (Yariv et al., 2021) with distance-adjusted Eikonal loss to model backgrounds without a separate NeRF (see the section on backgrounds). Crucially, we then transition from a uniform surfaceness parameter $\beta$ to position-dependent $\beta(\boldsymbol{x})$ values to model most of the scene as thin surfaces (needing few samples) without degrading quality near fine and semi-opaque structures (b). Since our model behaves as a valid SDF in more than 95% of the scene, we use sphere tracing at render time (c) along with lower-level optimizations (hardware texture interpolation) to query each sample as efficiently as possible.
  • Figure 3: Surfaces. Since NeRF directly predicts density, it often 'cheats' by modeling specular surfaces, such as floors, as semi-transparent volumes that require many samples per ray (heatmaps shown on the right, with brighter values corresponding to more samples). Methods that derive density from signed distances, such as ours, improve surface geometry and appearance while using fewer samples per ray.
  • Figure 4: Choice of $\beta$. Increasing $\beta$ reduces the number of samples needed per ray, but degrades quality near fine objects (lamp wires) and transparent structures (glass door).
  • Figure 5: Spatially adaptive surfaceness. We make $\beta(\boldsymbol{x})$ spatially adaptive by storing it in a $512^3$ voxel grid whose values we increase during the finetuning stage. We track Eikonal loss as we increase surfaceness, since it is highest near object boundaries and semi-transparent surfaces (top-right, brighter = higher loss), which degrade when surfaceness is too high (Figure 4). We stop increasing surfaceness in regions where the loss crosses a given threshold.
  • ...and 4 more figures
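
The sphere tracing mentioned in the Figure 2 caption can be sketched generically as follows. This is the textbook algorithm applied to an analytic sphere, not the paper's renderer: the scene function `sphere_sdf`, the tolerance `eps`, the step budget `max_steps`, and the far bound `t_max` are all hypothetical. The sketch also assumes a valid SDF everywhere and a unit-length ray direction, so it does not cover how the remaining volumetric regions are handled.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a learned SDF)."""
    return math.dist(p, center) - radius

def sphere_trace(sdf, origin, direction, t_max=10.0, eps=1e-4, max_steps=128):
    """March along a unit-length ray, stepping by the SDF value each time.

    Returns (t_hit, n_queries); t_hit is None if no surface is found.
    """
    t = 0.0
    n_queries = 0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        n_queries += 1
        if dist < eps:      # close enough to the zero level set: report a hit
            return t, n_queries
        t += dist           # the SDF guarantees no surface within this radius
        if t > t_max:       # left the region of interest without hitting anything
            break
    return None, n_queries

hit_t, hit_queries = sphere_trace(sphere_sdf, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
miss_t, miss_queries = sphere_trace(sphere_sdf, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(hit_t, hit_queries, miss_t, miss_queries)
```

Because each step is as large as the distance field allows, the ray that hits the sphere reaches it in a couple of queries, which is the property that makes sphere tracing attractive wherever the model behaves as a valid SDF.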