NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections

Dor Verbin, Pratul P. Srinivasan, Peter Hedman, Ben Mildenhall, Benjamin Attal, Richard Szeliski, Jonathan T. Barron

TL;DR

NeRF-Casting addresses the challenge of rendering highly specular content with NeRFs by introducing reflection-cone tracing into the rendering pipeline. Instead of evaluating a large view-dependent radiance MLP at every surface point, the method casts a small set of reflected rays through the scene and decodes a compact reflection feature into color, enabling consistent near-field and distant reflections with improved photorealism. Key innovations include conical reflection features, directional unscented sampling, 2D directional downweighting to prevent aliasing, an asymmetric predicted-normal loss to regularize geometry, and a multi-cone strategy that yields accurate and stable reflections across views. The approach achieves state-of-the-art results on shiny real and synthetic scenes while maintaining comparable optimization times to existing view-synthesis models, demonstrating practical impact for realistic rendering of glossy materials in complex environments.

Abstract

Neural Radiance Fields (NeRFs) typically struggle to reconstruct and render highly specular objects, whose appearance varies quickly with changes in viewpoint. Recent works have improved NeRF's ability to render detailed specular appearance of distant environment illumination, but are unable to synthesize consistent reflections of closer content. Moreover, these techniques rely on large computationally-expensive neural networks to model outgoing radiance, which severely limits optimization and rendering speed. We address these issues with an approach based on ray tracing: instead of querying an expensive neural network for the outgoing view-dependent radiance at points along each camera ray, our model casts reflection rays from these points and traces them through the NeRF representation to render feature vectors which are decoded into color using a small inexpensive network. We demonstrate that our model outperforms prior methods for view synthesis of scenes containing shiny objects, and that it is the only existing NeRF method that can synthesize photorealistic specular appearance and reflections in real-world scenes, while requiring comparable optimization time to current state-of-the-art view synthesis models.

Paper Structure (28 sections, 27 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Our model architecture for rendering a single ray with origin $\mathbf{o}$ and direction $\mathbf{d}$. We sample $N$ points $\mathbf{x}^{(i)}$ along the ray and use a spatial encoder based on Zip-NeRF to encode each point into density $\tau^{(i)}$, roughness $\rho^{(i)}$, and surface normal $\mathbf{n}^{(i)}$. These are alpha composited to compute a single expected termination point $\bar{\mathbf{x}}$, a von Mises-Fisher distribution (vMF) width $\bar{\kappa}$, and surface normal $\bar{\mathbf{n}}$. Then $\mathbf{d}$ is reflected around that surface to construct a vMF distribution over reflected rays $\operatorname{vMF}(\mathbf{d}', \bar{\kappa})$. We sample $K$ reflected rays ($K=5$) with location $\mathbf{o}'$ and directions $\mathbf{d}'_j$ (as in Figure \ref{fig:refrays}). These $K$ rays are then cast, and points along them are encoded with the same model as the initial ray into $N'$ densities $\tau_j^{(i)}$ and features $\mathbf{f}_j^{(i)}$. These features are alpha composited along each ray to get per-ray features $\bar{\mathbf{f}}_j$, and the composited features are averaged into a single reflection feature $\mathbf{f}$. This feature is broadcast over the original ray's samples and passed, along with bottleneck features $\mathbf{b}^{(i)}$, mixing coefficients $\beta^{(i)}$, and viewing direction $\mathbf{d}$, to the color decoder to produce RGB colors $\mathbf{c}^{(i)}$ for each point along the ray. These colors are alpha composited to render a pixel color $\bar{\mathbf{c}}$.
  • Figure 2: We visualize the reflection cone tracing procedure described in Section \ref{sec:cone}. (a) A basic model for reflections: a ray cast from a camera ray origin $\mathbf{o}$ along view direction $\mathbf{d}$ that terminates at a coordinate $\bar{\mathbf{x}}$ with a surface normal $\bar{\mathbf{n}}$ is "mirrored" around the surface tangent to yield a reflected "camera ray origin" $\mathbf{o}'$ and direction $\mathbf{d}'$. This model does not consider that Zip-NeRF traces pixel cones instead of camera rays, which we address by (b) casting a reflection cone with radius $\dot r$ instead of a ray, and parameterizing not just a single reflected ray direction but a von Mises-Fisher distribution over reflected rays $\operatorname{vMF}(\mathbf{d}', \bar{\kappa})$, where $\bar{\kappa}$ is the inverse-roughness of the surface. We approximate this distribution with a small set of rays $\{ \mathbf{d}'_j \}$. (c) When the surface is not a perfect mirror, the reflection cone widens and the origin of the reflected ray cone must be pulled closer to the surface location to maintain the same intersection with the pixel cone.
  • Figure 3: An ablation study showing the effect of different design choices on our recovered normals. Replacing (a) our model's asymmetric predicted normal loss (Equation \ref{eq:prednormalloss}) with the standard symmetric one tends to (b) oversmooth or (c) underconstrain geometry, preventing the normal vectors from converging to the correct solution. (d) Using reflected features $\bar{\mathbf{f}}$ which are also used for other appearance components such as the bottleneck vector $\mathbf{b}$, or (e) using a single cone instead of $K$ directional sampling cones, also often results in poor recovery of surface normals.
  • Figure 4: Ablation of our reflection anti-aliasing components. Using (a) a single reflection ray instead of five, (b) not downweighting reflection features, or (c) using Zip-NeRF's 3D Jacobian instead of restricting it to the 2D directional Jacobian, all result in inaccurate reflections. Even for low-roughness objects such as the shiny spheres shown here, aliasing in the reflections during optimization prevents the model from accurately reconstructing both the reflective surface geometry and the reflected content.
  • Figure 5: Two examples showing that (a) tracing cones and intersecting them with the geometry of the NeRF allows our model to recover and render near-field reflections such as the statue head and cone reflected in the balls, and the cord and artwork reflected in the toaster. This is in contrast to (b) our model with the feature grid only queried infinitely far away, or (c) UniSDF, both of which only use the reflection direction. Note that both our model in (a) and our ablated model in (b) are capable of rendering accurate far-field reflections, while UniSDF renders blurry reflections even for far-field content.
  • ...and 3 more figures
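
The first two steps in the Figure 1 caption (alpha compositing per-sample quantities into an expected termination point and normal, then mirroring the view direction as in Figure 2(a)) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard NeRF volume-rendering weights $w_i = T_i \, (1 - e^{-\tau_i \delta_i})$ and the usual mirror formula $\mathbf{d}' = \mathbf{d} - 2(\mathbf{d} \cdot \bar{\mathbf{n}})\,\bar{\mathbf{n}}$, and it leaves out the vMF sampling of $K$ rays, the cone radius, and the color decoder. The function names and the toy sample values are hypothetical.

```python
import math


def composite_weights(densities, deltas):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-tau_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for tau, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-tau * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def composite(weights, values):
    """Weighted sum of per-sample vectors (expected value along the ray)."""
    dim = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(dim)]


def normalize(v):
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def reflect(d, n):
    """Mirror direction d about the surface with unit normal n."""
    dot = sum(a * b for a, b in zip(d, n))
    return [a - 2.0 * dot * b for a, b in zip(d, n)]


# Toy ray (assumed values): looking straight down at a dense floor at z = 0.
d = [0.0, 0.0, -1.0]
densities = [0.0, 50.0, 50.0]            # tau^(i): empty space, then surface
deltas = [0.1, 0.1, 0.1]                 # spacing between samples
points = [[0.0, 0.0, 0.2], [0.0, 0.0, 0.1], [0.0, 0.0, 0.0]]
normals = [[0.0, 0.0, 1.0]] * 3          # n^(i): floor normal points up

w = composite_weights(densities, deltas)
x_bar = composite(w, points)             # expected termination point
n_bar = normalize(composite(w, normals)) # composited normal, renormalized
d_reflected = reflect(d, n_bar)          # mean direction of the reflected rays
```

In the full model described in the captions, `d_reflected` would serve only as the mean of the distribution $\operatorname{vMF}(\mathbf{d}', \bar{\kappa})$ from which the $K$ reflected rays are drawn, and each of those rays would be composited in the same way to produce the reflection feature.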