SpecNeRF: Gaussian Directional Encoding for Specular Reflections

Li Ma, Vasu Agrawal, Haithem Turki, Changil Kim, Chen Gao, Pedro Sander, Michael Zollhöfer, Christian Richardt

TL;DR

This work targets accurate modeling of view-dependent specular reflections in NeRFs under near-field indoor lighting, where prior approaches assuming distant illumination struggle. It introduces Gaussian directional encoding using a learnable set of 3D Gaussians with parameters ${\boldsymbol\mu}_i$, ${\boldsymbol\sigma}_i$, and ${\mathbf{q}}_i$ to map a ray origin and direction to a 5D embedding, enabling preconvolved specular color to be predicted by a lightweight MLP while allowing roughness to modulate high-frequency content via ${\boldsymbol\sigma}_i \leftarrow \rho {\boldsymbol\sigma}_i$. To address shape–radiance ambiguity, a data-driven monocular normal prior is trained early with ${\mathcal L}_{\text{mono}}$ and then gradually removed, improving normals and specular alignment. Experiments on indoor near-field datasets show improved specular reconstruction and more meaningful color decomposition compared to baselines, with strong performance on Eyeful Tower and competitive results on related datasets, validating both the representation and the initialization strategy. Overall, the approach enables practical near-field relighting, reflection removal, and roughness editing while maintaining tractable computation through a moderate number of Gaussians.
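One way to picture the encoding described above is the following minimal NumPy sketch. It is not the authors' implementation: it assumes that each embedding dimension is the peak value of one anisotropic 3D Gaussian along the ray, that ${\mathbf{q}}_i$ is a rotation quaternion in (w, x, y, z) order, and that ${\boldsymbol\sigma}_i$ holds per-axis scales. The function names `quat_to_rotmat` and `gaussian_directional_encoding` are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert quaternions (N, 4), assumed (w, x, y, z), to rotation matrices (N, 3, 3)."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ], axis=-1).reshape(-1, 3, 3)

def gaussian_directional_encoding(o, d, mu, sigma, q, rho=1.0):
    """Map one ray (origin o, direction d) to an N-dimensional embedding.

    Assumption: dimension i is the maximum of Gaussian i along the ray
    o + t * d with t >= 0. Roughness rho rescales the Gaussians
    (sigma_i <- rho * sigma_i), so larger rho gives a smoother embedding.
    """
    sigma = rho * sigma                      # roughness-dependent scale, shape (N, 3)
    R = quat_to_rotmat(q)                    # (N, 3, 3)
    # Express the ray in each Gaussian's local, unit-variance frame: R^T (x - mu) / sigma.
    o_local = np.einsum('nji,nj->ni', R, o[None] - mu) / sigma
    d_local = np.einsum('nji,nj->ni', R, np.broadcast_to(d, mu.shape)) / sigma
    # Closest point of the ray to the local origin, restricted to t >= 0.
    t_star = -np.sum(o_local * d_local, axis=-1) / np.sum(d_local * d_local, axis=-1)
    t_star = np.maximum(t_star, 0.0)
    closest = o_local + t_star[:, None] * d_local
    return np.exp(-0.5 * np.sum(closest ** 2, axis=-1))

# Two isotropic Gaussians: one on the ray, one a unit distance off to the side.
mu = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
sigma = np.ones((2, 3))
q = np.tile([1.0, 0.0, 0.0, 0.0], (2, 1))   # identity rotations
origin, direction = np.zeros(3), np.array([0.0, 0.0, 1.0])
sharp = gaussian_directional_encoding(origin, direction, mu, sigma, q, rho=1.0)
rough = gaussian_directional_encoding(origin, direction, mu, sigma, q, rho=2.0)
```

In this sketch the off-ray Gaussian responds more strongly when `rho` is larger, which mirrors how a higher roughness blurs the response of a prefiltered environment map. The embedding would then be passed to a small MLP to predict specular color, a step omitted here.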

Abstract

Neural radiance fields have achieved remarkable performance in modeling the appearance of 3D scenes. However, existing approaches still struggle with the view-dependent appearance of glossy surfaces, especially under complex lighting of indoor environments. Unlike existing methods, which typically assume distant lighting like an environment map, we propose a learnable Gaussian directional encoding to better model the view-dependent effects under near-field lighting conditions. Importantly, our new directional encoding captures the spatially-varying nature of near-field lighting and emulates the behavior of prefiltered environment maps. As a result, it enables the efficient evaluation of preconvolved specular color at any 3D location with varying roughness coefficients. We further introduce a data-driven geometry prior that helps alleviate the shape radiance ambiguity in reflection modeling. We show that our Gaussian directional encoding and geometry prior significantly improve the modeling of challenging specular reflections in neural radiance fields, which helps decompose appearance into more physically meaningful components.
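The geometry prior is described in the TL;DR as a monocular normal loss ${\mathcal L}_{\text{mono}}$ that is applied early in training and then gradually removed. The sketch below shows one possible shape for such a schedule. Everything specific in it is an assumption rather than a detail taken from the paper: the linear decay, the step counts `hold_steps` and `decay_steps`, the cosine form of the loss, and the function names.

```python
import numpy as np

def mono_weight(step, hold_steps=2000, decay_steps=2000, w0=1.0):
    """Weight of the monocular normal loss at a given training step.

    Hypothetical schedule: full weight for `hold_steps`, then a linear
    decay to zero over `decay_steps`, after which the prior is off.
    """
    if step < hold_steps:
        return w0
    frac = (step - hold_steps) / decay_steps
    return w0 * max(0.0, 1.0 - frac)

def mono_normal_loss(pred_normals, mono_normals):
    """Mean (1 - cosine similarity) between predicted and monocular normals, shape (N, 3)."""
    pred = pred_normals / np.linalg.norm(pred_normals, axis=-1, keepdims=True)
    mono = mono_normals / np.linalg.norm(mono_normals, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(pred * mono, axis=-1)))

# The prior contributes w(step) * L_mono to the total loss.
up = np.array([[0.0, 0.0, 1.0]])
weighted = mono_weight(3000) * mono_normal_loss(up, -up)
```

Removing the prior late in training lets the photometric loss refine the normals beyond what the monocular estimator provides, which matches the observation in the Figure 5 caption that keeping ${\mathcal L}_{\text{mono}}$ throughout slightly degrades reflection quality.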

Paper Structure (41 sections, 20 equations, 21 figures, 5 tables)

Figures (21)

  • Figure 1: We propose a Gaussian directional encoding that leads to better modeling of specular reflections under near-field lighting conditions. In contrast, the integrated directional encoding used in Ref-NeRF (Verbin et al., 2022) and the Fourier directional encoding used in NeRF (Mildenhall et al., 2020) exhibit suboptimal performance under similar conditions.
  • Figure 2: An overview of our model. The key enabler for specular reflections is our novel 3D Gaussian directional encoding module that converts the reflected ray into a spatially-varying embedding, which is further decoded into specular color.
  • Figure 3: Toy example of 3D Gaussian encoding. Left: A hemisphere probe translates underneath 4 lights along positions numbered 1 to 4. Note that we dilate the lights for better visualization. Right: Representation of the probe's specular components using spherical harmonics (SH) and our 3D Gaussian directional encoding. The SH encoding shows a more complex pattern under position change, while our coefficients are largely spatially invariant. This suggests that Gaussian directional encoding gives the specular prediction MLP a simpler function to fit.
  • Figure 4: Stereographic projections of the specular fitting results for the toy example in Figure 3. Both encodings produce 25 coefficients for each color channel, which are then summed to produce the final color. Note that the GT shows soft boundaries because it is preconvolved. The 3D Gaussian-based encoding demonstrates superior performance in representing the specular change with positional changes, and is also capable of smoothly varying roughness.
  • Figure 5: The specular component reconstruction (first row, except the first image), novel-view synthesis results (second row) and normal visualizations (third row) under varying monocular normal supervision. The target normal visualizes the monocular normal prediction. Without $\mathcal{L}_\text{mono}$, the predicted normals exhibit enormous errors, leading to poor specular reconstruction. Without early stopping of $\mathcal{L}_\text{mono}$, minor errors in the predicted normals lead to a slight degradation in the reflection quality compared to our full model.
  • ...and 16 more figures