Gaussian Splatting with NeRF-based Color and Opacity

Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek

TL;DR

VDGS tackles the slow training and inference of NeRFs and the difficulty of conditioning Gaussian Splatting by proposing a hybrid that represents geometry with 3D Gaussians and uses a lightweight NeRF-based network to modulate view-dependent color and opacity. The method parameterizes Gaussians with trainable means $\mathrm{m}_i$, covariances $\Sigma_i$, and opacities $\sigma_i$, and learns a function $\mathcal{F}_{VDGS}(\mathrm{m}_i,\mathbf{d};\Theta)$ of the Gaussian mean and the viewing direction $\mathbf{d}$ that produces opacity updates $\Delta \sigma(\mathbf{d})$. These updates enter the pixel color through the differentiable compositing equation $C(p)=\sum_{i\in N} c_i \alpha_i \prod_{j=1}^{i-1}(1-\alpha_j)$ with $\alpha_i=1-\exp\left(-\sigma_i \cdot \mathcal{F}_{VDGS}(\mathrm{m}_i,\mathbf{d};\Theta) \cdot \delta_i\right)$. The results show that VDGS achieves competitive or superior view synthesis quality across standard datasets while maintaining GS-like training and inference speed, and that it improves the handling of shadows, reflections, and transparency. This work demonstrates that conditioning Gaussians with a NeRF-based network can combine the strengths of both paradigms for flexible, real-time-capable neural rendering.
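
The compositing equation above can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: it assumes the Gaussians are already sorted front to back, it assumes the usual volume-rendering convention of a negative exponent so that each $\alpha_i$ lies in $[0, 1)$, and it takes the output of $\mathcal{F}_{VDGS}$ as a precomputed list. The names `composite_pixel` and `modulations` are hypothetical.

```python
import math


def composite_pixel(colors, sigmas, modulations, deltas):
    """Front-to-back compositing of depth-sorted Gaussians for one pixel.

    colors:      list of (r, g, b) tuples, nearest Gaussian first
    sigmas:      base opacities sigma_i (non-negative)
    modulations: stand-in for F_VDGS(m_i, d; Theta), one value per Gaussian
                 (hypothetical: supplied by the caller, not learned here)
    deltas:      per-Gaussian factors delta_i (non-negative)
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) for j < i
    for color, sigma, f, delta in zip(colors, sigmas, modulations, deltas):
        # Assumed sign convention: alpha_i = 1 - exp(-sigma_i * F * delta_i)
        alpha = 1.0 - math.exp(-sigma * f * delta)
        weight = alpha * transmittance
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

With two Gaussians whose $\alpha_i$ both equal 0.5, the first contributes half of its color and the second a quarter, which is the front-to-back weighting $\alpha_i \prod_{j<i}(1-\alpha_j)$. Setting a modulation value to zero removes that Gaussian from the pixel entirely, which is how a view-dependent factor can make a Gaussian transparent from some directions.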

Abstract

Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar render quality with faster training and inference, as it does not need neural networks to work. It encodes information about 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS models are difficult to condition, since they usually require on the order of a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model, Viewing Direction Gaussian Splatting (VDGS), that uses the GS representation of the 3D object's shape and a NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e., the means of the Gaussians), shapes (i.e., the covariances of the Gaussians), colors, and opacities, together with a neural network that takes the Gaussian parameters and the viewing direction and produces changes in that color and opacity. As a result, our model better describes shadows, light reflections, and the transparency of 3D objects without adding extra texture and light components.
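
To make the idea of a network that maps Gaussian parameters and a viewing direction to an opacity change more concrete, here is a minimal sketch under stated assumptions. It is not the architecture from the paper: the weights are random rather than trained, the input is the raw Gaussian mean without any positional or hash encoding, and the sigmoid output (a multiplicative factor in (0, 1)) is an illustrative choice. The names `make_mlp` and `opacity_modulation` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_mlp(in_dim=6, hidden_dim=8, seed=0):
    """Random weights for a one-hidden-layer MLP (untrained, for illustration)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def opacity_modulation(params, mean, direction):
    """Map (Gaussian mean, viewing direction) to a factor in (0, 1)."""
    w1, b1, w2, b2 = params
    x = list(mean) + normalize(direction)  # 3 + 3 input features
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU layer
        for row, b in zip(w1, b1)
    ]
    out = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-out))  # sigmoid (assumed output activation)


params = make_mlp()
base_sigma = 0.8
factor = opacity_modulation(params, mean=(0.1, -0.2, 0.3), direction=(0.0, 0.0, 2.0))
view_dependent_sigma = base_sigma * factor
```

Because the direction is normalized before it enters the network, only the viewing angle matters and not the distance to the camera. In a trained model the weights would be optimized jointly with the Gaussian parameters through the rasterizer rather than drawn at random.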

Paper Structure (16 sections, 6 equations, 10 figures, 6 tables)

This paper contains 16 sections, 6 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Comparison of Gaussian Splatting (GS) and Viewing Direction Gaussian Splatting (VDGS). In our model, the color and opacity of the Gaussians depend on the viewing direction. Consequently, we better model light reflections and the transparency of 3D objects (see the elements in red rectangles).
  • Figure 2: The optimization procedure starts with Structure from Motion (SfM) points, either sourced from COLMAP or generated randomly, which initialize the 3D Gaussians. The camera position and Gaussian center, encoded with a hash encoding, are fed into an MLP that produces the opacity update $\Delta \sigma$ for the 3D Gaussians in canonical space. Subsequently, a fast differentiable Gaussian rasterization pipeline is used to jointly optimize the MLP and the parameters of the 3D Gaussians.
  • Figure 3: Visual comparison between classical GS and Viewing Direction Gaussian Splatting on the Synthetic NeRF dataset (Mildenhall et al., 2020). VDGS models shiny surfaces better than the original GS.
  • Figure 4: Visual comparison between classical GS and VDGS on the Tanks and Temples (Knapitsch et al., 2017) and Mip-NeRF 360 (Barron et al., 2022) datasets. Compared to classical GS, VDGS renders fewer artifacts, which can be observed both in the renders and in the PSNR scores (see also Table \ref{tab:scene_mip_tt_db}).
  • Figure 5: As visible in the render of a Mip-NeRF 360 scene (Barron et al., 2022), Viewing Direction Gaussian Splatting not only reproduces the object in the main focus well but also better represents the glass surface and can eliminate artifacts.
  • ...and 5 more figures