Spiking NeRF: Representing the Real-World Geometry by a Discontinuous Representation

Zhanfeng Liao; Qian Zheng; Yan Liu; Gang Pan

TL;DR

This paper proposes spiking NeRF, which leverages spiking neurons and a hybrid Artificial Neural Network (ANN)-Spiking Neural Network (SNN) framework to build a discontinuous density field for faithful geometry representation.

Abstract

A crucial reason for the success of existing NeRF-based methods is that they build a neural density field for geometry representation using multi-layer perceptrons (MLPs). MLPs are continuous functions; however, real geometry, and hence the density field, is frequently discontinuous at the interface between the air and the surface. This mismatch leads to unfaithful geometry representation. To address it, this paper proposes spiking NeRF, which leverages spiking neurons and a hybrid Artificial Neural Network (ANN)-Spiking Neural Network (SNN) framework to build a discontinuous density field for faithful geometry representation. Specifically, we first demonstrate why continuous density fields introduce inaccuracy. Then, we propose to use spiking neurons to build a discontinuous density field. We conduct a comprehensive analysis of the problems with existing spiking neuron models and then provide the numerical relationship between the parameter of the spiking neuron and the theoretical accuracy of the geometry. Based on this, we propose a bounded spiking neuron to build the discontinuous density field. Our method achieves state-of-the-art (SOTA) performance. The source code and the supplementary material are available at https://github.com/liaozhanfeng/Spiking-NeRF.
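
The contrast between a continuous and a discontinuous density activation can be illustrated with a small sketch. It assumes that a full-precision integrate-and-fire (FIF) style neuron passes its input through unchanged when the input exceeds a threshold and outputs zero otherwise; the function names, the threshold value `v_th`, and the toy pre-activation values are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def relu_density(raw):
    """Continuous activation: density rises smoothly from zero."""
    return np.maximum(raw, 0.0)

def fif_like_density(raw, v_th=1.0):
    """Thresholded activation (assumed FIF-style behaviour, not the paper's code):
    the full-precision value is emitted only above the threshold v_th."""
    return np.where(raw > v_th, raw, 0.0)

# Hypothetical raw network outputs along one ray crossing a surface.
t = np.linspace(0.0, 2.0, 9)   # sample positions along the ray
raw = 4.0 * (t - 0.75)         # toy pre-activation values

v_th = 1.0
relu_out = relu_density(raw)
fif_out = fif_like_density(raw, v_th)

# With the thresholded activation, every nonzero density is larger than v_th,
# so the field jumps from 0 to a value above v_th instead of ramping up.
print(relu_out)
print(fif_out)
```

In this toy setting the ReLU output can take arbitrarily small positive values near the surface, whereas the thresholded output is either exactly zero or above `v_th`, which is the kind of jump a discontinuous density field needs.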

Paper Structure (45 sections, 24 equations, 8 figures, 3 tables)

Introduction
Related Work
Neural Implicit Representations
Spiking Neural Networks in Computer Vision
Preliminary
Neural Radiance Fields
Spiking Neuron
Integrate-and-fire model.
Full-precision integrate-and-fire model.
Spiking NeRF
Relationship between Parameters of Spiking Neuron and Depth Error
Proposition 1.
B-FIF: Bounded full-precision integrate-and-fire spiking neuron.
Hybrid ANN-SNN Framework
B-FIF implementation.
...and 30 more sections

Figures (8)

Figure 1: Left: The extracted surfaces from NeRF. Each row in the first big red box represents a surface extracted by a trained NeRF using different thresholds, indicating that the optimal thresholds corresponding to different scenarios are different. The tick represents that the threshold is optimal. The cross represents that the threshold is not optimal. Middle: The error maps from different views in the same scene. Each row in the second big red box represents the depth error map of a trained NeRF's surface extracted with different thresholds from different views. It can be seen that the optimal thresholds corresponding to different views are different. Right: The extracted surfaces from NeRF. These figures show that the inconsistency can result in even greater errors in light density scenarios. The image in the bottom right corner of each part represents the original image from the corresponding view. The number displayed in the bottom left corner of each image represents either the Chamfer distance (left and right) or the depth error (middle).
Figure 2: Framework overview of spiking NeRF and an illustration of existing spiking neuron models and the proposed one. Left: The network structure of our approach. We use a NeRF model following the original NeRF but excluding the last activation layer of the density network. Instead of using ReLU, we use B-FIF spiking neurons to make the density field discontinuous. Right top: the IF and FIF neurons. Right bottom: B-FIF with different $r$ ($r=2$ and $r=5$). These curves show that B-FIF becomes more similar to FIF as the parameter $r$ increases. When $r$ is sufficiently large, B-FIF degenerates to FIF.
Figure 3: Visual quality comparisons of surface reconstruction on the Blender dataset (NeRF), the DTU dataset (Jensen et al., 2014), the semi-transparent dataset (Dex-NeRF), and the thin object dataset. We show the Chamfer distance in the bottom left corner of each image. The results in the 2nd and 4th rows are multiplied by $10^2$.
Figure 4: The relationship between the upper bound and the average depth error during training. We show 6 scenes from the Blender dataset (NeRF). We randomly choose one view to display from each scene, and compute the error and the upper bound at $\text{epoch}=10\text{K}$, $50\text{K}$, $100\text{K}$, $150\text{K}$ and $200\text{K}$. The red curve represents the upper bound while the blue curve represents the average depth error during training. The average depth error decreases with the upper bound and stays below the upper bound throughout training.
Figure 5: Ablation studies. We show qualitative results and report the quantitative metrics in Chamfer distance.
...and 3 more figures
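
The threshold sensitivity described in the Figure 1 caption can be sketched on a single ray. The example below is only an illustration under stated assumptions: the surface is extracted as the first sample whose density exceeds a threshold, and both toy density profiles, the helper name `first_crossing_depth`, and the surface position are hypothetical rather than taken from the paper.

```python
import numpy as np

def first_crossing_depth(t, sigma, threshold):
    """Depth of the first sample whose density exceeds the threshold (None if no hit)."""
    idx = np.flatnonzero(sigma > threshold)
    return float(t[idx[0]]) if idx.size else None

t = np.linspace(0.0, 2.0, 201)   # sample positions along a ray
surface = 1.005                  # hypothetical true surface position

# Continuous density: ramps up gradually behind the surface.
smooth = 10.0 * np.clip(t - surface, 0.0, None)
# Discontinuous density: jumps from 0 to a fixed value at the surface.
step = np.where(t > surface, 10.0, 0.0)

low, high = 1.0, 5.0             # two candidate extraction thresholds
d_smooth_low = first_crossing_depth(t, smooth, low)
d_smooth_high = first_crossing_depth(t, smooth, high)
d_step_low = first_crossing_depth(t, step, low)
d_step_high = first_crossing_depth(t, step, high)

# The continuous field yields a different depth for each threshold;
# the discontinuous field yields the same depth for both.
print(d_smooth_low, d_smooth_high)
print(d_step_low, d_step_high)
```

With the continuous profile the extracted depth moves as the threshold changes, so no single threshold is correct for every ray, while the step profile gives the same depth for any threshold below the jump height.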
