From Activation to Initialization: Scaling Insights for Optimizing Neural Fields

Hemanth Saratchandran; Sameera Ramasinghe; Simon Lucey


TL;DR

The theoretical insights reveal a deep-seated connection among network initialization, architectural choices, and the optimization process, emphasizing the need for a holistic approach when designing cutting-edge Neural Fields.

Abstract

In the realm of computer vision, Neural Fields have gained prominence as a contemporary tool harnessing neural networks for signal representation. Despite the remarkable progress in adapting these networks to solve a variety of problems, the field still lacks a comprehensive theoretical framework. This article aims to address this gap by delving into the intricate interplay between initialization and activation, providing a foundational basis for the robust optimization of Neural Fields. Our theoretical insights reveal a deep-seated connection among network initialization, architectural choices, and the optimization process, emphasizing the need for a holistic approach when designing cutting-edge Neural Fields.


Paper Structure (18 sections, 4 theorems, 14 equations, 8 figures, 1 table)


Introduction
Related Work
Notation
Theoretical Scaling Laws
A scaling law for shallow networks
A scaling law for deep networks
Analyzing the proof methodology
Designing new initializations
Experiments: Applications to Neural Fields
Practical Validation of the Theoretical Analysis
Shallow Experiment:
Deep Experiment:
Single Image Super Resolution
Occupancy Fields
Neural Radiance Fields (NeRF)
...and 3 more sections

Key Result

Theorem 4.2

Let $X$ be a fixed data set with $N$ samples. Let $F$ be a shallow neural network of depth $2$ admitting one of the following activation functions: where $\omega$ ($1/\omega^2$) is a fixed frequency hyperparameter. Let the widths of the network satisfy where $m$ is a fixed positive constant. Suppose the network has been initialized according to LeCun's initialization scheme Then for a small eno

Figures (8)

Figure 1: We evaluate Gaussian-activated networks comprising four hidden layers, initialized using four different methods, and trained with full-batch gradient descent. The comparison is performed on an image reconstruction task, with the final train PSNRs displayed in parentheses in the legend.
Figure 2: Diagram showing how to initialize weight matrices according to Initialization 1. The final output layer is initialized with a Normal distribution of smaller variance than the previous layers by a factor of $1/\sqrt{fan_{in}}$, where $fan_{in}$ denotes the input dimension to the layer.
Figure 3: Comparing how many parameters a ReLU-PE network and a sinc network need to converge under different initializations and data set sizes. The left figure shows results for shallow networks on a 1-dimensional curve fitting task. The right figure shows results for deep networks on an image regression task. For all initializations, the sinc-activated networks require far fewer parameters to converge than the ReLU-PE ones.
Figure 4: Comparing the performance of deep networks with sinc activation across image regression tasks, utilizing four distinct initialization schemes. Networks were trained until reaching a 25dB PSNR. On the left, we observe the outcomes with four normal initializations, showcasing that our initialization demands the fewest parameters for convergence. On the right, the comparison extends to four different uniform initializations, where our approach emerges as the most effective.
Figure 5: Results for $4 \times$ single image super resolution with four normal initializations and four uniform initializations. Networks initialized with Initialization 1 (ours) and Initialization 2 (ours) produced the highest train PSNR (dB) and SSIM at convergence. Zoom in for better viewing.
...and 3 more figures
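
The weight scheme described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that every layer except the last draws weights from a zero-mean normal with LeCun-style variance $1/fan_{in}$, and that the final layer's variance is that value multiplied by $1/\sqrt{fan_{in}}$. The caption only states the relative factor, so the base variance is an assumption, and the names `layer_variance` and `init_network` are hypothetical.

```python
import math
import random


def layer_variance(fan_in, is_final):
    """Variance of the zero-mean normal used for one layer's weights.

    Assumption: the base variance is LeCun-style, 1 / fan_in. For the final
    layer it is reduced by a further factor of 1 / sqrt(fan_in).
    """
    variance = 1.0 / fan_in
    if is_final:
        variance /= math.sqrt(fan_in)
    return variance


def init_network(layer_sizes, seed=0):
    """Return one weight matrix (a list of rows) per pair of adjacent layers.

    layer_sizes lists the input dimension, the hidden widths, and the output
    dimension, in that order. Each matrix has shape fan_out x fan_in.
    """
    rng = random.Random(seed)
    last = len(layer_sizes) - 2  # index of the final weight matrix
    weights = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        std = math.sqrt(layer_variance(fan_in, is_final=(i == last)))
        matrix = [[rng.gauss(0.0, std) for _ in range(fan_in)]
                  for _ in range(fan_out)]
        weights.append(matrix)
    return weights


# Example: 2-D input coordinates, two hidden layers of width 64, RGB output.
example_weights = init_network([2, 64, 64, 3])
```

Only the final matrix is scaled differently from the rest, so the sketch drops into an ordinary layer-by-layer initialization loop.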

Theorems & Definitions (11)

Definition 4.1
Theorem 4.2
Remark 4.3
Theorem 4.4
Remark 4.5
Remark 4.6
Theorem 4.7
Remark 4.8
Theorem 4.9
Remark 4.10
...and 1 more
