Preconditioners for the Stochastic Training of Neural Fields

Shin-Fang Chng; Hemanth Saratchandran; Simon Lucey

TL;DR

Neural fields can be trained faster in stochastic settings by using curvature-aware diagonal preconditioners. The work shows that Adam effectively acts as a diagonal Gauss-Newton preconditioner. It also shows that for networks with sine, Gaussian, or wavelet activations, ESGD and related preconditioners substantially accelerate training compared to Adam across image reconstruction, 3D occupancy, and NeRF tasks. Theoretical results link the structure of Hessian-vector products to the activation type, and the experiments confirm activation-dependent benefits. The exception is ReLU with positional encoding (ReLU-PE), where Adam remains superior. The result is a practical, activation-aware framework for speeding up neural field optimization in large-scale stochastic regimes.
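The claim above concerns Adam and neural fields. The sketch below only illustrates what a diagonal Gauss-Newton preconditioner does. It uses a linear least-squares model with MSE loss, an assumption chosen because the Gauss-Newton matrix $G = \frac{1}{N}\sum_i x_i x_i^\top$ then coincides with the Hessian. Each gradient coordinate is divided by the matching diagonal entry $G_{jj}$. The function name and the small damping `eps` are hypothetical. How Adam's second-moment estimate relates to $\mathrm{diag}(G)$ is the paper's argument and is not reproduced here.

```python
def diag_gauss_newton_step(theta, xs, ys, lr=1.0, eps=1e-8):
    """One diagonally preconditioned gradient step for a linear least-squares model.

    Loss: 0.5 * mean((theta . x - y)^2). For this model the per-sample Jacobian of
    the prediction w.r.t. theta is x itself, so diag(G)_j = mean(x_j^2).
    """
    n, dim = len(xs), len(theta)
    grad = [0.0] * dim
    gn_diag = [0.0] * dim
    for x, y in zip(xs, ys):
        residual = sum(t * xj for t, xj in zip(theta, x)) - y
        for j in range(dim):
            grad[j] += residual * x[j] / n
            gn_diag[j] += x[j] * x[j] / n
    # Divide each gradient coordinate by its Gauss-Newton curvature estimate.
    return [t - lr * g / (d + eps) for t, g, d in zip(theta, grad, gn_diag)]


# Two features whose scales differ by a factor of 100.
theta = diag_gauss_newton_step([0.0, 0.0], [[1.0, 0.0], [0.0, 100.0]], [2.0, 300.0])
print(theta)  # close to [2.0, 3.0]
```

The features here are orthogonal, so the diagonal is the whole Gauss-Newton matrix and one step with `lr=1` lands on the solution. Plain gradient descent on the same problem would need a step below 2/5000 to stay stable on the second coordinate. That would leave the first coordinate converging very slowly.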

Abstract

Neural fields encode continuous multidimensional signals as neural networks, enabling diverse applications in computer vision, robotics, and geometry. While Adam is effective for stochastic optimization, it often requires long training times. To address this, we explore alternative optimization techniques to accelerate training without sacrificing accuracy. Traditional second-order methods like L-BFGS are unsuitable for stochastic settings. We propose a theoretical framework for training neural fields with curvature-aware diagonal preconditioners, demonstrating their effectiveness across tasks such as image reconstruction, shape modeling, and Neural Radiance Fields (NeRF).
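ESGD builds a curvature-aware diagonal preconditioner from Hessian-vector products alone, which is what makes it usable in the stochastic setting described above. For $v \sim \mathcal{N}(0, I)$, $\mathbb{E}[(Hv)_i^2] = \sum_j H_{ij}^2$. Averaging $(Hv)^2$ over random probes therefore estimates the squared row norms of $H$. The update then divides the gradient by their square root plus a damping term.

The sketch below is simplified and rests on several assumptions:
- It uses a fixed quadratic whose Hessian-vector product is an explicit matrix-vector product. A real neural field would obtain $Hv$ by automatic differentiation on a minibatch loss.
- The helper names, sample count, step size, and damping are hypothetical.
- The estimate is computed once up front rather than being accumulated during training.

```python
import math
import random


def matvec(a, v):
    """Dense matrix-vector product; stands in for an autodiff Hessian-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def esgd_equilibration(hvp, dim, num_samples, rng):
    """Monte Carlo estimate of D_i = sqrt(E[(Hv)_i^2]) with v ~ N(0, I)."""
    acc = [0.0] * dim
    for _ in range(num_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        hv = hvp(v)
        for i in range(dim):
            acc[i] += hv[i] ** 2
    return [math.sqrt(a / num_samples) for a in acc]


def esgd_step(theta, grad, d_eq, lr, damping=1e-4):
    """Preconditioned update theta <- theta - lr * grad / (D + damping)."""
    return [t - lr * g / (d + damping) for t, g, d in zip(theta, grad, d_eq)]


# Quadratic 0.5 * theta^T H theta - b^T theta with minimizer [1, 1].
H = [[100.0, 0.0], [0.0, 1.0]]
b = [100.0, 1.0]
rng = random.Random(0)
d_eq = esgd_equilibration(lambda v: matvec(H, v), 2, 1000, rng)
theta = [0.0, 0.0]
for _ in range(50):
    grad = [hv - bi for hv, bi in zip(matvec(H, theta), b)]
    theta = esgd_step(theta, grad, d_eq, lr=0.5)
print(d_eq, theta)
```

For comparison, plain gradient descent on this quadratic must use a step below 2/100 = 0.02. At that step size the second coordinate's error shrinks by less than 2% per step.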

Paper Structure

This paper contains 20 sections, 2 theorems, 9 equations, 6 figures, 1 table, and 1 algorithm.

Introduction
Related Work
Neural Fields
Preconditioners
Notation
Theoretical Framework
Overview of Preconditioners
Diagonal Preconditioners
Kronecker-Factored Preconditioners
Condition Number
Training with Adam
Main Theorems
Experiments
Remark for AdaHessian
Remark for ESGD
...and 5 more sections

Key Result

Theorem 4.2

Let $F$ denote a sine-, Gaussian-, wavelet-, or sinc-activated neural field. Let $\mathcal{L}(\theta)$ denote the MSE loss associated with $F$ and a training set $(X, Y)$, and let $H$ denote the Hessian of $\mathcal{L}$ at a fixed parameter point $\theta$. Then, given a non-zero vector $v$, the Hessian vector p…
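The activations named in Theorem 4.2 share a property that ReLU lacks: their second derivative is not identically zero. For the MSE loss, the Hessian contains terms proportional to $\sigma''$, and these vanish almost everywhere for ReLU. The sketch below checks this numerically with a central finite difference. The parameter values (for example, a sine frequency of 30) and the real-valued Gabor form used for the wavelet are assumptions for illustration. The snippet is background, not a reconstruction of the theorem's statement.

```python
import math


def relu(x):
    return max(0.0, x)


def sine(x, omega0=30.0):
    # omega0 = 30 is an assumed frequency scale, not taken from the paper.
    return math.sin(omega0 * x)


def gaussian(x, s=1.0):
    return math.exp(-(s * x) ** 2)


def gabor_wavelet(x, omega=5.0, s=1.0):
    # Real-valued Gabor-style wavelet; an assumed form for illustration.
    return math.cos(omega * x) * math.exp(-(s * x) ** 2)


def sinc(x):
    # Unnormalized sinc, sin(x) / x, with its limit value at 0.
    return 1.0 if x == 0.0 else math.sin(x) / x


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


for name, f in [("relu", relu), ("sine", sine), ("gaussian", gaussian),
                ("gabor", gabor_wavelet), ("sinc", sinc)]:
    print(f"{name:>8}: f''(0.5) ~ {second_derivative(f, 0.5):.4f}")
```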

Figures (6)

Figure 1: Condition number of the Hessian on a 1D signal regression task before (dotted lines) and after (solid lines) applying the equilibrated ($D^E$) and Jacobi ($D^J$) preconditioners. The equilibrated preconditioner reduces the condition number substantially more than the Jacobi preconditioner. This shows in the larger gap between the yellow solid and dotted lines than between the green solid and dotted lines of the Jacobi preconditioner. As a result, it achieves faster convergence, as shown in the accompanying convergence panel.
Figure 2: Training convergence of neural fields with different activations, namely ReLU with positional encoding (ReLU (PE)), wavelet, sine, and Gaussian, optimized with Adam and with SGD on three tasks: 2D image reconstruction, 3D binary occupancy reconstruction, and NeRF. In all cases Adam outperforms SGD, indicating that leveraging curvature information facilitates faster convergence.
Figure 3: Percentage sparsity of Hessian-vector product matrices across all layers throughout the training process of a 2D image reconstruction task, using ESGD to train networks with different activations. The x-axis ranges from $0$ to $1$, with values closer to 1 indicating higher sparsity.
Figure 4: Comparison of training convergence and computational complexity for various preconditioners, evaluated on a Gaussian-activated neural field fitted to the lion instance from the DIV2K dataset. ESGD shows superior convergence compared to the other preconditioners, striking a balance between accuracy and computational efficiency. Note: time-based comparisons are provided in Sec. 3.1 of the supplementary material. Similar analyses for other activations and additional DIV2K instances are in Sec. 3.2.
Figure 5: Comparison of training convergence for various preconditioners on a 3D binary occupancy reconstruction task (armadillo instance) with a Gaussian-activated neural field. ESGD shows superior convergence compared to the other preconditioners, striking a balance between accuracy and computational efficiency. Note: similar analyses for other activations are available in the supplementary material.
...and 1 more figure
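Figure 1's caption compares the Hessian's condition number before and after diagonal preconditioning. The sketch below shows how the two preconditioners are formed and how each changes the condition number of a badly scaled symmetric 2×2 matrix. The Jacobi preconditioner is $D^J = |\mathrm{diag}(H)|$. The equilibrated preconditioner is taken here as the row norms $D^E_i = \lVert H_{i,:} \rVert_2$.

The sketch assumes the following:
- The condition number is $|\lambda|_{\max}/|\lambda|_{\min}$; the paper's own definition is not reproduced here.
- Preconditioning is applied symmetrically as $D^{-1/2} H D^{-1/2}$, which has the same eigenvalues as $D^{-1}H$.
- The matrix and helper names are hypothetical.

On this tiny positive-definite example both preconditioners shrink the condition number from about 101 to below 2. It does not reproduce the figure's equilibrated-versus-Jacobi ranking, which concerns the paper's 1D regression Hessians.

```python
import math


def sym2x2_eigs(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, c], [c, d]]."""
    (a, c), (_, d) = m
    mean = 0.5 * (a + d)
    radius = math.sqrt(0.25 * (a - d) ** 2 + c * c)
    return mean + radius, mean - radius


def condition_number(m):
    # Assumed definition: ratio of largest to smallest eigenvalue magnitude.
    abs_eigs = [abs(lam) for lam in sym2x2_eigs(m)]
    return max(abs_eigs) / min(abs_eigs)


def jacobi_diag(h):
    return [abs(h[0][0]), abs(h[1][1])]


def equilibration_diag(h):
    # Euclidean norm of each row of H.
    return [math.hypot(*h[0]), math.hypot(*h[1])]


def sym_scale(h, diag):
    """D^{-1/2} H D^{-1/2}; same eigenvalues as D^{-1} H."""
    s = [1.0 / math.sqrt(x) for x in diag]
    return [[h[i][j] * s[i] * s[j] for j in range(2)] for i in range(2)]


H = [[100.0, 1.0], [1.0, 1.0]]
print("original:    ", condition_number(H))
print("Jacobi:      ", condition_number(sym_scale(H, jacobi_diag(H))))
print("equilibrated:", condition_number(sym_scale(H, equilibration_diag(H))))
```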
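Figure 2's caption attributes Adam's advantage over SGD to its use of curvature information. The sketch below shows one mechanism on a toy separable quadratic $\tfrac12\sum_i h_i\theta_i^2$ with curvatures 64 and 1; this is an assumption, not a neural field. SGD's stable step size is capped at $2/h_{\max}$, so the flat coordinate converges slowly. Adam instead divides each coordinate by $\sqrt{\hat v}$, a diagonal preconditioner that cancels the per-coordinate gradient scale. With `eps=0` and power-of-two curvatures, the two Adam coordinates follow bit-for-bit identical trajectories. Function names and hyperparameters are illustrative.

```python
import math


def quad_grad(theta, curv):
    """Gradient of the separable quadratic 0.5 * sum_i curv[i] * theta[i]**2."""
    return [c * t for c, t in zip(curv, theta)]


def run_sgd(theta, curv, lr, steps):
    for _ in range(steps):
        theta = [t - lr * g for t, g in zip(theta, quad_grad(theta, curv))]
    return theta


def run_adam(theta, curv, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = quad_grad(theta, curv)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        # sqrt(v_hat) acts as a per-coordinate (diagonal) preconditioner on the momentum.
        theta = [th - lr * mh / (math.sqrt(vh) + eps)
                 for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta


curv = [64.0, 1.0]  # curvature ratio of 64 between the two coordinates
sgd = run_sgd([1.0, 1.0], curv, lr=0.03, steps=100)  # lr must stay below 2/64
adam = run_adam([1.0, 1.0], curv, lr=0.05, steps=100, eps=0.0)
print("SGD: ", sgd)
print("Adam:", adam)
```

With the default `eps=1e-8` the Adam coordinates agree only approximately, because `eps` is not rescaled along with the gradient.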
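Figure 3's caption reports the sparsity of Hessian-vector products. The sketch below assumes that sparsity means the fraction of (near-)zero entries. It computes this fraction for a hypothetical one-hidden-layer scalar network with four hidden units. The Hessian-vector product is a central finite difference of the analytic gradient rather than automatic differentiation.

When every training input drives a ReLU unit into its inactive region, that unit's gradient entries are exactly zero. Its Hessian-vector product entries are therefore zero too. Sine units have no such flat region.

The toy network, its weights, and the tolerance are assumptions. The example illustrates one mechanism and does not reproduce the figure, which tracks ESGD training of full networks.

```python
import math


def relu(p):
    return max(0.0, p)


def relu_grad(p):
    return 1.0 if p > 0.0 else 0.0


def grad_mse(theta, xs, ys, act, dact):
    """Gradient of 0.5 * mean((f(x) - y)^2) for f(x) = sum_k w2[k] * act(w1[k] * x)."""
    k = len(theta) // 2
    w1, w2 = theta[:k], theta[k:]
    g = [0.0] * (2 * k)
    n = len(xs)
    for x, y in zip(xs, ys):
        pre = [a * x for a in w1]
        r = sum(b * act(p) for b, p in zip(w2, pre)) - y
        for j in range(k):
            g[j] += r * w2[j] * dact(pre[j]) * x / n  # d/d w1[j]
            g[k + j] += r * act(pre[j]) / n  # d/d w2[j]
    return g


def hvp_fd(theta, v, xs, ys, act, dact, h=1e-5):
    """Central finite difference of the gradient along v, approximating H v."""
    gp = grad_mse([t + h * vi for t, vi in zip(theta, v)], xs, ys, act, dact)
    gm = grad_mse([t - h * vi for t, vi in zip(theta, v)], xs, ys, act, dact)
    return [(a - b) / (2.0 * h) for a, b in zip(gp, gm)]


def sparsity(vec, tol=1e-12):
    """Fraction of entries with magnitude <= tol (1.0 means all zero)."""
    return sum(1 for x in vec if abs(x) <= tol) / len(vec)


xs, ys = [0.5, 1.0], [0.3, 0.7]
# Hidden units 1 and 3 (0-based) have negative first-layer weights,
# so their ReLU is inactive for every positive input.
theta = [1.0, -1.0, 0.8, -0.5, 0.5, 0.5, -0.3, 0.2]
v = [1.0] * len(theta)
s_relu = sparsity(hvp_fd(theta, v, xs, ys, relu, relu_grad))
s_sine = sparsity(hvp_fd(theta, v, xs, ys, math.sin, math.cos))
print("ReLU sparsity:", s_relu, " sine sparsity:", s_sine)
```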
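Figure 4's caption compares preconditioners by computational complexity as well as convergence. Memory is one simple proxy for that cost. The sketch below counts stored entries when preconditioning a single m×n weight matrix with three approaches:
- A full preconditioner over all $mn$ parameters.
- A Kronecker-factored approximation with one $m \times m$ and one $n \times n$ factor.
- A diagonal preconditioner such as Adam's or ESGD's.

Biases are ignored, the 256×256 layer size is an arbitrary example, and the function name is hypothetical. This is not a record of which preconditioners the paper benchmarked.

```python
def preconditioner_storage(m, n):
    """Entries stored to precondition one m x n weight matrix (biases ignored)."""
    num_params = m * n
    return {
        "full": num_params ** 2,     # dense (mn) x (mn) matrix over vec(W)
        "kronecker": m * m + n * n,  # two factors, A (m x m) and B (n x n), used as A kron B
        "diagonal": num_params,      # one scale per parameter
    }


for name, count in preconditioner_storage(256, 256).items():
    print(f"{name:>9}: {count:,} entries")
```

For a 256×256 layer this gives 4,294,967,296 entries for the full matrix, 131,072 for the Kronecker factors, and 65,536 for the diagonal. The diagonal count matches the number of parameters.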

Theorems & Definitions (3)

Definition 4.1
Theorem 4.2
Theorem 4.3
