
Random Weight Factorization Improves the Training of Continuous Neural Representations

Sifan Wang, Hanwen Wang, Jacob H. Seidman, Paris Perdikaris

TL;DR

This work addresses the difficulty of training coordinate-based neural representations due to spectral bias and poor initializations. It introduces random weight factorization, rewriting weights as $\mathbf{w}^{(k,l)} = s^{(k,l)} \mathbf{v}^{(k,l)}$ and $\mathbf{W}^{(l)} = \operatorname{diag}(\mathbf{s}^{(l)}) \mathbf{V}^{(l)}$, which induces per-neuron adaptive learning rates and reshapes the loss landscape to facilitate optimization. The approach yields consistent, robust improvements across image regression, 3D shape representation, CT, inverse rendering, PINNs, and operator learning (e.g., DeepONet), while incurring minimal overhead. This suggests a versatile, drop-in enhancement for continuous neural representations with broad practical impact in vision, graphics, and scientific computing.
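To make the factorization $\mathbf{W}^{(l)} = \operatorname{diag}(\mathbf{s}^{(l)}) \mathbf{V}^{(l)}$ concrete, here is a minimal NumPy sketch of one factorized dense layer. It assumes the layer is first initialized with a Glorot-normal $\mathbf{W}$, and that the per-neuron scales are drawn as $\mathbf{s} = \exp(\boldsymbol{\zeta})$ with $\boldsymbol{\zeta} \sim \mathcal{N}(\mu, \sigma^2)$. The rows of $\mathbf{V}$ are then set so that the product reproduces $\mathbf{W}$ exactly. The function names and the defaults $\mu = 1$, $\sigma = 0.1$ are hypothetical illustration choices, not values confirmed by this summary.

```python
import numpy as np


def glorot_normal(rng, fan_in, fan_out):
    # Glorot (Xavier) normal initialization for a (fan_out, fan_in) weight matrix.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))


def random_weight_factorization(rng, fan_in, fan_out, mu=1.0, sigma=0.1):
    """Factorize a freshly initialized W into (s, V) with W = diag(s) @ V.

    Assumption: s = exp(zeta), zeta ~ N(mu, sigma^2). The defaults for mu and
    sigma are illustrative only.
    """
    W = glorot_normal(rng, fan_in, fan_out)
    s = np.exp(rng.normal(mu, sigma, size=fan_out))  # one positive scale per output neuron
    V = W / s[:, None]  # divide row k by s_k so that diag(s) @ V reproduces W
    return s, V, W


def factorized_dense(x, s, V, b):
    # Forward pass y = diag(s) V x + b for a batch x of shape (batch, fan_in).
    return (x @ V.T) * s + b


rng = np.random.default_rng(0)
s, V, W = random_weight_factorization(rng, fan_in=2, fan_out=4)
y = factorized_dense(rng.normal(size=(3, 2)), s, V, np.zeros(4))
```

At initialization the factorized layer computes exactly the same function as the plain layer. Only the parameterization that the optimizer sees changes, which is what makes it a drop-in replacement.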

Abstract

Continuous neural representations have recently emerged as a powerful and flexible alternative to classical discretized representations of signals. However, training them to capture fine details in multi-scale signals is difficult and computationally expensive. Here we propose random weight factorization as a simple drop-in replacement for parameterizing and initializing conventional linear layers in coordinate-based multi-layer perceptrons (MLPs) that significantly accelerates and improves their training. We show how this factorization alters the underlying loss landscape and effectively enables each neuron in the network to learn using its own self-adaptive learning rate. This not only helps mitigate spectral bias, but also allows networks to quickly recover from poor initializations and reach better local minima. We demonstrate how random weight factorization can be leveraged to improve the training of neural representations on a variety of tasks, including image regression, shape representation, computed tomography, inverse rendering, solving partial differential equations, and learning operators between function spaces.
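The "self-adaptive learning rate" claim can be checked with a short first-order calculation. For a single neuron with $\mathbf{w} = s\,\mathbf{v}$ and $\mathbf{g} = \nabla_{\mathbf{w}}\mathcal{L}$, the chain rule gives $\partial\mathcal{L}/\partial s = \mathbf{g}^\top\mathbf{v}$ and $\nabla_{\mathbf{v}}\mathcal{L} = s\,\mathbf{g}$. One gradient step of size $\eta$ on $(s, \mathbf{v})$ therefore changes $\mathbf{w}$ by $-\eta\,(s^2 \mathbf{I} + \mathbf{v}\mathbf{v}^\top)\,\mathbf{g} + \eta^2 s\,(\mathbf{g}^\top\mathbf{v})\,\mathbf{g}$. To first order, each neuron's step is rescaled by its own $s^2$, plus a term along its current direction $\mathbf{v}$. The sketch below verifies this expansion numerically. It is our own derivation under these assumptions and may differ in form from the statement of the paper's Theorem 2. The function names are hypothetical.

```python
import numpy as np


def factorized_gd_step(s, v, g, lr):
    """One gradient-descent step on (s, v) for a single neuron with w = s * v.

    g is the gradient of the loss with respect to w, held fixed for this step.
    Chain rule: dL/ds = g . v and dL/dv = s * g.
    """
    grad_s = g @ v
    grad_v = s * g
    return s - lr * grad_s, v - lr * grad_v


def effective_update(s, v, g, lr):
    # First-order change in w = s * v: -lr * (s^2 I + v v^T) g.
    # The s^2 factor rescales the step per neuron, acting as its own learning rate.
    precond = s**2 * np.eye(v.size) + np.outer(v, v)
    return -lr * precond @ g


rng = np.random.default_rng(1)
s, v, g = 1.5, rng.normal(size=5), rng.normal(size=5)
s_new, v_new = factorized_gd_step(s, v, g, lr=1e-3)
exact_change = s_new * v_new - s * v
first_order = effective_update(s, v, g, lr=1e-3)
```

Because $s^2\mathbf{I} + \mathbf{v}\mathbf{v}^\top$ is positive definite for $s \neq 0$, the first-order update is still a descent direction. It simply takes a neuron-dependent step size instead of the single global $\eta$ of plain gradient descent.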

Paper Structure (19 sections, 2 theorems, 19 equations, 5 figures, 1 table)

Key Result

Theorem 1

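Suppose that $\mathcal{L}(\bm{\theta})$ is the loss function associated with the neural network defined in equations (eq:mlp_1) and (eq:mlp_2). For a given $\bm{\theta}$, define $U_{\bm{\theta}}$ as the set of all possible weight factorizations of $\bm{\theta}$, i.e., all choices of $(\mathbf{s}^{(l)}, \mathbf{V}^{(l)})$ with $\operatorname{diag}(\mathbf{s}^{(l)}) \mathbf{V}^{(l)} = \mathbf{W}^{(l)}$ in every layer $l$. Then for any $\bm{\theta}, \bm{\theta}'$, we have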
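Theorem 1 concerns the set $U_{\bm{\theta}}$ of factorizations of a fixed parameter vector. The sketch below only illustrates why this set contains more than one point. Multiplying neuron $k$'s scale by any nonzero $c_k$ and dividing its direction by the same $c_k$ leaves $\mathbf{W}$ unchanged, since $\operatorname{diag}(\mathbf{c} \odot \mathbf{s})\operatorname{diag}(\mathbf{c})^{-1}\mathbf{V} = \operatorname{diag}(\mathbf{s})\mathbf{V}$. It illustrates the definition, not the theorem's conclusion. The function name is hypothetical.

```python
import numpy as np


def rescale_factorization(s, V, c):
    """Return another factorization (c * s, V / c) of the same W = diag(s) @ V.

    c holds one nonzero factor per output neuron; every such choice yields a
    different element of the set of factorizations of the same W.
    """
    c = np.asarray(c, dtype=float)
    return c * s, V / c[:, None]


rng = np.random.default_rng(2)
s = np.exp(rng.normal(1.0, 0.1, size=3))
V = rng.normal(size=(3, 4))
s_alt, V_alt = rescale_factorization(s, V, [0.5, 2.0, -3.0])
```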

Figures (5)

  • Figure 1: Weight factorization transforms loss landscapes and shortens the distance to minima.
  • Figure 2: 1D regression. Top: Model predictions using different parameterizations (Plain: standard MLP; AA: adaptive activation; WN: weight normalization; RWF: random weight factorization). Bottom left: Mean square error (MSE) during training. Bottom middle: Relative change of weights during training, computed in the original parameter space. Bottom right: Eigenvalues (in descending order) of the empirical NTK at the end of training.
  • Figure 3: Advection: Predicted solutions of trained MLPs with different weight parameterizations, along with the evolution of the associated relative $L^2$ prediction errors during training.
  • Figure 4: Navier-Stokes: Predicted solutions of trained MLPs with different weight parameterizations, along with the evolution of the associated relative $L^2$ errors during training.
  • Figure 5: Learning operators: Loss convergence of training DeepONets with different weight parameterizations for diffusion-reaction equation, Darcy flow and the Burgers' equation.
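The figures report two diagnostics: the relative $L^2$ error of a predicted solution, and the relative change of weights measured in the original parameter space. Below is a minimal sketch of how each might be computed. It assumes predictions flattened over an evaluation grid and a per-layer Frobenius norm on $\mathbf{W} = \operatorname{diag}(\mathbf{s})\mathbf{V}$. The exact norms and aggregation across layers used for the figures are not specified here, and the function names are hypothetical.

```python
import numpy as np


def relative_l2_error(pred, ref):
    # ||pred - ref||_2 / ||ref||_2 over all grid points (flattened).
    pred, ref = np.ravel(pred), np.ravel(ref)
    return np.linalg.norm(pred - ref) / np.linalg.norm(ref)


def relative_weight_change(s_t, V_t, s_0, V_0):
    """||W_t - W_0||_F / ||W_0||_F with W = diag(s) @ V.

    Comparing the recovered W matrices, rather than (s, V) directly, places
    plain and factorized networks in the same (original) parameter space.
    """
    W_t = s_t[:, None] * V_t
    W_0 = s_0[:, None] * V_0
    return np.linalg.norm(W_t - W_0) / np.linalg.norm(W_0)
```

Measuring the change on $\mathbf{W}$ instead of on $(\mathbf{s}, \mathbf{V})$ matters because the factorization is not unique. Two factorized networks that realize the same weights should report the same change.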

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2