Expressivity of Neural Networks with Random Weights and Learned Biases

Ezekiel Williams; Alexandre Payeur; Avery Hee-Woon Ryoo; Thomas Jiralerspong; Matthew G. Perich; Luca Mazzucato; Guillaume Lajoie

TL;DR

This work shows that neural networks can express a wide class of functions and dynamical trajectories even when all weights are fixed randomly and only biases are learned. By introducing γ-bias-learning activations and leveraging masking-like arguments, the authors prove universal approximation for bias-learning feedforward networks and finite-horizon trajectory approximation for bias-learning recurrent networks on compact sets. They complement the theory with extensive simulations in multi-task learning, dynamical system forecasting, and motor control, illustrating task-specific organization and comparing bias learning to masking approaches. The results have implications for neuroscience, where bias modulation can reconfigure dynamics without synaptic changes, and AI, where efficient fine-tuning via biases or prefixes may achieve broad adaptability. Overall, the paper links bias-centric learning to universal expressivity and provides both theoretical and empirical grounding for bias-driven adaptation in neural systems.

Abstract

Landmark universal function approximation results for neural networks with trained weights and biases provided the impetus for the ubiquitous use of neural networks as learning models in neuroscience and Artificial Intelligence (AI). Recent work has extended these results to networks in which a smaller subset of weights (e.g., output weights) are tuned, leaving other parameters random. However, it remains an open question whether universal approximation holds when only biases are learned, despite evidence from neuroscience and AI that biases significantly shape neural responses. The current paper answers this question. We provide theoretical and numerical evidence demonstrating that feedforward neural networks with fixed random weights can approximate any continuous function on compact sets. We further show an analogous result for the approximation of dynamical systems with recurrent neural networks. Our findings are relevant to neuroscience, where they demonstrate the potential for behaviourally relevant changes in dynamics without modifying synaptic weights, as well as for AI, where they shed light on recent fine-tuning methods for large language models, like bias and prefix-based approaches.
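To make the setting concrete, here is a minimal sketch of bias learning in a single-hidden-layer ReLU network: the input and output weights are sampled once and frozen, and gradient descent updates only the hidden biases. This is an illustration, not the authors' implementation. The function names, weight scaling, learning rate, and the toy sine target are all assumptions introduced here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_network(n_in, n_hidden, n_out, rng):
    """Sample frozen random weights W, A and zero-initialized biases b.

    All names and scalings are hypothetical choices for this sketch.
    """
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))       # frozen
    A = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden))  # frozen
    b = np.zeros(n_hidden)  # the only trained parameters
    return W, A, b

def forward(W, A, b, X):
    """Return network output and hidden activations for inputs X of shape (N, n_in)."""
    H = relu(X @ W.T + b)
    return H @ A.T, H

def loss_and_bias_grad(W, A, b, X, Y):
    """Mean squared error (with a 1/2 factor) and its gradient with respect to b only."""
    pred, H = forward(W, A, b, X)
    err = pred - Y                                  # (N, n_out)
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    dH = err @ A                                    # backprop through output weights
    dZ = dH * (H > 0)                               # ReLU derivative
    return loss, dZ.mean(axis=0)

def train_biases(W, A, b, X, Y, lr=0.05, steps=500):
    """Full-batch gradient descent on the biases; W and A are never modified."""
    b = b.copy()
    for _ in range(steps):
        _, grad_b = loss_and_bias_grad(W, A, b, X, Y)
        b -= lr * grad_b
    return b

# Toy usage: fit sin(pi * x) on [-1, 1] by adjusting biases alone.
rng = np.random.default_rng(0)
W, A, b0 = init_network(n_in=1, n_hidden=256, n_out=1, rng=rng)
X = np.linspace(-1.0, 1.0, 64)[:, None]
Y = np.sin(np.pi * X)
b1 = train_biases(W, A, b0, X, Y)
```

The gradient is written out by hand so that it is explicit that nothing flows into `W` or `A`. In an autodiff framework the same effect would come from marking only the biases as trainable.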

Paper Structure (30 sections, 14 theorems, 31 equations, 10 figures, 1 table)

Introduction
Related works
Theory results
Feedforward neural networks
Recurrent neural networks
Numerical results
Multi-task learning with bias-learned FNNs
Relationship between bias learning and mask learning in FNNs
Bias learning autonomous dynamical systems with RNNs
Bias learning non-autonomous dynamical systems with RNNs
Bias-learned Motor control with RNNs
Discussion
Biological Bias-Related Mechanisms
Mathematical Proofs
Random Neural Network Formulation
...and 15 more sections

Key Result

Proposition 1

The ReLU and the Heaviside step function are $\gamma$-parameter bounding activations for any $\gamma > 0$.
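The masking-like arguments mentioned in the TL;DR rest on a simple property of the ReLU that can be checked numerically: on a bounded input set, a sufficiently negative bias keeps a unit at zero, so a learned bias can act like a mask on that unit. The sketch below illustrates only this intuition. It is not the paper's definition of a $\gamma$-parameter bounding activation, and the function names and the choice of a Euclidean ball as the compact set are assumptions made here.

```python
import numpy as np

def relu_unit(w, b, X):
    """Output of one ReLU unit with weights w and bias b on inputs X of shape (N, d)."""
    return np.maximum(X @ w + b, 0.0)

def silencing_bias(w, radius):
    """Bias that keeps the unit at zero for every x with ||x||_2 <= radius.

    By Cauchy-Schwarz, w @ x <= ||w|| * radius, so b = -||w|| * radius
    makes the pre-activation non-positive on the whole ball.
    """
    return -np.linalg.norm(w) * radius

# Usage: sample points inside a ball of radius 2 and silence a random unit.
rng = np.random.default_rng(1)
w = rng.standard_normal(5)
radius = 2.0
directions = rng.standard_normal((1000, 5))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
X = radius * rng.uniform(0.0, 1.0, size=(1000, 1)) * directions

active = relu_unit(w, 0.0, X)                            # unit responds to some inputs
silenced = relu_unit(w, silencing_bias(w, radius), X)    # unit is off on the whole ball
```

The same reasoning applies to the Heaviside step function, since it is also zero for negative pre-activations.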

Figures (10)

Figure 1: A. Validation accuracy on Fashion-MNIST vs. number of trained parameters for fully-trained (blue), bias-learned with uniformly distributed weights (light orange), and bias-learned with Gaussian weights (dark orange) networks. B. Validation accuracy on multiple image classification tasks for bias-learned (orange) and fully-trained (blue) networks. Errors for 5 random seeds are barely visible as the shaded regions in A, and are omitted in B because the standard errors are of order $10^{-3}$. C. Top: K-means clustering of Task Variance (TV) reveals task-selective clusters (see the supplementary figure for fully-trained network selectivity). Bottom: Spearman correlation between TV and bias vectors (mean across neurons in each cluster).
Figure 2: Comparing bias and mask learning on the same weights. A. Learning curves for bias (orange) and mask (black) learning on MNIST. Inset: bias learning achieved roughly $1\%$ higher test accuracy than mask learning ($0.934\pm0.001\ \mathrm{SD}$ bias vs. $0.919\pm0.002\ \mathrm{SD}$ mask). B. Probability ($y$-axis) of the same unit being ON in both the bias-learning and mask-learning networks (orange line). A unit is 'ON' in mask learning if it is not masked out, and in bias learning if it has task variance above a given threshold ($x$-axis). Also shown are the probability of a unit being ON in two different training runs for mask learning (black dashed line), and a null model giving the expected overlap if the probability of a unit being ON in the bias-trained network is independent of whether it is ON in the mask network (see the Appendix for more details). C. Histograms of hidden unit variances, calculated over $10^4$ test set MNIST samples, for bias-trained (orange) and mask-trained (black) networks. Unit variances below $0.1$ are not shown. All curves and histograms are means, with shaded regions showing $1$ SD over $5$ training runs.
Figure 3: Learning autonomous dynamical systems. A. Cosine generated by a bias-learning RNN (dashed orange) and its target (solid black). B. Eigenvalue spectra for the recurrent weights (left) and the Jacobian at the start of training (right, grey squares) and mid-training (right, orange circles), when the network produced a decaying oscillation with period 23.75, close to the target period of 25. Neural activity then approached a fixed point with respect to which the Jacobian was computed. C. Van der Pol oscillator (target in solid black) generated by the bias-learning RNN for a recurrent gain of 1 (dashed orange; see panel D) and a gain of 0.9 (dashed dark orange). Output represents the oscillator's position, rescaled to [-1, 1]. D. (Left) Sensitivity to distribution of recurrent weights. The fully-trained and bias-learning networks had the same number of learnable parameters. Initial recurrent weight matrix had elements sampled from $(g/\sqrt{m}) \mathcal{N}(0,1)$, where $g$ is the gain (Gain recurrent init.). Error bars denote SEM for $n=10$. (Right) Schematics of the fully-trained (top) and bias-learning (bottom) autonomous RNNs. Colored links denote trained weights.
Figure 4: Learning non-autonomous dynamical systems. A. Validation $R^2$ vs. number of trainable model parameters for fully-trained (blue) and bias-learned (orange) RNNs. Training RNNs with bias learning became unstable below a network width of $64$. B. Predictions from the fully-trained and bias-learned networks (both with a hidden layer width of 1024) on a trajectory of the Lorenz system unseen during training. Standard deviation error bars were computed over 5 seeds, but are not visible due to their small magnitudes. C. Predictions of both the fully-trained and bias-learned networks diverge from the ground truth signal when one starts feeding back their own outputs as their inputs, in place of the ground-truth time-series (self-sustained, starting from the grey line).
Figure 5: Center-out reaching task. A. Training loss for 3 network initializations. B. Trajectories for the trained (black) and tested (grey) targets. C. Speeds ($(\dot{x}^2 + \dot{y}^2)^{1/2}$) for the trained (top) and tested (bottom) targets (mean $\pm$ SD across targets).
...and 5 more figures
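
The Figure 3 caption states that the initial recurrent weight matrix has elements sampled from $(g/\sqrt{m}) \mathcal{N}(0,1)$. For matrices of this form, the eigenvalues approximately fill a disc of radius $g$ in the complex plane when $m$ is large (the circular law), which is why $g$ acts as a gain on the recurrent dynamics. The sketch below samples such a matrix and measures its spectral radius. The function names, the width $m = 300$, and the seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_recurrent_weights(m, g, rng):
    """Sample an m x m matrix with i.i.d. entries (g / sqrt(m)) * N(0, 1)."""
    return (g / np.sqrt(m)) * rng.standard_normal((m, m))

def spectral_radius(W):
    """Largest eigenvalue modulus of W."""
    return np.max(np.abs(np.linalg.eigvals(W)))

# Usage: compare the two gains discussed in the Figure 3 caption.
rng = np.random.default_rng(0)
for g in (0.9, 1.0):
    W = sample_recurrent_weights(m=300, g=g, rng=rng)
    print(f"gain {g}: spectral radius {spectral_radius(W):.3f}")
```

At finite width the measured radius fluctuates around $g$ rather than matching it exactly, so any comparison should allow a tolerance.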

Theorems & Definitions (26)

Definition 1
Definition 2
Proposition 1
Definition 3
Theorem 1
Corollary 1
Theorem 2
Proposition 1
proof
Lemma 1
...and 16 more
