Pretraining with Random Noise for Fast and Robust Learning without Weight Transport

Jeonghwan Cheon; Sang Wan Lee; Se-Bum Paik

TL;DR

This work asks how biologically plausible learning can achieve fast, reliable credit assignment without weight transport. It introduces random-noise pretraining within a feedback-alignment framework and shows that pretraining aligns the forward weights with the fixed random backward pathways. This alignment speeds up subsequent learning and improves generalization, often rivaling backpropagation on several benchmarks. Pretrained networks also show lower effective weight dimensionality, better out-of-distribution performance, and lower meta-loss when adapting to various tasks. Overall, random-noise pretraining serves as a simple, effective preconditioning method that bridges biological plausibility and practical efficiency in neural networks.

Abstract

The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of this process is not yet thoroughly understood, and it is unclear whether it can benefit machine learning algorithms. Here, we study this issue using a neural network with a feedback alignment algorithm, demonstrating that pretraining neural networks with random noise increases both learning efficiency and generalization ability without weight transport. First, we found that random noise training modifies forward weights to match backward synaptic feedback, which is necessary for teaching errors by feedback alignment. As a result, a network with pre-aligned weights learns notably faster than a network without random noise training, even reaching a convergence speed comparable to that of a backpropagation algorithm. Sequential training with both random noise and data brings weights closer to synaptic feedback than training solely with data, enabling more precise credit assignment and faster learning. We also found that each readout probability approaches the chance level and that the effective dimensionality of weights decreases in a network pretrained with random noise. This pre-regularization allows the network to learn simple, low-rank solutions, reducing the generalization loss during subsequent training. It also enables the network to generalize robustly to a novel, out-of-distribution dataset. Lastly, we confirmed that random noise pretraining reduces the meta-loss, enhancing the network's ability to adapt to various tasks. Overall, our results suggest that random noise training with feedback alignment offers a straightforward yet effective method of pretraining that facilitates quick and reliable learning without weight transport.
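The procedure the abstract describes can be sketched roughly as follows: a two-layer network is trained on random inputs paired with random labels, errors are sent backward through a fixed random matrix instead of the transposed forward weights, and the angle between the forward weights and that feedback matrix is tracked. This is a minimal NumPy sketch and not the authors' implementation. The function names (`noise_pretrain`, `alignment_angle`), the layer sizes, the learning rate, and the Gaussian input distribution are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def alignment_angle(W, B):
    """Angle in degrees between the flattened matrices W and B."""
    cos = np.sum(W * B) / (np.linalg.norm(W) * np.linalg.norm(B))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def noise_pretrain(n_in=32, n_hidden=64, n_out=10,
                   steps=200, batch=64, lr=0.05, seed=0):
    """Train on random (input, label) pairs with feedback alignment.

    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W0 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hidden, n_in))
    b0 = np.zeros(n_hidden)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_out, n_hidden))
    b1 = np.zeros(n_out)
    # Fixed random feedback matrix; it is never updated.
    B1 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_out, n_hidden))

    angles = [alignment_angle(W1, B1)]
    for _ in range(steps):
        # Random inputs and random one-hot labels, re-sampled every step.
        x = rng.normal(size=(batch, n_in))
        y = np.eye(n_out)[rng.integers(0, n_out, size=batch)]

        # Forward pass.
        a = x @ W0.T + b0
        h = np.maximum(a, 0.0)
        p = softmax(h @ W1.T + b1)

        # Output error for softmax + cross-entropy, averaged over the batch.
        e = (p - y) / batch
        # Feedback alignment: route the error through B1.
        # Backpropagation would use W1 here instead.
        delta_h = (e @ B1) * (a > 0)

        W1 -= lr * (e.T @ h)
        b1 -= lr * e.sum(axis=0)
        W0 -= lr * (delta_h.T @ x)
        b0 -= lr * delta_h.sum(axis=0)

        angles.append(alignment_angle(W1, B1))
    return W1, B1, angles

if __name__ == "__main__":
    _, _, angles = noise_pretrain()
    print(f"alignment angle: start {angles[0]:.1f} deg, end {angles[-1]:.1f} deg")
```

An angle near 90 degrees means the forward weights and the feedback matrix are unrelated, and smaller angles mean closer alignment. How far the angle moves in this toy setting depends on the assumed sizes and step count, so the sketch is a way to measure the quantity rather than a reproduction of the reported results.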

Paper Structure (35 sections, 6 equations, 26 figures, 12 tables, 1 algorithm)

This paper contains 35 sections, 6 equations, 26 figures, 12 tables, 1 algorithm.

Introduction
Preliminaries
Backpropagation and weight transport problem
Feedback alignment
Random noise pretraining with feedback alignment
Results
Weight alignment to synaptic feedback during random noise training
Pretraining random noise enables fast learning during subsequent data training
Pre-regularization by random noise training enables robust generalization
Task-agnostic fast learning for various tasks by a network pretrained with random noise
Discussion
Broader impacts and limitations
Code availability
Experimental details and additional results for section 4.1
Network architecture and training details
...and 20 more sections

Figures (26)

Figure 1: Weight alignment to randomly fixed synaptic feedback induced through random noise training. (a) Forward and backward pathways of backpropagation and feedback alignment. (b) Possible scenario of the feedback alignment algorithm in a biological synaptic circuit. (c) Schematic of random training, where the input $\mathbf{x}$ and label $\mathbf{y}$ are randomly sampled and paired in each iteration. (d) Cross-entropy loss during random training. (e) Alignment angle between forward weights and synaptic feedbacks in the last layer. (f) Alignment angle with various random input conditions.
Figure 2: Effect of random noise pretraining on subsequent data training. (a) Design of the MNIST classification task to investigate the effect of random training. (b) Test accuracy during the training process, where the inset demonstrates the convergence speed of each training method, calculated by the AUC of the test accuracy. (c) Alignment angle between weights and synaptic feedback across random training and data training. (d) Trajectory of weights ($\mathbf{W}_1$) toward synaptic feedback ($\mathbf{B}_1$) in latent space obtained by PCA for random and data training. (e) Distance between the weights ($\mathbf{W}_1$) and the synaptic feedback ($\mathbf{B}_1$). (f) Order dependence of the trajectory of the weights ($\mathbf{W}_1$). (g) Distance between the weights ($\mathbf{W}_1$) and the synaptic feedback ($\mathbf{B}_1$) for different orders of random and data trainings.
Figure 3: Comparison of model performance across different image datasets and network depths. (a-e) Final accuracy after convergence. Experiments were conducted with networks of varying depths on different tasks: (a) MNIST, (b) Fashion-MNIST, (c) CIFAR-10, (d) CIFAR-100, and (e) STL-10.
Figure 4: Pre-regularization by random noise training enhances generalization. (a) Untrained network and pre-regularized network through random noise training. (b) Distribution of the readout probability. (c) Singular value spectrum of the forward weights. (d) Effective rank of forward weights during random noise training. (e) Generalization error between the training error and test error (training set size: 1600, network depth: 3). (f) Generalization error for various training set sizes (network depth: 3). (g) Effective dimensionality of the Gram matrix, the cosine similarity of feature vectors across neurons (training set size: 1600, network depth: 3). (h) Effective dimensionality of the Gram matrix for various network depths (training set size: 1600).
Figure 5: Robust generalization of "out-of-distribution" tasks in randomly pretrained networks. (a) Training in-distribution data (MNIST) in untrained and randomly pretrained networks. (b) Out-of-distribution generalization tests on transformed MNIST. (c) Out-of-distribution generalization tests on USPS dataset.
...and 21 more figures
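
The Figure 4 caption refers to the singular value spectrum and the effective rank of the forward weights. One common way to compute an effective rank is the exponential of the Shannon entropy of the normalized singular values. The sketch below assumes that definition, which may differ from the one used in the paper, and the function name `effective_rank` is hypothetical.

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """Entropy-based effective rank of a matrix (assumed definition)."""
    # Singular values only; the singular vectors are not needed.
    s = np.linalg.svd(W, compute_uv=False)
    # Normalize the spectrum into a probability distribution.
    p = s / (s.sum() + eps)
    # Drop numerically zero entries so that log(p) stays finite.
    p = p[p > eps]
    return float(np.exp(-np.sum(p * np.log(p))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.normal(size=(64, 64))
    low = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
    print(effective_rank(full), effective_rank(low))
```

Under this definition a matrix with k equal nonzero singular values has effective rank k, and a spectrum concentrated in a few directions gives a smaller value.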
