One-Time Soft Alignment Enables Resilient Learning without Weight Transport

Jeonghwan Cheon, Jaehyuk Bae, Se-Bum Paik

TL;DR

The paper addresses the inefficiency and biological implausibility of backpropagation by proposing a one-time soft alignment between forward and backward weights at initialization, termed initial feedback alignment (IFA). This initial alignment enables deep networks to learn without weight transport, achieving performance close to standard backpropagation and surpassing traditional feedback alignment in stability and generalization. Through spectral analyses, the authors show that IFA guides optimization toward smoother, flatter minima, which improves robustness to input corruptions and adversarial perturbations. The approach offers a simple, hardware-friendly alternative with potential benefits for energy-efficient learning and neuromorphic implementations, although the authors acknowledge remaining gaps for very deep or complex models.

Abstract

Backpropagation is the cornerstone of deep learning, but its reliance on symmetric weight transport and global synchronization makes it computationally expensive and biologically implausible. Feedback alignment offers a promising alternative by approximating error gradients through fixed random feedback, thereby avoiding symmetric weight transport. However, this approach often struggles with poor learning performance and instability, especially in deep networks. Here, we show that a one-time soft alignment between forward and feedback weights at initialization enables deep networks to achieve performance comparable to backpropagation, without requiring weight transport during learning. This simple initialization condition guides stable error minimization in the loss landscape, improving network trainability. Spectral analyses further reveal that initial alignment promotes smoother gradient flow and convergence to flatter minima, resulting in better generalization and robustness. Notably, we also find that allowing moderate deviations from exact weight symmetry can improve adversarial robustness compared to standard backpropagation. These findings demonstrate that a simple initialization strategy can enable effective learning in deep networks in a biologically plausible and resource-efficient manner.
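
The difference between the three learning rules in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer ReLU network, the squared-error loss, the weight scales, and the names `init_network`, `train_step`, and `alignment_angle_deg` are all assumptions made for illustration. The only point it shows is that the error is sent backward through a fixed matrix `B` rather than through the transpose of the forward weights, and that initial alignment means copying the forward weights into `B` once, before training starts.

```python
import numpy as np

def init_network(n_in, n_hidden, n_out, aligned, rng):
    """Return forward weights W1, W2 and a fixed feedback matrix B (hypothetical helper)."""
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden))
    if aligned:
        # Initial alignment: copy the forward weights into B once, at initialization.
        B = W2.T.copy()
    else:
        # Baseline feedback alignment: B is random and unrelated to W2.
        B = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
    return W1, W2, B

def train_step(W1, W2, B, x, y, lr=0.01):
    """One update on a single sample; W1 and W2 change in place, B is never touched."""
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)          # ReLU hidden layer
    e = W2 @ h - y                      # output error for a squared-error loss
    # Backpropagation would use W2.T @ e here; feedback alignment uses the fixed B.
    delta_h = (B @ e) * (h_pre > 0)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    return 0.5 * float(e @ e)           # loss before the update

def alignment_angle_deg(W2, B):
    """Angle in degrees between the flattened W2.T and the flattened B."""
    a, b = W2.T.ravel(), B.ravel()
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

In this sketch, `aligned=True` starts at an angle of approximately zero and lets the forward weights drift away from `B` as training proceeds, since nothing keeps the two synchronized afterwards. With `aligned=False`, the angle starts near 90 degrees, as expected for two independent random matrices.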

Paper Structure

This paper contains 46 sections, 16 equations, 20 figures, 2 tables, 1 algorithm.

Figures (20)

  • Figure 1: Effect of initial weight alignment on learning without weight transport. (a) Schematic of the backpropagation (BP) algorithm, where forward and backward weights are kept synchronized during training through external memory access. (b) Schematic of feedback alignment (FA) algorithms, which use a separate backward pathway with fixed, random feedback weights. Initial feedback alignment (IFA) also uses fixed feedback weights but aligns them with the forward weights only at initialization. (c) Computation flow for implementing backpropagation and feedback alignment on‐chip. (d) Comparison of memory access cost between FA and BP. (e) Neural network models trained on the CIFAR-10 dataset using various learning algorithms. (f) Alignment angle between forward and backward weights in the final layer of the network. (g) Learning curves showing test accuracy over the course of training. (h) Final test accuracy.
  • Figure 2: Loss landscape analysis of learning trajectories under different training rules. (a) Illustration of neural networks traversing the loss landscape during training, with each trajectory corresponding to a different learning algorithm. (b-d) Training trajectories projected onto a two-dimensional subspace defined by the first two principal components (PCA): (b) BP, (c) FA, (d) IFA. (e) Geometric properties of the loss landscape, quantified using the Hessian’s spectrum; smoother landscapes exhibit narrower spectra (i.e., smaller spectral radius). (f-h) Trace of the Hessian over the course of training. (i-k) Maximum eigenvalue of the Hessian during training for BP, FA, and IFA, respectively.
  • Figure 3: Trainability of neural networks under various conditions. (a) Trainable parameter space: the x- and y-axes represent the variances of the forward and backward weights, respectively. Color indicates the final training accuracy. The point $(1, 1)$ corresponds to LeCun initialization; $(\sqrt{2}, \sqrt{2})$ corresponds to He initialization. (b) Trainability versus network depth: final training accuracy for feedforward networks with depths ranging from 2 to 10 layers. (c) Trainability under limited data: final training accuracy as a function of training set size, ranging from 100 to 50,000 samples.
  • Figure 4: Effect of the degree of initial alignment on network performance. (a) Schematic of initialization conditions for forward and backward weights in feedback alignment. Perfect initial alignment sets the backward weights equal to the forward weights at initialization, whereas baseline feedback alignment uses randomly initialized backward weights. Soft alignment is achieved by sampling forward weights within a subspace defined by a specified angle relative to the backward weights. (b) Angle between forward and backward weights over the course of training. (c) Learning curves showing test accuracy during training. (d) Relationship between the initial alignment angle and final test accuracy.
  • Figure 5: Spectral analysis of convergence across initial alignment angles. (a) Loss landscape visualized by perturbing the trained model parameters within a two-dimensional subspace defined by the top two eigenvectors of the Hessian. The axes, denoted by $\alpha$ and $\beta$, represent scaling factors for perturbations along the first and second eigenvector directions, respectively. The landscapes are color-coded according to each network’s initial alignment angle. (b) Spectral density of the Hessian eigenvalues, illustrating the distribution of curvature across the parameter space. (c) Relationship between the initial alignment angle and the trace of the Hessian. (d) Relationship between the initial alignment angle and the largest Hessian eigenvalue.
  • ...and 15 more figures
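
The caption of Figure 4 describes soft alignment as sampling forward weights at a specified angle relative to the backward weights. The exact sampling procedure is not given in this listing, so the following NumPy sketch is only one possible construction, under the assumption that the forward weights sit exactly at the requested angle and share the norm of the backward weights. The function names `sample_forward_at_angle` and `angle_deg` are hypothetical. The idea is to mix the unit direction of `B` with a random unit direction orthogonal to it, weighted by the cosine and sine of the target angle.

```python
import numpy as np

def sample_forward_at_angle(B, theta_deg, rng):
    """Draw forward weights W whose transpose makes angle theta_deg with B.

    Assumption for illustration: W lies exactly at the target angle and has
    the same Frobenius norm as B.
    """
    b = B.ravel()
    b_norm = np.linalg.norm(b)
    b_hat = b / b_norm
    # Random direction, made orthogonal to b_hat by removing its projection.
    r = rng.normal(size=b.shape)
    r -= (r @ b_hat) * b_hat
    r_hat = r / np.linalg.norm(r)
    theta = np.radians(theta_deg)
    w = b_norm * (np.cos(theta) * b_hat + np.sin(theta) * r_hat)
    # B maps output error to the hidden layer, so the forward matrix is its transpose shape.
    return w.reshape(B.shape).T

def angle_deg(W, B):
    """Angle in degrees between the flattened W.T and the flattened B."""
    a, b = W.T.ravel(), B.ravel()
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Under this construction, an angle of 0 degrees reproduces perfect initial alignment (`W.T` equals `B` up to rounding), and 90 degrees gives forward weights orthogonal to the feedback weights, which is close to what independent random initialization produces in high dimensions.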