Training Large Neural Networks With Low-Dimensional Error Feedback

Maher Hanut, Jonathan Kadmon

TL;DR

This work shows that training deep networks need not transport full gradient information; a learned, low-dimensional teaching signal can suffice for effective credit assignment when projected into the task-relevant subspace. The authors develop low-dimensional feedback alignment (LDFA), combining a rank-$r$ backward pathway $B=QP$ with either normative or local subspace learning rules, and demonstrate near-backpropagation performance across linear models, CNNs, and vision transformers on CIFAR-10/100. They show that error dimensionality primarily tracks task dimensionality $d$, enabling substantial backward-pass compute savings while preserving accuracy, and that the dimensionality of the error channel shapes early representations, offering biological plausibility and new inductive biases for learning systems. The results imply a principled rethinking of gradient-based learning in high-dimensional systems and point toward practical, brain-inspired approaches for efficient training and representation learning.

Abstract

Training deep neural networks typically relies on backpropagating high-dimensional error signals, a computationally intensive process with little evidence supporting its implementation in the brain. However, since most tasks involve low-dimensional outputs, we propose that low-dimensional error signals may suffice for effective learning. To test this hypothesis, we introduce a novel local learning rule based on Feedback Alignment that leverages indirect, low-dimensional error feedback to train large networks. Our method decouples the backward pass from the forward pass, enabling precise control over error signal dimensionality while maintaining high-dimensional representations. We begin with a detailed theoretical derivation for linear networks, which forms the foundation of our learning framework, and extend our approach to nonlinear, convolutional, and transformer architectures. Remarkably, we demonstrate that even minimal error dimensionality, on the order of the task dimensionality, can achieve performance matching that of traditional backpropagation. Furthermore, our rule enables efficient training of convolutional networks, which have previously been resistant to Feedback Alignment methods, with minimal error. This breakthrough not only paves the way toward more biologically accurate models of learning but also challenges the conventional reliance on high-dimensional gradient signals in neural network training. Our findings suggest that low-dimensional error signals can be as effective as high-dimensional ones, prompting a reevaluation of gradient-based learning in high-dimensional systems. Ultimately, our work offers a fresh perspective on neural network optimization and contributes to understanding learning mechanisms in both artificial and biological systems.
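To make the rank-$r$ backward pathway $B = QP$ concrete, here is a minimal NumPy sketch of a single backward step in a two-layer linear network. It assumes the layer sizes quoted in the Figure 2 caption ($n=128$, $k=64$, $m=64$, $r=8$) and uses random matrices as stand-ins for $P$ and $Q$; the paper learns both, and those learning rules are not reproduced here. All variable names are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m, r = 128, 64, 64, 8  # sizes taken from the Figure 2 caption

# Forward weights of a two-layer linear network (random initialization).
W1 = rng.standard_normal((k, n)) / np.sqrt(n)
W2 = rng.standard_normal((m, k)) / np.sqrt(k)

# Assumption: random stand-ins for the factors of the feedback map B = Q P.
P = rng.standard_normal((r, m)) / np.sqrt(m)  # compresses the error: m -> r
Q = rng.standard_normal((k, r)) / np.sqrt(r)  # expands it to the hidden layer: r -> k

x = rng.standard_normal(n)  # one input
y = rng.standard_normal(m)  # one target

h = W1 @ x       # hidden activity
e = y - W2 @ h   # output error (m-dimensional)

# Backpropagation: the hidden layer sees the error through W2^T.
delta_bp = W2.T @ e

# Low-dimensional feedback: the hidden layer sees only an r-dimensional signal.
z = P @ e            # r-dimensional teaching signal
delta_ldfa = Q @ z   # hidden-layer teaching signal

# The two-step pathway is the same as applying the rank-r matrix B = Q P.
B = Q @ P
rank_B = np.linalg.matrix_rank(B)
same_signal = np.allclose(delta_ldfa, B @ e)

# Both rules use the same outer-product form for the first-layer update.
dW1_bp = np.outer(delta_bp, x)
dW1_ldfa = np.outer(delta_ldfa, x)

# Multiplications needed to deliver the error to the hidden layer.
mults_full = k * m        # dense k x m feedback map
mults_low = r * (k + m)   # apply P, then Q
print(rank_B, mults_full, mults_low)
```

In this toy setting the factored pathway needs $r(k+m)$ multiplications per sample instead of $km$, which is the kind of backward-pass saving the TL;DR refers to. How well the resulting update trains the network depends on how $Q$ and $P$ are learned, which this sketch does not address.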

Paper Structure

This paper contains 43 sections, 58 equations, 5 figures, 3 tables, 2 algorithms.

Figures (5)

  • Figure 1: Illustration of different approaches for propagating error to hidden layers. From left to right: Backpropagation (BP) propagates error using the exact transposes of the forward weights. Feedback alignment (FA) replaces $W_l^\top$ with fixed random feedback matrices and relies on the forward weights aligning with this backward pathway during training. Low-dimensional feedback alignment (LDFA) constrains each feedback map to be low rank (e.g., $B_l=Q_lP_l$ with rank $r$), so each layer receives an $r$-dimensional teaching signal whose subspace is learned. Direct LDFA (dLDFA) extends LDFA by routing low-dimensional error projections to non-adjacent layers, allowing early layers to receive teaching signals directly from the output (or other higher) layers.
  • Figure 2: Learning dynamics and component alignment in linear networks. (a) Schematic of the network architecture with input dimension $n = 128$, hidden layer size $k = 64$, and output dimension $m = 64$. The feedback matrix $B$ is factorized as $B = QP$ and constrained to rank $r$. (b, c) Comparison of theoretical predictions (dashed) and numerical simulations (solid) for low-rank Feedback Alignment (FA), updating $Q$ with $P$ fixed (Akrout et al., 2019), with $r = m = 64$ and $r = 8 < m$, respectively. The $y$-axis reports the normalized mode overlap $\Lambda_i = u_i^\top W_2 W_1 v_i / S_{ii}$ for each singular component $i$. (d, e) Same as (b, c) but training both $Q$ and $P$ using Eqs. \ref{eq:BackwardUpdates}. (f) With $P$ fixed, overlaps increase on average (bold), but the leading singular components do not reliably reach $\Lambda_i \approx 1$ at low rank. (g) Training $P$ aligns the feedback subspace with the evolving error, yielding near-complete recovery of the top-$r$ components and improved convergence.
  • Figure 3: Low-dimensional feedback in nonlinear vision architectures. (a) Convolutional networks. A VGG-like CNN trained on CIFAR-10 with learned low-rank feedback achieves near-backpropagation (BP) test accuracy even when the backward/teaching channel is strongly compressed. Bars show final accuracy for BP and for low-dimensional feedback with feedback rank set to a fraction of the layer width ($r=n/2,\,n/4,\,n/8$, where $n$ denotes the number of channels in the corresponding block). (b) Vision transformers (ViT). Top, schematic of a transformer block and multi-head self-attention with low-rank feedback channels; low-rank feedback is used to train all weights, including MLP and attention weights. Bottom, training curves (test accuracy versus training step) for BP (black) and low-dimensional feedback with a shared feedback rank $r$ across all linear maps (color-coded); inset shows the corresponding training loss. (c) Scaling and efficiency in ViT. Top, holding the feedback rank fixed ($r=36$) while reducing the embedding dimension decreases accuracy, indicating that performance continues to benefit from larger feedforward representations even under a fixed low-dimensional error channel. Bottom, ViT test accuracy (bars) and total compute required to reach $90\%$ of the final accuracy (line, TFLOPs) as a function of feedback rank, revealing an intermediate-rank regime that preserves BP-level accuracy while reducing end-to-end training compute (about $20\%$ in these experiments).
  • Figure 4: Low-dimensional feedback alignment (LDFA) as a synaptically local error pathway. (a) Linear-network analysis ($n=128$, $k=64$, $m=64$) with factorized feedback $B=QP$. Solid curves show simulations and dashed curves show the analytical prediction for the evolution of mode-wise singular values. With full-rank feedback ($r=m$), all task modes are recovered, whereas with rank-constrained feedback ($r<m$) learning prioritizes the dominant task modes selected by the Oja dynamics. (b) Multilayer perceptrons on CIFAR. Top: separating feedforward capacity from feedback dimensionality. A wide MLP reaches high accuracy even when the teaching signal is strongly compressed (LDFA with low rank $r$), whereas a much narrower MLP performs substantially worse despite having an unconstrained (full-rank) error pathway (BP). Bottom: varying task output dimensionality by subsampling classes from CIFAR-100 shows that the minimal rank required to match BP scales primarily with task dimension (number of classes), not with hidden-layer width (all layers use the same rank $r$). (c) Direct LDFA implements direct error projection by broadcasting a low-dimensional teaching signal to each layer either from the output layer (solid) or the penultimate layer (dashed), approaching BP-level accuracy as rank increases. (d) Convolutional networks trained with LDFA. A 4-block VGG-style CNN attains BP-level performance with low backward rank even when the widest blocks contain 512 channels. Inset: constraining all layers to a fixed fraction of their width ($r=n/2,\,n/4,\,n/8$) yields graceful degradation.
  • Figure 5: Receptive fields are shaped by feedback dimensionality. (a) Simplified model of the early ventral visual stream (retina $\rightarrow$ VVS) adapted from Lindsey et al. (2019), and two manipulations that decouple feedforward bottlenecks from the teaching pathway. Top: an anatomical bottleneck limits the retinal output. Middle: the feedforward architecture is left full-width, but the teaching signal to the retinal layer is restricted by a low-rank feedback map (LDFA). Bottom: the feedforward bottleneck is reinstated, while the retinal layer receives a high-dimensional teaching signal via direct error projection (dLDFA; Eq. (16)), bypassing backward compression. (b) Backpropagation reproduces the Lindsey et al. phenomenology: removing the retinal bottleneck yields oriented, edge-like receptive fields, whereas a narrow retinal bottleneck yields center-surround receptive fields. (c) With the feedforward pathway unconstrained but feedback to the retina rank-limited, LDFA reliably produces center-surround receptive fields; decreasing the feedback rank $r$ (examples $r=32,4,2$) increases their effective rotational symmetry. (d) Conversely, with a feedforward retinal bottleneck but high-dimensional error delivered directly to the retina using direct LDFA (dLDFA), receptive fields become oriented. Together, these manipulations show that the dimensionality and routing of the teaching signal can reproduce, and even override, receptive-field structure previously attributed to feedforward anatomical constraints.
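The normalized mode overlap $\Lambda_i = u_i^\top W_2 W_1 v_i / S_{ii}$ reported in Figure 2 can be computed in a few lines. The sketch below assumes that $u_i$, $v_i$, and $S_{ii}$ come from the singular value decomposition of a target linear map (here a random matrix named `target`); the captions do not state this explicitly, so treat it as an assumption. The function name `mode_overlaps` and the hand-built "ideal rank-$r$" network are hypothetical, chosen only to show what $\Lambda_i \approx 1$ for the top-$r$ components looks like.

```python
import numpy as np

def mode_overlaps(W1, W2, target):
    """Return Lambda_i = u_i^T (W2 W1) v_i / S_ii for every singular mode i.

    Assumption: u_i, v_i, S_ii are taken from the SVD of `target`.
    """
    U, S, Vt = np.linalg.svd(target, full_matrices=False)
    M = W2 @ W1  # end-to-end linear map of the network
    # Sum over rows j and columns k of M, keeping the mode index i.
    return np.einsum("ji,jk,ik->i", U, M, Vt) / S

rng = np.random.default_rng(0)
n, k, m, r = 128, 64, 64, 8  # sizes from the Figure 2 caption

# Hypothetical target map whose singular modes define the "task".
target = rng.standard_normal((m, n)) / np.sqrt(n)
U, S, Vt = np.linalg.svd(target, full_matrices=False)

# An untrained (all-zero) network has zero overlap with every mode.
overlaps_zero = mode_overlaps(np.zeros((k, n)), np.zeros((m, k)), target)

# Hand-built network that represents exactly the top-r modes, using only
# the first r hidden units; the remaining hidden units stay at zero.
W1 = np.zeros((k, n))
W2 = np.zeros((m, k))
W1[:r] = np.sqrt(S[:r])[:, None] * Vt[:r]
W2[:, :r] = U[:, :r] * np.sqrt(S[:r])

overlaps_top_r = mode_overlaps(W1, W2, target)
print(np.round(overlaps_top_r[:r + 2], 3))
```

For the hand-built network the first $r$ overlaps equal 1 and the rest equal 0, which is the idealized version of the "near-complete recovery of the top-$r$ components" described for panel (g). In a real run, `W1` and `W2` would be the weights at a given training step, and plotting `mode_overlaps` over time would give curves of the kind shown in the figure.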