Deep Learning without Weight Symmetry

Li Ji-An; Marcus K. Benna

TL;DR

The paper tackles the biological implausibility of backpropagation (BP) that arises from the weight transport problem and from the observed lack of symmetric forward–backward connections in the brain. It introduces Product Feedback Alignment (PFA), which routes errors through an intermediate error population and uses a product of fixed and plastic feedback weights to align forward and backward paths without explicit weight symmetry; the paper rigorously analyzes the resulting path and error alignments. Theoretical results (Propositions 1–3) characterize convergence properties as a function of the expansion ratio $1/\lambda$, and experiments show that PFA attains BP-level performance on MNIST (fully-connected networks) and on CIFAR-10 and ImageNet (convolutional ResNets), outperforming FA/DFA/SF and existing symmetry-based methods. Overall, PFA provides a biologically plausible credit-assignment mechanism with strong empirical performance, offering a potential explanation for how the brain could implement BP-like learning without precise weight symmetry.
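As a rough illustration of how the feedback paths differ, the following sketch computes the error that one layer sends backward under BP, FA, SF, and PFA. It is a minimal sketch under stated assumptions, not the authors' implementation: all helper names (`backward_error`, `matvec`, and so on) are hypothetical, matrices are plain lists of rows, the nonlinearity derivative is omitted, SF is reduced to its simplest sign-only form, and in PFA the factor $B$ is treated as fixed and $R$ as plastic, consistent with the Figure 3 caption's steady state $R^T = BW$.

```python
# Minimal sketch (not the authors' code): one layer's backward error under
# BP, FA, SF and PFA, using plain-Python lists of rows as matrices.
# All function names here are hypothetical illustrations.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sign(A):
    return [[(x > 0) - (x < 0) for x in row] for row in A]

def backward_error(rule, W, e_next, B=None, R=None):
    """Error sent from layer l+1 back to layer l (derivative term omitted).

    W: forward weights, shape (n_next, n_l).
    FA: B is a fixed random matrix, shape (n_l, n_next).
    PFA: B is fixed, shape (k, n_next); R is plastic, shape (n_l, k).
    """
    if rule == "BP":   # feedback weights are the transported W^T
        return matvec(transpose(W), e_next)
    if rule == "FA":   # fixed random feedback matrix
        return matvec(B, e_next)
    if rule == "SF":   # only the sign of W is shared (simplest variant)
        return matvec(transpose(sign(W)), e_next)
    if rule == "PFA":  # e_next -> intermediate population e_bar -> e_l
        e_bar = matvec(B, e_next)
        return matvec(R, e_bar)
    raise ValueError(f"unknown rule: {rule}")

# Usage: a 3-unit layer l feeding a 2-unit layer l+1.
W = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
e_next = [0.5, -1.0]
B_pfa = [[1.0, 0.0],
         [0.0, 1.0]]                # k = 2 intermediate units, B = identity
R = transpose(matmul(B_pfa, W))     # steady state R^T = B W (Figure 3 caption)
bp = backward_error("BP", W, e_next)
sf = backward_error("SF", W, e_next)
pfa = backward_error("PFA", W, e_next, B=B_pfa, R=R)
print(bp, sf, pfa)
```

When the fixed factor $B$ happens to be the identity, the steady state $R^T = BW$ gives $R = W^T$, so the PFA signal coincides with BP; with a random $B$, the product $RB = W^T B^T B$ only approximates $W^T$, and no single synapse runs directly from layer $l+1$ back to layer $l$.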

Abstract

Backpropagation, a foundational algorithm for training artificial neural networks, predominates in contemporary deep learning. Although highly successful, it is widely considered biologically implausible, because it relies on precise symmetry between feedforward and feedback weights to accurately propagate gradient signals that assign credit. The so-called weight transport problem concerns how biological brains learn to align feedforward and feedback paths while avoiding the non-biological transport of feedforward weights into feedback weights. To address this, several credit assignment algorithms, such as feedback alignment and the Kollen-Pollack rule, have been proposed. While they can achieve the desired weight alignment, these algorithms imply that if a neuron sends a feedforward synapse to another neuron, it should also receive an identical or at least partially correlated feedback synapse from the latter neuron, thereby forming a bidirectional connection. However, this idealized connectivity pattern contradicts experimental observations in the brain, a discrepancy we refer to as the weight symmetry problem. To address this challenge, which arises from biological constraints on connectivity, we introduce the Product Feedback Alignment (PFA) algorithm. We demonstrate that PFA can eliminate explicit weight symmetry entirely while closely approximating backpropagation and achieving comparable performance in deep convolutional networks. Our results offer a novel approach to solving the longstanding problem of credit assignment in the brain, enabling more biologically plausible learning in deep networks than previous methods.
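Why rules such as Kollen-Pollack end up with bidirectional connections can be seen in a small simulation. In one common formulation (assumed here, not taken from this paper), the forward matrix $W$ and the feedback matrix $B$ receive the same local outer-product update (stored transposed in $B$) plus identical weight decay, so the mismatch $W - B^T$ shrinks by a factor $(1 - \text{decay})$ at every step regardless of the data. The feedback synapse from neuron $i$ to neuron $j$ therefore converges to the forward synapse from $j$ to $i$. The function names and the values of `lr`, `decay`, and `steps` are hypothetical choices for illustration.

```python
import random

def kp_step(W, B, x, delta, lr=0.1, decay=0.05):
    """One Kollen-Pollack-style update (assumed form, for illustration).

    W: forward weights, shape (n_out, n_in); B: feedback weights, shape (n_in, n_out).
    Both get the same local term -lr * delta[i] * x[j] (stored transposed in B)
    plus the same multiplicative weight decay.
    """
    n_out, n_in = len(W), len(W[0])
    W_new = [[(1 - decay) * W[i][j] - lr * delta[i] * x[j] for j in range(n_in)]
             for i in range(n_out)]
    B_new = [[(1 - decay) * B[j][i] - lr * delta[i] * x[j] for i in range(n_out)]
             for j in range(n_in)]
    return W_new, B_new

def asymmetry(W, B):
    """Frobenius norm of W - B^T."""
    return sum((W[i][j] - B[j][i]) ** 2
               for i in range(len(W)) for j in range(len(W[0]))) ** 0.5

rng = random.Random(0)
n_in, n_out, steps = 4, 3, 50
W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
B = [[rng.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]  # independent of W
start = asymmetry(W, B)
for _ in range(steps):
    x = [rng.gauss(0, 1) for _ in range(n_in)]        # presynaptic activity
    delta = [rng.gauss(0, 1) for _ in range(n_out)]   # postsynaptic error
    W, B = kp_step(W, B, x, delta)
ratio = asymmetry(W, B) / start
print(f"asymmetry shrank by {ratio:.4f}; (1 - decay)**steps = {0.95 ** steps:.4f}")
```

The data-dependent terms cancel exactly in $W - B^T$, which is why the decay alone drives the two matrices together; this convergence toward reciprocal, symmetric connectivity is precisely what the abstract identifies as the weight symmetry problem.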

Paper Structure

This paper contains 19 sections, 15 equations, 5 figures, 1 table.

Backpropagation (BP).
Feedback alignment (FA).
Direct feedback alignment (DFA).
Sign-concordant feedback (SF).
Kollen-Pollack (KP), weight mirror (WM), phaseless alignment learning (PAL).
Summary.
Proposition 1.
Proposition 2.
Proposition 3.
Appendix
Training details
MNIST
CIFAR10
ImageNet
Weight correlation/alignment in the brain
...and 4 more sections

Figures (5)

Figure 1: Comparison of biologically plausible credit assignment algorithms for multilayer networks. In the forward pass, the feedforward weight $W^{i,j}_{l+1,l}$ transmits predictions (denoted by $x$) from $S_l^j$ to $S_{l+1}^i$. In the backward pass, all algorithms use a feedback path to propagate errors (denoted by $e$). (a) In BP (backpropagation), the feedback weight is plastic and transported from the feedforward weight $W^{i,j}_{l+1,l}$. (b) In FA (feedback alignment), the feedback weight $B^{j,i}_{l,l+1}$ is kept fixed. (c) In DFA (direct feedback alignment), the fixed feedback weight $B^{j,k}_{l,L}$ projects from $S_L^k$ to $S_l^j$; the weight symmetry problem persists in the final layer. (d) In SF (sign-concordant feedback), the feedback weight is plastic, with its sign transported from the feedforward weight ($\text{Sgn}\ W^{i,j}_{l+1,l}$). (e) In KP (Kollen-Pollack algorithm), WM (weight mirror), and PAL (phaseless alignment learning), the feedback weight is plastic and updated by a local plasticity rule, avoiding weight transport. (f) In PFA (product feedback alignment), the error signal is propagated through an additional population $\bar{e}^k_l$. Unlike the algorithms in (a-e), PFA resolves the weight symmetry problem by avoiding a direct feedback synapse from $S_{l+1}^i$ to $S_l^j$.
Figure 2: Characterization of learning algorithms for two-hidden-layer feedforward networks trained to classify MNIST digit images. (a) Task performance. Shaded regions show standard deviations across 5 seeds. The PFA and PFA-o curves nearly overlap the BP curve, indicating a close approximation. (b-e) Backward-forward weight alignment for FA/SF, and path alignment for PFA/PFA-o (top). Backward-forward weight norm ratio for FA/SF, and path norm ratio for PFA/PFA-o (bottom).
Figure 3: PFA gradually approximates BP as the expansion ratio increases. (a) Eigenvalues of $B^TB$ as a function of the expansion ratio ($1/\lambda$). The shaded region shows the standard deviation of the eigenvalues. (b) Backward-forward path alignment between $W$ and $(RB)^T$ as a function of the expansion ratio ($1/\lambda$). For the simulations, we randomly sampled $W$ and set $R^T=BW$ (expected to hold once the effect of weight initialization has fully decayed). The theoretical predictions from Proposition 3 match the simulations and are consistent with the path alignment observed after training. The shaded regions (too narrow to be visible) show the standard deviations across simulations.
Figure 4: Characterization of learning algorithms for ResNet-20 on CIFAR-10. (a) Task performance. Shaded regions show standard deviations across 5 seeds. The PFA and PFA-o curves nearly overlap the BP curve, indicating a close approximation. (b-e) Backward-forward weight alignment for FA/SF, and path alignment for PFA/PFA-o (top). Backward-forward weight norm ratio for FA/SF, and path norm ratio for PFA/PFA-o (bottom).
Figure 5: Characterization of learning algorithms for ResNet-18 on ImageNet. (a) Task performance. The PFA and PFA-o curves nearly overlap the BP curve, indicating a close approximation. (b-d) Backward-forward weight alignment for SF, and path alignment for PFA/PFA-o (top). Backward-forward weight norm ratio for SF, and path norm ratio for PFA/PFA-o (bottom).
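The trend in Figure 3(b) can be reproduced qualitatively with a short simulation. At the steady state stated in the caption, $R^T = BW$, the feedback path is $(RB)^T = B^T B W$, so path alignment depends only on how close $B^T B$ is to a multiple of the identity, which improves as the intermediate population grows. The sketch below rests on hypothetical choices for illustration, not necessarily the paper's: $B$ has shape $(k, n_{l+1})$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries, the expansion ratio is taken to be $k / n_{l+1}$, path alignment is the cosine similarity of the flattened matrices, and the function names are invented.

```python
import math
import random

def gaussian_matrix(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def alignment(A, B):
    """Cosine similarity between the flattened matrices A and B."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def path_alignment(n_next, n_l, k, seed=0):
    """Alignment between W and (RB)^T at the steady state R^T = BW.

    Assumed setup: W is (n_next x n_l) with N(0, 1) entries and the fixed
    feedback B is (k x n_next) with N(0, 1/k) entries, so E[B^T B] = I.
    """
    rng = random.Random(seed)
    W = gaussian_matrix(n_next, n_l, 1.0, rng)
    B = gaussian_matrix(k, n_next, 1.0 / math.sqrt(k), rng)
    R = transpose(matmul(B, W))       # R^T = B W, so R is (n_l x k)
    path_T = transpose(matmul(R, B))  # (RB)^T = B^T B W, same shape as W
    return alignment(W, path_T)

# Hypothetical expansion ratios k / n_next for a layer with n_next = 20 units.
n_next, n_l = 20, 10
results = {r: path_alignment(n_next, n_l, int(r * n_next)) for r in (0.5, 1, 2, 4, 8)}
for r, a in results.items():
    print(f"expansion ratio {r}: path alignment {a:.3f}")
```

Because $\langle W, B^T B W\rangle = \lVert BW \rVert_F^2 \ge 0$, the alignment is never negative in this setup. The printed values should rise toward 1 as the expansion ratio grows, in line with the caption's statement that PFA approximates BP more closely at larger expansion ratios.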
