Feed-Forward Optimization With Delayed Feedback for Neural Network Training

Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, Markus Götz

TL;DR

The paper tackles two sources of biological implausibility in backpropagation (BP): weight transport and update locking. It introduces Feed-Forward with delayed Feedback (F$^3$), a backpropagation-free training method that combines fixed random feedback weights $B_i$ with the error signal $e^{t-1}$ stored from the previous epoch, so that the update direction for hidden layer $i$, $\delta h_i = B_i^T e^{t-1}$, is already available during the forward pass. The authors prove that F$^3$ updates descend the loss under a standard framework, analyze its time and memory complexity, and compare it with similarly bio-plausible alternatives on MNIST, SGEMM, Wine Quality, and a Transformer variant. F$^3$ narrows the performance gap to BP by up to 56% for classification and 96% for regression. The work suggests that F$^3$ can make training more biologically plausible and energy-efficient, with potential for parallelization and on-device or neuromorphic deployment, while keeping predictive performance competitive.

Abstract

Backpropagation has long been criticized for being biologically implausible due to its reliance on concepts that are not viable in natural learning processes. Two core issues are the weight transport and update locking problems caused by the forward-backward dependencies, which limit biological plausibility, computational efficiency, and parallelization. Although several alternatives have been proposed to increase biological plausibility, they often come at the cost of reduced predictive performance. This paper proposes an alternative approach to training feed-forward neural networks addressing these issues by using approximate gradient information. We introduce Feed-Forward with delayed Feedback (F$^3$), which approximates gradients using fixed random feedback paths and delayed error information from the previous epoch to balance biological plausibility with predictive performance. We evaluate F$^3$ across multiple tasks and architectures, including both fully-connected and Transformer networks. Our results demonstrate that, compared to similarly plausible approaches, F$^3$ significantly improves predictive performance, narrowing the gap to backpropagation by up to 56% for classification and 96% for regression. This work is a step towards more biologically plausible learning algorithms while opening up new avenues for energy-efficient and parallelizable neural network training.

Paper Structure (6 sections, 5 theorems, 5 equations, 1 figure, 1 algorithm)

This paper contains 6 sections, 5 theorems, 5 equations, 1 figure, 1 algorithm.

Key Result

Theorem

Given two subsequent hidden layers $i$ and $i+1$ in a feed-forward neural network. Let $c_i$ be the backpropagated gradients, $\delta h_j, j\in\{i,i+1\}$ be non-zero update directions prescribed by the feedback paths, and $\frac{\delta h_j}{\|\delta h_j\|}$ be constant for each data point. If $L_i=\

Figures (1)

  • Figure 1: F$^3$ (a) solves both the weight transport and the update locking problems. In contrast to prior approaches, it uses delayed error information in the updates. The current error signal $e^t$ in epoch $t$ is stored (green) and used in the forward pass (blue) of the next epoch $t+1$, eliminating the backward pass (red) for all hidden layers. Previous approaches (b) to (e): Backpropagation (BP) (b) is not biologically plausible due to the weight transport and update locking problems. DFA (Nøkland, 2016) (c) solves the weight transport problem by replacing the backward paths with direct random feedback paths $B_i$ but is still update-locked as it depends on the error $e$. DRTP (Frenkel et al., 2021) (d) releases update locking by using the target $y^*$ instead of the error $e$, but this comes at the cost of reduced accuracy. PEPITA (Dellaferrera and Kreiman, 2022) (e) improves the accuracy by using two forward passes per sample but is only partially update-unlocked.
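
The mechanism in panel (a) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `make_toy_data` and `train_f3`, the toy regression data, the tanh activation, the learning rate, the per-sample storage of the error, the weight update $\Delta W_1 = -\eta\,(\delta h_1 \odot f'(a_1))\,x^T$, and the initialization of the delayed error from an all-zero prediction are all assumptions made for illustration. Only the update direction $\delta h_i = B_i^T e^{t-1}$ and the use of fixed random feedback weights come from the text.

```python
import numpy as np


def make_toy_data(n=64, seed=0):
    """Hypothetical toy regression task: y is a linear function of x."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    w_true = np.array([[0.5], [-0.3], [0.2]])
    return X, X @ w_true


def train_f3(X, y, hidden=16, epochs=50, lr=0.01, seed=0):
    """F3-style training of a one-hidden-layer network (illustrative sketch).

    Returns the mean squared error observed during each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(hidden, d))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(k, hidden))
    # Fixed random feedback weights: never trained, never tied to W2.
    B1 = rng.normal(0.0, 1.0, size=(k, hidden))

    # Delayed error, one row per training sample.
    # Assumption: before the first epoch, pretend the network predicted zero.
    e_prev = -y.copy()

    losses = []
    for _ in range(epochs):
        e_curr = np.empty_like(y)
        for s in rng.permutation(n):
            x = X[s]
            h1 = np.tanh(W1 @ x)

            # Hidden-layer update during the forward pass: it only needs the
            # error stored for this sample in the previous epoch.
            dh1 = B1.T @ e_prev[s]
            W1 -= lr * np.outer(dh1 * (1.0 - h1**2), x)

            # Output layer: ordinary delta rule on the current error.
            e = W2 @ h1 - y[s]
            W2 -= lr * np.outer(e, h1)

            # Store the current error for use in the next epoch.
            e_curr[s] = e
        losses.append(float(np.mean(e_curr**2)))
        e_prev = e_curr
    return losses
```

Calling `train_f3(*make_toy_data())` returns one loss value per epoch. The hidden layer never waits for a backward pass: by the time the current error `e` is known, `W1` has already been updated using the stored `e_prev`, which is what releases update locking in this sketch.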

Theorems & Definitions (7)

  • Theorem
  • Theorem
  • Lemma
  • Lemma
  • Proof
  • Lemma
  • Proof