Feed-Forward Optimization With Delayed Feedback for Neural Network Training
Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, Markus Götz
TL;DR
The paper tackles two sources of biological implausibility in backpropagation (BP): weight transport and update locking. It introduces Feed-Forward with delayed Feedback (F$^3$), a backpropagation-free training method that uses fixed random feedback weights $B_i$ and the delayed error signal $e^{t-1}$ from the previous epoch to compute layer updates as $\delta h_i = B_i^T e^{t-1}$, so each layer can be updated during the forward pass. The authors prove that F$^3$ updates descend the loss under a standard framework, analyze its time and memory complexity, and compare it against bio-plausible alternatives on MNIST, SGEMM, Wine Quality, and a Transformer variant. Relative to those similarly plausible approaches, F$^3$ narrows the performance gap to BP by up to 56% for classification and 96% for regression. The work suggests that F$^3$ can offer more biologically plausible, energy-efficient training with potential for parallelization and on-device or neuromorphic deployment, while maintaining competitive predictive performance.
Abstract
Backpropagation has long been criticized for being biologically implausible due to its reliance on concepts that are not viable in natural learning processes. Two core issues are the weight transport and update locking problems caused by the forward-backward dependencies, which limit biological plausibility, computational efficiency, and parallelization. Although several alternatives have been proposed to increase biological plausibility, they often come at the cost of reduced predictive performance. This paper proposes an alternative approach to training feed-forward neural networks addressing these issues by using approximate gradient information. We introduce Feed-Forward with delayed Feedback (F$^3$), which approximates gradients using fixed random feedback paths and delayed error information from the previous epoch to balance biological plausibility with predictive performance. We evaluate F$^3$ across multiple tasks and architectures, including both fully-connected and Transformer networks. Our results demonstrate that, compared to similarly plausible approaches, F$^3$ significantly improves predictive performance, narrowing the gap to backpropagation by up to 56% for classification and 96% for regression. This work is a step towards more biologically plausible learning algorithms while opening up new avenues for energy-efficient and parallelizable neural network training.
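To make the idea of fixed random feedback paths combined with delayed error information more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single tanh hidden layer, a mean-squared-error objective, full-batch updates, a delayed error initialized to zero before the first epoch, and a DFA-style multiplication by the activation derivative; none of these details are stated in the abstract. The function name `train_f3_sketch` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_f3_sketch(X, Y, hidden=16, epochs=200, lr=0.05, seed=0):
    """Toy F3-style training loop (illustrative assumptions, see lead-in)."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]

    # Trainable forward weights.
    W1 = rng.normal(0.0, 0.5, (d_in, hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, d_out))

    # Fixed random feedback weights: drawn once and never trained,
    # so no weight transport from W2 is needed.
    B1 = rng.normal(0.0, 0.5, (d_out, hidden))

    # Per-sample error from the previous epoch.
    # Assumption: zero before the first epoch.
    e_prev = np.zeros((n, d_out))

    losses = []
    for _ in range(epochs):
        # Hidden layer: forward step, then an immediate update that uses
        # only the delayed error, so it does not wait for a backward pass.
        h1 = np.tanh(X @ W1)
        # Row-vector form of delta_h = B^T e^{t-1}, times tanh'(a) = 1 - h^2
        # (the derivative factor is an assumption of this sketch).
        delta_h1 = (e_prev @ B1) * (1.0 - h1 ** 2)
        W1 -= lr * (X.T @ delta_h1) / n

        # Output layer: the current error is available locally.
        out = h1 @ W2
        e = out - Y
        W2 -= lr * (h1.T @ e) / n

        # Store this epoch's error as the delayed signal for the next epoch.
        e_prev = e
        losses.append(float(np.mean(e ** 2)))
    return losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 3))
    Y = np.sin(X.sum(axis=1, keepdims=True))
    history = train_f3_sketch(X, Y)
    print(f"first-epoch MSE: {history[0]:.4f}, last-epoch MSE: {history[-1]:.4f}")
```

In this sketch the hidden-layer update depends only on the layer's own input, the fixed matrix `B1`, and the stored error `e_prev`, which is what removes the forward-backward dependency; the price is keeping one error vector per training sample between epochs.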
