Mono-Forward: Backpropagation-Free Algorithm for Efficient Neural Network Training Harnessing Local Errors

James Gong; Bruce Li; Waleed Abdulla

TL;DR

The paper addresses the inefficiency and biological implausibility of backpropagation (BP) by introducing Mono-Forward (MF), a purely local, layerwise greedy training algorithm inspired by Forward-Forward. MF gives each layer a projection matrix $M_i$, computes layerwise goodness scores $G_i = a_i M_i^\top$, and trains the layer with a cross-entropy loss on these scores. This allows a single forward pass to serve both training and prediction, with an explicit connection between labels and inputs. Experiments on MLPs and CNNs across MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 show MF matching or surpassing BP accuracy while using substantially less, and more evenly distributed, memory and parallelizing better. MF's depth-insensitive convergence and one-pass prediction (with a BP-pred option) further improve its practicality. The findings suggest MF is a scalable, modular, and more biologically plausible alternative to BP that improves memory efficiency and hardware-friendly parallelism without sacrificing accuracy.
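The layer-local update described in the TL;DR can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `LocalLayer`, the helper names, the ReLU activation, plain gradient descent, the initialization, the toy data, and the choice to read predictions from the last layer's goodness scores are all assumptions made for illustration. The only elements taken from the text are the per-layer projection matrix $M_i$, the goodness $G_i = a_i M_i^\top$, and the cross-entropy loss on those scores.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class LocalLayer:
    """Hypothetical MF-style layer: weights W plus a per-layer projection matrix M."""

    def __init__(self, n_in, n_out, n_classes, rng, lr=0.01):
        self.W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        self.M = rng.normal(0.0, 0.01, size=(n_classes, n_out))
        self.lr = lr

    def forward(self, x):
        return np.maximum(x @ self.W, 0.0)  # ReLU activations a_i (assumed activation)

    def goodness(self, a):
        return a @ self.M.T  # G_i = a_i M_i^T, one score per class

    def local_step(self, x, y_onehot):
        """Update W and M using only this layer's own cross-entropy loss."""
        z = x @ self.W
        a = np.maximum(z, 0.0)
        p = softmax(self.goodness(a))
        loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
        dG = (p - y_onehot) / x.shape[0]  # gradient of the mean cross-entropy w.r.t. G
        dM = dG.T @ a
        dz = (dG @ self.M) * (z > 0)      # gradient stops here; nothing reaches earlier layers
        self.W -= self.lr * (x.T @ dz)
        self.M -= self.lr * dM
        return a, loss

def train(layers, x, y_onehot, epochs):
    history = []
    for _ in range(epochs):
        h, losses = x, []
        for layer in layers:
            # each layer receives the previous activations as a fixed input
            h, loss = layer.local_step(h, y_onehot)
            losses.append(loss)
        history.append(losses)
    return history

def predict(layers, x):
    h = x
    for layer in layers:
        h = layer.forward(h)
    # assumption of this sketch: classify from the last layer's goodness scores
    return layers[-1].goodness(h).argmax(axis=1)

def demo(seed=0):
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 3, size=300)
    # synthetic, well-separated clusters; not one of the paper's datasets
    x = 3.0 * np.eye(3, 10)[y] + 0.3 * rng.normal(size=(300, 10))
    layers = [LocalLayer(10, 32, 3, rng), LocalLayer(32, 32, 3, rng)]
    history = train(layers, x, np.eye(3)[y], epochs=300)
    accuracy = float(np.mean(predict(layers, x) == y))
    return history[0], history[-1], accuracy

if __name__ == "__main__":
    print(demo())
```

Because each `local_step` needs only its own input, weights, and projection matrix, no activations from other layers have to be kept for a backward pass, which is the property the memory and parallelizability claims rest on.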

Abstract

Backpropagation is the standard method for achieving state-of-the-art accuracy in neural network training, but it often imposes high memory costs and lacks biological plausibility. In this paper, we introduce the Mono-Forward algorithm, a purely local layerwise learning method inspired by Hinton's Forward-Forward framework. Unlike backpropagation, Mono-Forward optimizes each layer solely with locally available information, eliminating the reliance on global error signals. We evaluated Mono-Forward on multi-layer perceptrons and convolutional neural networks across multiple benchmarks, including MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100. The test results show that Mono-Forward consistently matches or surpasses the accuracy of backpropagation across all tasks, with significantly reduced and more even memory usage, better parallelizability, and a comparable convergence rate.

Paper Structure (8 sections, 8 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
The Forward-Forward Algorithm
The Mono-Forward Algorithm
Training in MF
Prediction in MF
Parallelizability, Transparency, and Hot-Plugging
Experiments
Conclusion

Figures (5)

Figure 1: Memory Consumed during Training under BP. This experiment uses the MNIST dataset on a network of size $5 \times 2000$ with a batch size of 30000.
Figure 2: Memory Consumed during Training under MF. This experiment uses the MNIST dataset on a network of size $5 \times 2000$ with a batch size of 30000.
Figure 3: Memory Comparison between Backpropagation (BP) and Mono-Forward (MF) during Training. For the MLP architecture, the experiment varies the number of layers, each with 1000 ReLU neurons, on the MNIST dataset using the full dataset as a single batch. For the CNN architecture, the setup varies the number of convolutional layers, with 64, 128, 256, 512, and 512 neurons, on the CIFAR-10 dataset with a batch size of 256. The peak memory usage during training is recorded.
Figure 4: Comparison of Convergence Rates between MF and BP. The experiments were conducted using an MLP network with $4 \times 1000$ neurons, optimized using the Adam optimizer with a learning rate of 0.001. Results are averaged over at least 10 different random seeds for each dataset.
Figure 5: Comparison of Convergence Rates between MF and BP. The experiments were conducted using an MLP network with $n \times 1000$ neurons, optimized using the Adam optimizer with a learning rate of 0.001 and a batch size of 256. MF-15 denotes a 15-layer MLP network trained with MF.
