Going Forward-Forward in Distributed Deep Learning
Ege Aktemur, Ege Zorlutuna, Kaan Bilgili, Tacettin Emre Bok, Berrin Yanikoglu, Suha Orhun Mutluergil
TL;DR
The paper tackles the high cost of training deep networks in distributed environments by building on Geoffrey Hinton's Forward-Forward (FF) algorithm and extending it into Pipeline Forward-Forward (PFF), which enables distributed, layer-wise updates with reduced communication. It introduces several PFF variants (Single-Layer, All-Layers, Federated) and a Supervised variant with a new goodness function, plus two classification strategies (Goodness and Softmax). Experiments on MNIST and CIFAR-10 show substantial speedups (e.g., ~3.75x on MNIST and up to ~5x in some configurations) while maintaining competitive accuracy, and demonstrate that PFF can outperform the distributed FF baseline (DFF) in both time and accuracy under certain settings. The work highlights the practical significance of FF-based distributed training as a scalable, lower-communication alternative to backpropagation for large neural networks, with avenues for future improvements and broader adoption in federated and multi-GPU contexts.
Abstract
We introduce a new approach in distributed deep learning, utilizing Geoffrey Hinton's Forward-Forward (FF) algorithm to speed up the training of neural networks in distributed computing environments. Unlike traditional methods that rely on a forward pass followed by a backward pass, the FF algorithm employs two forward passes, diverging significantly from conventional backpropagation. This method aligns more closely with the human brain's processing mechanisms, potentially offering a more efficient and biologically plausible approach to neural network training. Our research examines different implementations of the FF algorithm in distributed settings to assess its capacity for parallelization. While the original FF work focused on matching the performance of backpropagation, our parallel implementations aim to reduce training time and resource consumption, thereby addressing the long training times of deep neural networks. Our evaluation shows a 3.75x speedup on the MNIST dataset without compromising accuracy when training a four-layer network on four compute nodes. The integration of the FF algorithm into distributed deep learning represents a significant step forward in the field, potentially changing the way neural networks are trained in distributed environments.
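To make the two-forward-pass idea concrete, the following is a minimal NumPy sketch of a single layer trained with a purely local objective. It assumes the sum-of-squares goodness and threshold from Hinton's original FF formulation, not the new goodness function mentioned in the TL;DR. The names `ff_layer_step`, `goodness`, and `theta`, the synthetic data, the omission of a bias term, and all hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def unit(x):
    # Normalize each row to unit length so only the direction of the input matters.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def goodness(h):
    # Assumed goodness: sum of squared activations per sample.
    return np.sum(h ** 2, axis=1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ff_layer_step(W, x_pos, x_neg, theta=1.0, lr=0.1):
    """One local update: a forward pass on positive data, then one on negative data.

    Positive loss: softplus(-(g - theta)), which pushes goodness above theta.
    Negative loss: softplus(g - theta), which pushes goodness below theta.
    No gradient is propagated to any other layer.
    """
    for x, sign in ((x_pos, -1.0), (x_neg, 1.0)):
        h = np.maximum(x @ W, 0.0)                   # ReLU activations
        g = goodness(h)
        dL_dg = sign * sigmoid(sign * (g - theta))   # derivative of the softplus loss
        dL_dz = dL_dg[:, None] * 2.0 * h             # h is already 0 where ReLU is inactive
        W = W - lr * (x.T @ dL_dz) / len(x)
    return W

# Hypothetical toy data: positive and negative samples point in opposite directions.
rng = np.random.default_rng(0)
x_pos = unit(rng.normal(1.0, 0.5, size=(64, 8)))
x_neg = unit(rng.normal(-1.0, 0.5, size=(64, 8)))
W = rng.normal(0.0, 0.1, size=(8, 16))

for _ in range(300):
    W = ff_layer_step(W, x_pos, x_neg)

g_pos = goodness(np.maximum(x_pos @ W, 0.0)).mean()
g_neg = goodness(np.maximum(x_neg @ W, 0.0)).mean()
print("mean goodness (positive):", g_pos)
print("mean goodness (negative):", g_neg)
```

Because each layer's update depends only on its own inputs and activations, layers can in principle be trained on separate nodes, which is the property the distributed variants in this paper exploit.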
