The Forward-Forward Algorithm: Some Preliminary Investigations

Geoffrey Hinton

TL;DR

The paper introduces the Forward-Forward algorithm (FF), a learning procedure that replaces the forward and backward passes of backpropagation with two forward passes, one on positive (real) data and one on negative data, and trains each layer with its own local goodness objective. FF is motivated by biological plausibility and online learning: it does not propagate derivatives backward through the network, and if the negative passes are done offline it can pipeline streaming data without storing activities. The paper reports that FF works well enough on MNIST and CIFAR-10 with relatively small networks to merit further investigation. It situates FF within contrastive learning, relating it to Boltzmann machines, GANs, and recent self-supervised methods, and discusses hardware implications, including analog implementations and the concept of mortal computation. It also presents architectural insights (e.g., layer normalization to prevent shortcut solutions) and lists open questions on scaling, activation functions, and generative capabilities.

Abstract

The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes could be separated in time, the negative passes could be done offline, which would make the learning much simpler in the positive pass and allow video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
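
As a rough illustration of the per-layer objective described in the abstract, the sketch below trains a single ReLU layer so that the sum of squared activities is high on positive data and low on negative data. Everything beyond that goodness definition is an assumption made for the example: the names `goodness`, `layer_forward`, and `local_step`, the threshold `theta`, the softplus (logistic) loss, the learning rate, and the toy Gaussian data are all hypothetical and are not taken from the abstract. Note that the update uses only quantities local to the layer; nothing is propagated backward through other layers.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def goodness(h):
    # Goodness of each example: sum of squared activities in the layer.
    return np.sum(h ** 2, axis=1)

def layer_forward(x, W, b):
    # Single ReLU layer (hypothetical choice of activation).
    return np.maximum(0.0, x @ W + b)

def local_step(x_pos, x_neg, W, b, theta=2.0, lr=0.01):
    """One layer-local gradient step.

    Assumed loss (not from the abstract):
        softplus(theta - g) on positive data, softplus(g - theta) on negative data,
    where g is the goodness, so g is pushed above theta for positive data
    and below theta for negative data.
    """
    grad_W = np.zeros_like(W)
    grad_b = np.zeros_like(b)
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = layer_forward(x, W, b)
        g = goodness(h)
        # Derivative of softplus(sign * (theta - g)) with respect to g.
        dg = -sign * sigmoid(sign * (theta - g)) / len(x)
        # dg/dh = 2h; h is already zero wherever the ReLU is inactive.
        dpre = dg[:, None] * 2.0 * h
        grad_W += x.T @ dpre
        grad_b += dpre.sum(axis=0)
    return W - lr * grad_W, b - lr * grad_b

# Toy stand-ins for positive and negative data (purely illustrative).
rng = np.random.default_rng(0)
x_pos = rng.normal(size=(32, 8)) + 1.0
x_neg = rng.normal(size=(32, 8)) - 1.0
W = rng.normal(scale=0.1, size=(8, 16))
b = np.zeros(16)

for _ in range(300):
    W, b = local_step(x_pos, x_neg, W, b)

gap = (goodness(layer_forward(x_pos, W, b)).mean()
       - goodness(layer_forward(x_neg, W, b)).mean())
print("mean goodness gap (positive - negative):", gap)
```

In a multi-layer version, each layer would run the same kind of local step on the output of the layer below; how that output is normalized and how negative data are produced are design choices this sketch leaves open.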

Paper Structure (21 sections, 4 equations, 3 figures, 1 table)