Emergent representations in networks trained with the Forward-Forward algorithm

Niccolò Tosato; Lorenzo Basile; Emanuele Ballarin; Giuseppe de Alteriis; Alberto Cazzaniga; Alessio Ansuini

TL;DR

Backpropagation has been criticised for its biological implausibility, which motivates the study of Forward-Forward (FF) as a more plausible alternative. This study analyses the internal representations learned by FF across several datasets and compares FF with Backpropagation on the same goodness objective (BP/FF) and with Backpropagation on the cross-entropy loss (BP). The key finding is that FF yields sparse, category-specific ensembles (small groups of co-activating units) that can also arise for unseen categories and can share units across visually related classes. Similar sparsity can emerge when training with Backpropagation on the FF objective, whereas it does not typically arise with standard Backpropagation. The work highlights potential connections between biologically inspired learning rules and efficient neural coding, with possible implications for zero-shot classification and model compression.

Abstract

The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity -- composed of a low number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm.

Paper Structure (32 sections, 2 equations, 14 figures, 13 tables)

Introduction
Related Work
Forward-Forward
Neuronal ensembles
Methods
Data
Model trained with Forward-Forward (FF)
Model trained with Backpropagation on the goodness objective (BP/FF)
Model trained with Backpropagation on the cross-entropy loss (BP)
Analysis of representations
Results
Classification accuracy
Forward-Forward elicits sparse neuronal ensembles
Visually similar classes can elicit ensembles with shared neurons
Representations of unseen categories can elicit well-defined ensembles
...and 17 more sections

Figures (14)

Figure 1: Activation patterns in a Multi-Layer Perceptron trained with the Forward-Forward algorithm on the Mnist dataset. Panel A: Examples of activation patterns in response to a positive input (class label embedded as a one-hot encoding in the top left corner of the image). Images show the activation value of network units, arranged as a matrix only for the sake of clarity; darker squares represent more active neurons. Panel B: Activation value of each neuron in the first hidden layer (Layer 1), averaged over all images of a given class. Neuron index on the $x$ axis; average activation on the $y$ axis. Blue dots indicate units that are considered active according to the leave-one-out (LOO) method described in the "Analysis of representations" section. Panel C: Activation map for neurons in Layer 1 for all images, grouped by class. A blue dot in position $(x,y)$ indicates that neuron $x$ is activated by input $y$; the colour scale represents the intensity of the activation. Horizontal bands mark different categories; blue vertical stripes mark active, category-specific neurons. Each input category consistently activates a specific set of neurons (an ensemble).
Figure 2: Sparsity of category-specific representations. We report the sparsity of representations, computed as described in the "Analysis of representations" section, for the three models FF, BP/FF and BP on the Mnist dataset. Sparsity values are averaged over 10 runs.
Figure 3: Visually similar classes in FashionMnist can elicit ensembles with shared neurons. Panel A: The ensembles elicited in the first hidden layer of FF by two example inputs. Red circles indicate the active units that are shared between the two categories. Panel B: Element $i,j$ of the matrix measures how many units are shared between the ensembles of category $i$ and category $j$, normalised by the ensemble sizes through the Jaccard similarity index: $J(\mathcal{E}^i, \mathcal{E}^{j}) = \frac{\lvert \mathcal{E}^i \cap \mathcal{E}^{j} \rvert}{\lvert \mathcal{E}^i \cup \mathcal{E}^{j} \rvert}$. The results refer to a single training run.
Figure 4: The representations of an unseen category form an ensemble in FF trained on FashionMnist. Panel A: Activation patterns in response to the different categories in the first hidden layer. The unseen category (Sandal), enclosed by red lines, produces a relatively weaker but well-defined ensemble-like activation pattern. Panel B: Activation value of each neuron, averaged over all images of the unseen category. Neuron index on the $x$ axis; average activation on the $y$ axis. Blue dots indicate units that are considered active according to the method described in the "Analysis of representations" section. Panel C: Ensembles of unseen categories can share units with the ensembles of the other categories. Element $i,j$ of the matrix indicates how many units are shared between the ensembles of category $i$ and category $j$: $\lvert \mathcal{E}^i \cap \mathcal{E}^{j} \rvert$. The results refer to a single training run.
Figure 5: Distribution of $\varrho_i^{+}$ in Layer 2 (Mnist dataset). In FF, the distribution is imbalanced, with most of the population of neurons having $\approx 65$-$75\%$ of excitatory weights. In BP/FF, the distribution is bimodal, with two populations of neurons: one imbalanced towards excitation (right mode) and the other towards inhibition (left mode). The BP model is almost perfectly balanced between excitation and inhibition.
...and 9 more figures
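
The overlap measures quoted in the captions of Figures 3 and 4 can be illustrated with a minimal Python sketch. It assumes each ensemble is available as a set of active unit indices. The function names, the unit indices, and the convention of returning 0 for two empty ensembles are illustrative assumptions and are not taken from the paper; the paper's leave-one-out procedure for deciding which units are active is not reproduced here.

```python
from __future__ import annotations

from itertools import product


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b| (normalised overlap, as in Figure 3B)."""
    union = a | b
    if not union:
        # Assumption: two empty ensembles are given similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


def overlap_matrices(ensembles: dict[str, set[int]]):
    """Return (shared-unit counts, Jaccard indices) for every pair of categories."""
    names = list(ensembles)
    shared = {}
    jac = {}
    for i, j in product(names, repeat=2):
        shared[(i, j)] = len(ensembles[i] & ensembles[j])  # raw count, as in Figure 4C
        jac[(i, j)] = jaccard(ensembles[i], ensembles[j])
    return shared, jac


# Hypothetical ensembles: the unit indices are made up for illustration only.
ensembles = {
    "Sneaker": {3, 17, 42, 88},
    "Ankle boot": {17, 42, 90},
    "Trouser": {5, 61},
}

shared, jac = overlap_matrices(ensembles)
print(shared[("Sneaker", "Ankle boot")])  # 2 shared units: 17 and 42
print(jac[("Sneaker", "Ankle boot")])     # 2 / 5 = 0.4
print(jac[("Sneaker", "Trouser")])        # 0.0, no shared units
```

The raw count grows with ensemble size, while the Jaccard index divides by the size of the union, so the two matrices can rank the same pairs of categories differently.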
