On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization

Erik B. Terres-Escudero; Javier Del Ser; Pablo Garcia-Bringas

TL;DR

This work proposes Polar-FFA, a novel implementation of the Forward-Forward Algorithm (FFA) that extends the original formulation by introducing a neural division between positive and negative instances, thereby allowing for a broader range of neural network configurations.

Abstract

Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing its backward pass with an additional contrastive forward pass. Among these approaches, the Forward-Forward Algorithm (FFA) has been shown to achieve competitive performance in terms of generalization and complexity. Networks trained with FFA learn to contrastively maximize a layer-wise goodness score when presented with real data (positive samples) and to minimize it when processing synthetic data (negative samples). However, the algorithm still suffers from weaknesses that degrade model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of FFA, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each group aim to maximize their goodness when presented with their respective data type, thereby producing symmetric gradient behavior. To empirically gauge the learning capabilities of Polar-FFA, we perform systematic experiments with different activation and goodness functions on image classification datasets. Our results show that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for tuning to guarantee optimal generalization, thereby allowing for a broader range of neural network configurations.
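To make the contrast between the two formulations concrete, the following Python sketch computes, for a single layer, the probability that an input is positive under each approach, together with the layer-local gradient of a binary cross-entropy loss. It assumes sum-of-squares goodness and the standard FFA rule σ(G − θ) with a threshold `theta`. For Polar-FFA it assumes (this is not stated in the abstract) that the first `n_positive` neurons form the positive group and that the probability is σ(G_pos − G_neg). The paper's actual probability functions ($P_\sigma^{\textup{FFA}}$ and $P_\sigma$) may differ in form and scaling, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def goodness(activations: Sequence[float]) -> float:
    """Sum-of-squares goodness of a group of neurons."""
    return sum(a * a for a in activations)


def ffa_probability(activations: Sequence[float], theta: float) -> float:
    """FFA: probability of 'positive' relative to a fixed threshold theta."""
    return sigmoid(goodness(activations) - theta)


def polar_probability(activations: Sequence[float], n_positive: int) -> float:
    """Polar-FFA (assumed form): the first n_positive neurons form the positive
    group and the rest the negative group; their goodness values are compared."""
    g_pos = goodness(activations[:n_positive])
    g_neg = goodness(activations[n_positive:])
    return sigmoid(g_pos - g_neg)


def bce_loss(p: float, is_positive: bool, eps: float = 1e-12) -> float:
    """Layer-local binary cross-entropy on the positive/negative label."""
    return -math.log(max(p if is_positive else 1.0 - p, eps))


def ffa_grad(activations: Sequence[float], theta: float, is_positive: bool) -> List[float]:
    """dLoss/dActivation for FFA: with z = G - theta, dL/dz = p - y and dz/da_i = 2 a_i,
    so every neuron's gradient has the same sign."""
    y = 1.0 if is_positive else 0.0
    dz = ffa_probability(activations, theta) - y
    return [2.0 * a * dz for a in activations]


def polar_grad(activations: Sequence[float], n_positive: int, is_positive: bool) -> List[float]:
    """dLoss/dActivation for the assumed polarized form: with z = G_pos - G_neg,
    positive-group neurons get +2 a_i * dL/dz and negative-group neurons -2 a_i * dL/dz."""
    y = 1.0 if is_positive else 0.0
    dz = polar_probability(activations, n_positive) - y
    return [(2.0 * a * dz) if i < n_positive else (-2.0 * a * dz)
            for i, a in enumerate(activations)]


if __name__ == "__main__":
    # Hypothetical layer output: 2 positive-group neurons followed by 2 negative-group neurons.
    acts = [1.2, 0.9, 0.1, 0.2]
    print("FFA p(positive):", ffa_probability(acts, theta=2.0))
    print("Polar p(positive):", polar_probability(acts, n_positive=2))
    print("FFA grad (negative sample):", ffa_grad(acts, theta=2.0, is_positive=False))
    print("Polar grad (negative sample):", polar_grad(acts, n_positive=2, is_positive=False))
```

In this sketch, `ffa_grad` assigns every neuron an update of the same sign, so a negative sample drives all activations toward lower goodness, whereas `polar_grad` assigns opposite-signed updates to the two groups, so each sample type has a group whose goodness it raises. This is one illustrative reading of the symmetric gradient behavior described in the abstract, not a reproduction of the paper's analysis.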

Paper Structure

This paper contains 28 sections, 21 equations, 12 figures, and 9 tables.

Related Work
Forward-only Learning
Forward-Forward Algorithm
Proposed Polar Forward-Forward Algorithm
Characterization of FFA and Polar-FFA
Layer normalization
Definition of probability function
Experimental Setup
Results and Discussion
Conclusions and Future Research Lines
Instability of Sigmoidal Probability Functions
Proof of Proposition 1
Proof of Proposition 2
Experimental Setup: Additional Information
Hyperparameter values

Figures (12)

Figure 1: Forward and backward propagation paths in (a) Backpropagation (BP); (b) the Forward-Forward Algorithm (FFA); and (c) Polar-FFA. Black lines denote the forward flow of information from the input through each network. Blue lines indicate the error backpropagation path, which is local (layer-wise) in FFA and Polar-FFA. The plot also includes the two adapted probability functions and the normalization process.
Figure 2: Distribution of the difference in convergence area $\Delta\textup{CA}$ (vertical axis) and accuracy $\Delta\textup{ACC}$ (horizontal axis) between Polar-FFA with $P_\sigma$ and FFA with $P_\sigma^{\textup{FFA}}$. The vertical axis uses a square-root scale for readability. Only models with an accuracy greater than $70\%$ are plotted, thereby filtering out models that converge quickly only because they reach a suboptimal maximum accuracy.
Figure 3: Distribution of the difference in the separability index $\Delta\textup{SI}$ (vertical axis) and accuracy $\Delta\textup{ACC}$ (horizontal axis) between Polar-FFA with $P_\sigma$ and FFA with $P_\sigma^{\textup{FFA}}$. Only models with an accuracy higher than $20\%$ are included, thereby filtering out near-random networks.
Figure E1: Distribution of the difference between the accuracy obtained with each percentage of positive to negative neurons and the mean accuracy across all percentages.
Figure E2: Accuracy difference of the outlying configurations, categorized by the probability function and the activation function used. Points with an absolute accuracy difference below 0.08 (the stable configuration band) have been removed for clarity.