
In Search of Goodness: Large Scale Benchmarking of Goodness Functions for the Forward-Forward Algorithm

Arya Shah, Vaibhav Tripathi

TL;DR

This paper addresses the critical role of the goodness function in the Forward-Forward algorithm by conducting a large-scale benchmark across four image datasets and 21 objectives, measuring both classification performance and environmental impact. By formalizing a diverse registry of goodness functions and a uniform training setup, it demonstrates that many alternative objectives can outperform the standard sum-of-squares baseline, with notable gains from predictive coding, margin-based losses, and sparse/decorrelated objectives. The study also highlights substantial variation in energy use and carbon footprint across functions, revealing Pareto-optimal trade-offs where accuracy can be improved without a proportional rise in environmental cost. Overall, the work establishes goodness as a pivotal hyperparameter in FF design, provides reproducible code, and points toward greener, biologically plausible learning paradigms and hardware considerations.
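The Pareto-optimal trade-offs mentioned above can be made concrete with a short sketch. A goodness function is Pareto-optimal if no other function is both at least as accurate and at least as cheap, while being strictly better on one of the two. The Python below is a minimal illustration under our own assumptions: the `Run` record, the `dominates` and `pareto_front` helpers, and the toy numbers are hypothetical and are not taken from the paper or its code.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One benchmark result: a goodness function's accuracy and energy cost."""
    name: str
    accuracy: float    # higher is better
    energy_kwh: float  # lower is better


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_kwh <= b.energy_kwh
    strictly_better = a.accuracy > b.accuracy or a.energy_kwh < b.energy_kwh
    return no_worse and strictly_better


def pareto_front(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates, sorted by energy cost."""
    front = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(front, key=lambda r: r.energy_kwh)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are NOT results from the paper.
    toy = [
        Run("f_a", 0.90, 1.0),
        Run("f_b", 0.92, 1.1),
        Run("f_c", 0.89, 1.5),  # dominated by f_a: less accurate and more energy
        Run("f_d", 0.95, 3.0),
    ]
    for r in pareto_front(toy):
        print(r.name, r.accuracy, r.energy_kwh)
```

With these toy inputs, f_c is dropped because f_a is both more accurate and cheaper; along the remaining front, any further gain in accuracy costs additional energy.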

Abstract

The Forward-Forward (FF) algorithm offers a biologically plausible alternative to backpropagation, enabling neural networks to learn through local updates. However, FF's efficacy relies heavily on the definition of "goodness", a scalar measure of neural activity. While current implementations predominantly use a simple sum-of-squares metric, it remains unclear whether this default choice is optimal. To address this, we benchmarked 21 distinct goodness functions across four standard image datasets (MNIST, FashionMNIST, CIFAR-10, STL-10), evaluating classification accuracy, energy consumption, and carbon footprint. We found that certain alternative goodness functions inspired by various domains significantly outperform the standard baseline. Specifically, game_theoretic_local achieved 97.15% accuracy on MNIST, softmax_energy_margin_local reached 82.84% on FashionMNIST, and triplet_margin_local attained 37.69% on STL-10. Furthermore, we observed substantial variability in computational efficiency, highlighting a critical trade-off between predictive performance and environmental cost. These findings demonstrate that the goodness function is a pivotal hyperparameter in FF design. We release our code on GitHub (https://github.com/aryashah2k/In-Search-of-Goodness) for reference and reproducibility.
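To make "goodness" concrete, here is a minimal sketch of the sum-of-squares baseline and a per-layer logistic loss of the kind used in Hinton's original FF formulation, where the probability that an input is positive is the sigmoid of the layer's goodness minus a threshold. The function names and the default threshold of 2.0 are assumptions for illustration, not the paper's implementation; the `goodness` parameter marks where an alternative objective would be swapped in.

```python
import math


def sum_of_squares_goodness(activations: list[float]) -> float:
    """Baseline FF goodness: the sum of squared activities in one layer."""
    return sum(a * a for a in activations)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ff_layer_loss(pos_acts: list[float], neg_acts: list[float],
                  threshold: float = 2.0,  # assumed value, tune per setup
                  goodness=sum_of_squares_goodness) -> float:
    """Local logistic FF loss for one layer and one positive/negative pair.

    Positive data should have goodness above `threshold`, negative data below it:
      -log sigmoid(g_pos - threshold) - log(1 - sigmoid(g_neg - threshold))
    which equals softplus(threshold - g_pos) + softplus(g_neg - threshold).
    """
    g_pos = goodness(pos_acts)
    g_neg = goodness(neg_acts)
    return softplus(-(g_pos - threshold)) + softplus(g_neg - threshold)


if __name__ == "__main__":
    # A well-separated pair gives a small loss; swapping the roles gives a large one.
    print(ff_layer_loss([2.0, 2.0], [0.1, 0.1]))
    print(ff_layer_loss([0.1, 0.1], [2.0, 2.0]))
```

Because the loss depends only on one layer's activations, each layer can be trained with its own local update, which is the property that makes FF an alternative to backpropagation.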

Paper Structure

This paper contains 30 sections, 22 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Overview of the Benchmarking Framework. The system processes four standard image datasets (MNIST, FashionMNIST, CIFAR-10, STL-10) using the Forward-Forward algorithm. A central registry manages 21 distinct goodness functions (e.g., Sum of Squares, Game Theoretic, InfoNCE) which are plugged into the network layers. We evaluate predictive performance (accuracy) and environmental cost (carbon emissions/energy) to identify trade-offs between biological plausibility and sustainability.
  • Figure 2: Comparison of final classification accuracy across different goodness functions on CIFAR-10. The predictive_coding_local function achieves the highest performance, significantly outperforming the standard baseline, while bcm_local and oja_local show lower stability.
  • Figure 3: Comparison of final classification accuracy on FashionMNIST. softmax_energy_margin_local achieves the highest multi-pass accuracy (86.32%), demonstrating the effectiveness of margin-based objectives for this dataset. Note the failure modes of bcm_local and outlier_trimmed_energy_local.
  • Figure 4: Comparison of final classification accuracy on MNIST. The majority of goodness functions achieve high accuracy (>97%), with game_theoretic_local performing best. This indicates the Forward-Forward algorithm is highly effective for digit recognition across diverse objectives.
  • Figure 5: Comparison of final classification accuracy on STL-10. triplet_margin_local achieves the highest multi-pass accuracy (37.72%), suggesting that explicit separation between positive and negative samples is crucial for data-scarce, high-resolution tasks.
  • ...and 12 more figures
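Figure 1 describes a central registry whose goodness functions plug into the network layers, and Figures 3 and 5 report multi-pass accuracy. The sketch below shows one way such a registry could drive inference. It assumes that "multi-pass" means one forward pass per candidate label, with the label embedded in the input and the label with the highest summed goodness across layers predicted, similar to Hinton's original FF classifier. Everything here (`GOODNESS_REGISTRY`, `register`, `predict_multipass`, the `mean_of_squares` variant, and the one-hot concatenation scheme) is hypothetical and does not reproduce the paper's code or any of its 21 functions other than the sum-of-squares baseline.

```python
import math
from typing import Callable

Vector = list[float]
Matrix = list[list[float]]  # one row of weights per output unit

# Hypothetical registry: name -> function mapping a layer's activations to a scalar.
GOODNESS_REGISTRY: dict[str, Callable[[Vector], float]] = {}


def register(name: str):
    """Decorator that adds a goodness function to the registry under `name`."""
    def wrap(fn):
        GOODNESS_REGISTRY[name] = fn
        return fn
    return wrap


@register("sum_of_squares")
def sum_of_squares(h: Vector) -> float:
    return sum(x * x for x in h)


@register("mean_of_squares")  # illustrative variant, not one of the paper's functions
def mean_of_squares(h: Vector) -> float:
    return sum(x * x for x in h) / len(h)


def relu_layer(w: Matrix, x: Vector) -> Vector:
    """Fully connected layer (no bias) followed by ReLU."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def normalize(h: Vector) -> Vector:
    """Scale to unit length so the next layer cannot simply read off this layer's goodness."""
    norm = math.sqrt(sum(x * x for x in h)) or 1.0  # avoid dividing an all-zero vector
    return [x / norm for x in h]


def predict_multipass(layers: list[Matrix], features: Vector, num_classes: int,
                      goodness_name: str = "sum_of_squares") -> int:
    """Run one forward pass per candidate label; return the label with the highest total goodness."""
    goodness = GOODNESS_REGISTRY[goodness_name]
    scores = []
    for label in range(num_classes):
        one_hot = [1.0 if c == label else 0.0 for c in range(num_classes)]
        x = one_hot + features  # embed the candidate label in the input
        total = 0.0
        for w in layers:
            h = relu_layer(w, x)
            total += goodness(h)
            x = normalize(h)
        scores.append(total)
    return max(range(num_classes), key=scores.__getitem__)
```

Keeping goodness behind a name-to-function lookup means a benchmark can swap objectives by changing one string while the network and inference loop stay fixed.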