Stochastic Forward-Forward Learning through Representational Dimensionality Compression
Zhichao Zhu, Yang Qi, Hengyuan Ma, Wenlian Lu, Jianfeng Feng
TL;DR
The paper addresses training neural networks without backpropagation or curated negative samples by extending Forward-Forward learning with a dimensionality-based objective. It introduces effective dimensionality (ED) as a second-order statistic and optimizes a two-term loss that minimizes within-class ED while maximizing across-sample ED, using noise-augmented copies of each input in place of explicit negatives and performing energy-based inference from the mean of squared outputs. Empirical results on MNIST, CIFAR-10, and CIFAR-100 show performance competitive with other non-BP methods, with both noise and dimensionality compression playing crucial roles. The approach offers a biologically plausible, hardware-friendly alternative and is related to self-supervised and predictive-coding paradigms, though scaling to larger models remains an open challenge.
Abstract
The Forward-Forward (FF) learning algorithm provides a bottom-up alternative to backpropagation (BP) for training neural networks, relying on a layer-wise "goodness" function with well-designed negative samples for contrastive learning. Existing goodness functions are typically defined as the sum of squared postsynaptic activations, neglecting correlated variability between neurons. In this work, we propose a novel goodness function termed dimensionality compression that uses the effective dimensionality (ED) of fluctuating neural responses to incorporate second-order statistical structure. Our objective minimizes ED for noisy copies of individual inputs while maximizing it across the sample distribution, promoting structured representations without the need to prepare negative samples. We demonstrate that this formulation achieves competitive performance compared to other non-BP methods. Moreover, we show that noise plays a constructive role that can enhance generalization and improve inference when predictions are derived from the mean of squared outputs, which is equivalent to making predictions based on an energy term. Our findings contribute to the development of more biologically plausible learning algorithms and suggest a natural fit for neuromorphic computing, where stochasticity is a computational resource rather than a nuisance. The code is available at https://github.com/ZhichaoZhu/StochasticForwardForward
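To make the objective concrete, the following is a minimal NumPy sketch of a dimensionality-compression loss. It is not taken from the authors' repository. The abstract does not state the formula for ED, so the sketch assumes the commonly used participation ratio, ED = (tr C)^2 / tr(C^2), where C is the covariance matrix of the layer responses. The function names, the weighting parameter `alpha`, and the choice to measure across-sample ED on trial-averaged responses are all illustrative assumptions.

```python
import numpy as np


def effective_dimensionality(x, eps=1e-12):
    """Participation ratio of the covariance of x.

    x has shape (n_observations, n_neurons). Assumed definition:
    ED = (tr C)^2 / tr(C^2), which ranges from 1 to n_neurons.
    """
    x = np.asarray(x, dtype=float)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    tr = np.trace(cov)
    # For a symmetric matrix, tr(C @ C) equals the sum of squared entries.
    tr_sq = np.sum(cov * cov)
    return tr**2 / (tr_sq + eps)


def dimensionality_compression_loss(responses, alpha=1.0):
    """Hypothetical two-term loss for one layer.

    responses has shape (n_inputs, n_noisy_copies, n_neurons).
    The first term is the mean ED over the noisy copies of each input
    (to be minimized); the second is the ED across inputs, computed here
    on trial-averaged responses (to be maximized). alpha is an assumed
    weighting between the two terms.
    """
    responses = np.asarray(responses, dtype=float)
    within = np.mean([effective_dimensionality(r) for r in responses])
    across = effective_dimensionality(responses.mean(axis=1))
    return within - alpha * across


# Illustrative data: 64 inputs, 32 noisy copies each, 16 neurons.
rng = np.random.default_rng(0)
means = rng.normal(size=(64, 1, 16))
noise = 0.1 * rng.normal(size=(64, 32, 16))
loss = dimensionality_compression_loss(means + noise)
```

In this toy setup the noise is isotropic, so the within-input ED is high. Training under such a loss would push the layer to concentrate noise-induced variability into few directions while keeping different inputs spread across many.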
