Adaptive Spatial Goodness Encoding: Advancing and Scaling Forward-Forward Learning Without Backpropagation

Qingchun Gong, Robert Bogdan Staszewski, Kai Xu

TL;DR

This paper tackles the representational and scalability limitations of existing Forward-Forward (FF) approaches to training CNNs without backpropagation (BP). It introduces Adaptive Spatial Goodness Encoding (ASGE), a BP-free framework that computes layer-wise, spatially localized goodness from feature maps, partitions channels using a channel-aware patch scheme, and uses fixed random projections to generate class logits, thereby decoupling classification complexity from channel dimensionality. ASGE employs RMS pooling to preserve energy in the goodness measure and supports three prediction strategies (Last Pred, Fusion Pred, Best Pred) that balance accuracy, memory, and compute. Empirically, ASGE achieves state-of-the-art FF-based performance on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100, and scales to ImageNet with Top-1 and Top-5 accuracies of 51.58% and 75.23%, narrowing the gap to BP while keeping every update layer-wise and BP-free. The framework improves layer-wise representations and offers practical deployment options under resource constraints, a step toward scalable BP-free training of modern CNNs.
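
The decoupling described above can be illustrated with a small, self-contained Python sketch. The exact ASGE formulation is not given here, so the sketch rests on labeled assumptions: goodness at each spatial location is the mean squared activation within a channel group, the hypothetical `num_groups` parameter stands in for the channel-aware patch scheme, and a seeded Gaussian matrix built by the hypothetical `make_fixed_projection` plays the role of the fixed random projection. It only shows why the number of logits depends on the number of classes and the goodness-map size, not on the channel count C.

```python
import random


# Hypothetical sketch: the "channel-aware patch scheme" is modeled as equal-size
# channel groups; the paper's actual partitioning may differ.
def spatial_goodness(fmap, num_groups):
    """fmap: C x H x W nested lists. Returns a flat goodness vector of length
    num_groups * H * W (mean squared activation per group and location)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    if C % num_groups != 0:
        raise ValueError("C must be divisible by num_groups")
    size = C // num_groups
    goodness = []
    for g in range(num_groups):
        group = fmap[g * size:(g + 1) * size]
        for h in range(H):
            for w in range(W):
                goodness.append(sum(ch[h][w] ** 2 for ch in group) / size)
    return goodness


def make_fixed_projection(in_dim, num_classes, seed=0):
    """Fixed (never trained) Gaussian projection of shape num_classes x in_dim."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(num_classes)]


def class_logits(goodness, projection):
    """One logit per class: dot product of each projection row with the goodness."""
    return [sum(p * g for p, g in zip(row, goodness)) for row in projection]


def random_fmap(channels, height, width, seed=0):
    """Toy post-ReLU feature map with non-negative values."""
    rng = random.Random(seed)
    return [[[max(0.0, rng.gauss(0.0, 1.0)) for _ in range(width)]
             for _ in range(height)] for _ in range(channels)]


# The goodness and logit sizes stay fixed while the channel count grows.
projection = make_fixed_projection(in_dim=2 * 4 * 4, num_classes=10)
for channels in (8, 64, 512):
    g = spatial_goodness(random_fmap(channels, 4, 4), num_groups=2)
    print(channels, len(g), len(class_logits(g, projection)))
# prints "8 32 10", then "64 32 10", then "512 32 10"
```

Because the projection is fixed, adding channels changes neither the projection's shape nor the number of logits in this sketch; only the per-group averaging absorbs the extra channels.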

Abstract

The Forward-Forward (FF) algorithm offers a promising alternative to backpropagation (BP). Although recent FF-based extensions have enhanced the original algorithm and adapted it to convolutional neural networks (CNNs), they often suffer from limited representational capacity and poor scalability to large-scale datasets, primarily due to exploding channel dimensionality. In this work, we propose adaptive spatial goodness encoding (ASGE), a new FF-based training framework tailored for CNNs. ASGE leverages feature maps to compute spatially aware goodness representations at each layer, enabling layer-wise supervision. Crucially, this approach decouples classification complexity from channel dimensionality, thereby addressing channel explosion and achieving performance competitive with other BP alternatives. ASGE outperforms all other FF-based approaches across multiple benchmarks, delivering test accuracies of 99.65% on MNIST, 93.41% on FashionMNIST, 90.62% on CIFAR-10, and 65.42% on CIFAR-100. Moreover, we present the first successful application of FF-based training to ImageNet, with Top-1 and Top-5 accuracies of 51.58% and 75.23%, respectively. Furthermore, we propose three prediction strategies that offer flexible trade-offs among accuracy, parameters, and memory usage, enabling deployment under diverse resource constraints.
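
The three prediction strategies can be sketched as follows, assuming each layer emits one logit vector per input. The interpretations are assumptions based only on the strategy names (Last Layer Only, Cumulative Layer Fusion, Best Layer Selection): Last Pred reads the final layer, Fusion Pred sums logits over layers from a hypothetical `start` index onward, and Best Pred uses the single layer with the highest validation accuracy. The logits and accuracies below are toy values, not results from the paper.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def last_pred(layer_logits):
    """Last Pred: classify with the final layer's logits only."""
    return argmax(layer_logits[-1])


def fusion_pred(layer_logits, start=0):
    """Fusion Pred (assumed form): sum logits of layers start..L, then argmax."""
    fused = [0.0] * len(layer_logits[0])
    for logits in layer_logits[start:]:
        for k, value in enumerate(logits):
            fused[k] += value
    return argmax(fused)


def best_layer(val_accuracy):
    """Best Pred, offline step: pick the layer with the top validation accuracy."""
    return argmax(val_accuracy)


def best_pred(layer_logits, best_idx):
    """Best Pred, inference: layers after best_idx need not be run at all."""
    return argmax(layer_logits[best_idx])


# Toy logits for one input: 3 layers x 3 classes (illustrative values only).
layer_logits = [[0.1, 0.9, 0.0],
                [0.6, 0.5, 0.0],
                [0.7, 0.2, 0.1]]
val_accuracy = [0.80, 0.86, 0.84]  # made-up per-layer validation accuracy

print(last_pred(layer_logits))       # 0
print(fusion_pred(layer_logits))     # 1 (fused is about [1.4, 1.6, 0.1])
idx = best_layer(val_accuracy)       # 1
print(best_pred(layer_logits, idx))  # 0
```

Under these assumptions, Last Pred needs a single head, Best Pred lets the network be truncated after the chosen layer, and Fusion Pred keeps every head, which is one way the accuracy, parameter, and memory trade-offs could arise.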

Paper Structure

This paper contains 29 sections, 8 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Layer-wise training accuracy (layers 2-7) of VGG8 trained with FF on CIFAR-10. First and final layers excluded. Left: positive samples. Right: negative samples.
  • Figure 2: (a) Top: ASGE architecture with prediction strategies (Fusion Pred: Cumulative Layer Fusion Prediction, Best Pred: Best Layer Selection Prediction, Last Pred: Last Layer Only Prediction); Bottom: Layer-wise adaptive spatial goodness. (b) CwC goodness.
  • Figure 3: Layer-wise prediction accuracy (layers 2-7) of VGG8 trained with ASGE on CIFAR-10. First and final layers excluded. Left: training. Right: validation.
  • Figure 4: Spatial goodness distributions, on a logarithmic scale, before pooling (G-Origin) and after RMS, average, and max pooling (G-RMS, G-Avg, G-Max) in a layer that includes pooling. Each row corresponds to a channel configuration: C = 512 (top) and C = 256 (bottom).
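
Figure 4 compares goodness after RMS, average, and max pooling. The sketch below illustrates one reading of why RMS pooling preserves energy; it assumes non-negative (post-ReLU) activations, non-overlapping 2x2 windows, and mean squared activation as the energy measure, none of which is confirmed here. Under those assumptions RMS pooling keeps the mean squared activation exactly, average pooling never increases it (Jensen's inequality), and max pooling never decreases it. It is an illustration, not the paper's pooling code.

```python
import math
import random


def pool2x2(grid, reduce):
    """Non-overlapping 2x2 pooling of an H x W grid with a reduce function."""
    H, W = len(grid), len(grid[0])
    return [[reduce([grid[i][j], grid[i][j + 1],
                     grid[i + 1][j], grid[i + 1][j + 1]])
             for j in range(0, W - 1, 2)]
            for i in range(0, H - 1, 2)]


def rms(window):
    """Root mean square of a window: sqrt(mean(x^2))."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def avg(window):
    """Arithmetic mean of a window."""
    return sum(window) / len(window)


def mean_energy(grid):
    """Mean squared activation, used here as a proxy for goodness energy."""
    flat = [v for row in grid for v in row]
    return sum(v * v for v in flat) / len(flat)


# Toy non-negative (post-ReLU) 8x8 activation map.
rng = random.Random(0)
act = [[max(0.0, rng.gauss(0.0, 1.0)) for _ in range(8)] for _ in range(8)]

print(f"G-Origin {mean_energy(act):.4f}")
for name, reduce in (("G-RMS", rms), ("G-Avg", avg), ("G-Max", max)):
    print(f"{name:8s} {mean_energy(pool2x2(act, reduce)):.4f}")
```

Each RMS-pooled value squared equals the mean of the four squared inputs in its window, so averaging over windows reproduces the original mean energy; G-RMS therefore matches G-Origin up to rounding, while G-Avg is no larger and G-Max is no smaller.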