Self-Contrastive Forward-Forward Algorithm

Xing Chen, Dongshu Liu, Jeremie Laydevant, Julie Grollier

TL;DR

Self-Contrastive Forward-Forward (SCFF) is a forward-only, local-learning algorithm that derives positive and negative inputs from the data itself and optimizes layer-wise goodness without backpropagation. It extends FF to unsupervised and sequential tasks, achieving state-of-the-art results among purely forward/local methods on MNIST, CIFAR-10, STL-10, and Tiny ImageNet, and demonstrates effective time-series learning with Bi-RNNs on the Free Spoken Digit Dataset (FSDD). The method supports greedy or joint training, uses a sigmoid-based loss on layer goodness, and relies on a principled negative-sample design that places negatives between positive clusters, enabling robust representation learning in hardware-constrained environments. These results highlight SCFF’s potential for high-accuracy, real-time learning at the edge and its relevance to neuromorphic computing, where forward-only, local updates are advantageous.
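To make "a sigmoid-based loss on layer goodness" concrete, here is a minimal NumPy sketch of one layer-local update. It assumes the usual FF conventions, which may differ in detail from the paper: goodness is the sum of squared ReLU activations, and the loss is -log σ(G_pos - θ) - log σ(θ - G_neg) with a hypothetical threshold `theta`. The function names and the placeholder data are illustrative only. The gradient uses only the layer's own input and activations, which is what makes the update local; greedy training would apply it to one layer at a time.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def goodness(h):
    # Goodness per sample: sum of squared activations (FF convention, assumed here).
    return np.sum(h ** 2, axis=1)


def layer_loss_and_grads(W, b, x_pos, x_neg, theta):
    """Layer-local sigmoid loss on goodness and its gradient w.r.t. (W, b).

    loss = mean(-log sigmoid(G_pos - theta)) + mean(-log sigmoid(theta - G_neg))
    Only this layer's inputs and activations are used: no error signal is
    propagated from any other layer.
    """
    loss = 0.0
    grad_W = np.zeros_like(W)
    grad_b = np.zeros_like(b)
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        a = x @ W + b                     # pre-activations
        h = np.maximum(a, 0.0)            # ReLU activations
        z = sign * (goodness(h) - theta)  # margin; the loss pushes z upward
        loss += np.mean(np.logaddexp(0.0, -z))  # stable -log sigmoid(z)
        # d/dz[-log sigmoid(z)] = -sigmoid(-z), averaged over the batch.
        dz = -sigmoid(-z) / x.shape[0]
        # Chain rule through G = sum(h^2) and the ReLU (h is already 0 where a <= 0).
        da = (sign * dz)[:, None] * 2.0 * h
        grad_W += x.T @ da
        grad_b += da.sum(axis=0)
    return loss, grad_W, grad_b


# Usage with placeholder data (not real positive/negative examples).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 16))
b = np.zeros(16)
x_pos = rng.normal(size=(32, 8))
x_neg = rng.normal(size=(32, 8))
for step in range(100):
    loss, gW, gb = layer_loss_and_grads(W, b, x_pos, x_neg, theta=2.0)
    W -= 0.05 * gW
    b -= 0.05 * gb
print(f"loss at last step: {loss:.3f}")
```

Writing the gradient by hand shows that the update needs no transposed weights from deeper layers. In practice an autodiff framework would compute the same local gradient, provided the graph is cut at each layer's input.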

Abstract

Agents that operate autonomously benefit from lifelong learning capabilities. However, compatible training algorithms must comply with the decentralized nature of these systems, which constrains both parameter counts and computational resources. The Forward-Forward (FF) algorithm is one such method. FF relies only on feedforward operations, the same ones used for inference, to optimize layer-wise objectives. This purely forward approach eliminates the transpose operations required by traditional backpropagation. Despite its potential, FF has failed to reach state-of-the-art performance on most standard benchmark tasks, in part because of unreliable negative data generation methods for unsupervised learning. In this work, we propose the Self-Contrastive Forward-Forward (SCFF) algorithm, a competitive training method aimed at closing this performance gap. Inspired by standard self-supervised contrastive learning for vision tasks, SCFF generates positive and negative inputs applicable across various datasets. The method outperforms existing unsupervised local learning algorithms on several benchmark datasets, including MNIST, CIFAR-10, STL-10, and Tiny ImageNet. We also extend FF to the training of recurrent neural networks, expanding its utility to sequential data tasks. These findings pave the way for high-accuracy, real-time learning on resource-constrained edge devices.
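The self-contrastive input generation described in the figure captions (a sample concatenated with itself as a positive, with a different sample as a negative) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The helper name `make_scff_pairs`, the choice of partner via a shifted random permutation, and the concatenation axis (channels for images, the MFCC feature dimension for sequences) are assumptions. Pairing is purely by sample index, so without labels a negative partner may occasionally belong to the same class.

```python
import numpy as np


def make_scff_pairs(x, rng, axis=-1):
    """Build self-contrastive positive and negative inputs from a batch x.

    Positive: each sample concatenated with itself, [x_k, x_k].
    Negative: each sample concatenated with a different sample, [x_k, x_n], n != k.
    `axis` is the feature/channel axis to concatenate along (an assumption to adapt).
    """
    batch = x.shape[0]
    if batch < 2:
        raise ValueError("need at least two samples to form negative pairs")
    perm = rng.permutation(batch)
    partner = np.empty(batch, dtype=int)
    # Cyclic shift of a random permutation guarantees partner[k] != k for every k.
    partner[perm] = np.roll(perm, -1)
    x_pos = np.concatenate([x, x], axis=axis)
    x_neg = np.concatenate([x, x[partner]], axis=axis)
    return x_pos, x_neg, partner


rng = np.random.default_rng(0)

# Image batch in (batch, channels, height, width) layout: concatenate channels.
images = rng.random((8, 3, 32, 32))
pos, neg, _ = make_scff_pairs(images, rng, axis=1)
print(pos.shape)  # (8, 6, 32, 32)

# MFCC sequences in (batch, time, features) layout: concatenate features.
mfcc = rng.random((4, 50, 13))
pos_seq, neg_seq, _ = make_scff_pairs(mfcc, rng, axis=-1)
print(pos_seq.shape)  # (4, 50, 26)
```

Because the positive and negative examples share the first half, the layer can only distinguish them through the relationship between the two halves. This is what lets the method generate negatives without a hand-designed corruption scheme.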

Paper Structure (18 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Comparative diagram illustrating three distinct unsupervised (self-supervised) learning paradigms. a. Generation of a negative example by hybridizing two different images, as in the original FF paper hinton2022forward. b. In Forward-Forward (FF) learning, the layer-wise loss function is defined to maximize the goodness for positive inputs (real images) and minimize the goodness for negative inputs, each of which is a fake image generated by corrupting a real image, as shown in a. c. In contrastive learning, the InfoNCE loss function measures the similarity between the representations of two inputs (either two different images or two differently augmented views of the same image) at the end of the network chen2020simple. d. Our proposed Self-Contrastive Forward-Forward (SCFF) learning algorithm combines the principles of FF and contrastive learning: a layer-wise loss function maximizes the goodness for concatenated similar pairs and minimizes the goodness for concatenated dissimilar pairs.
  • Figure 2: SCFF applied to a convolutional neural network architecture. a. The original batch of images (top row) is processed to generate positive (middle row) and negative (bottom row) examples. b. The generated positive and negative examples pass through a series of convolutional (Conv.) and pooling (AvgPool or MaxPool) operations that extract relevant features. The outputs of each hidden layer, taken after an additional average-pooling layer, are fed together into a softmax layer for final classification.
  • Figure 3: Test accuracy at different layers for SCFF and backpropagation on CIFAR-10 (a) and STL-10 (b).
  • Figure 4: Bidirectional RNN results on the FSDD dataset. a. Training procedure of SCFF on a Bi-RNN. In the first stage, unsupervised training is performed on the hidden connections (both input-to-hidden and hidden-to-hidden transformations) using positive and negative examples. Positive examples are created by concatenating two identical MFCC feature vectors of a digit along the feature dimension, while negative examples are created by concatenating the MFCCs of two different digits, as illustrated in the figure. At each time step, the features are fed sequentially into the Bi-RNN (RNN and RNN$^*$). The red regions indicate features at different time steps. In the second stage, a linear classifier is trained for the classification task using the final hidden states of both RNNs, $H_T$ and $H_0^*$, as inputs. b. Test accuracy of the linear classifier trained on Bi-RNN outputs. The yellow curve shows accuracy with untrained (random) hidden connections, the blue curve shows results with SCFF training, and the green curve shows backpropagation results.
  • Figure 5: Probability distributions of the relative positions of positive and negative examples. a. Theoretical distributions of positive examples from two classes with distinct means ($2\mu_1 = 0$ and $2\mu_2 = 15$) and identical variance ($2\Sigma = 4$), shown as blue and orange curves, respectively. The grey curve shows the theoretical distribution of negative examples derived from the two classes using Eq. \ref{eq:dis}. b. Continuous probability densities of LDA applied to the Iris dataset, with contours for positive examples in green, red, and blue and for negative examples in grey.
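The placement of negatives between positive clusters in Figure 5a can be checked numerically under a simplifying assumption. This is our own assumption and not necessarily the paper's Eq. eq:dis. Suppose each half of a concatenated pair is an independent draw from its class Gaussian N(μ_k, Σ), and a layer with tied weights on both halves responds to their sum. Then positives from class k follow N(2μ_k, 2Σ), and negatives mixing classes 1 and 2 follow N(μ_1 + μ_2, 2Σ), whose mean is exactly the midpoint of the two positive means. The helper `pair_sum_samples` is hypothetical.

```python
import numpy as np


def pair_sum_samples(mu_a, mu_b, var, n, rng):
    """Draw n values of x_a + x_b with x_a ~ N(mu_a, var), x_b ~ N(mu_b, var) independent.

    Hypothetical simplification: a tied-weight layer's response to the
    concatenated pair [x_a, x_b] reduces to the sum x_a + x_b.
    """
    sd = np.sqrt(var)
    return rng.normal(mu_a, sd, n) + rng.normal(mu_b, sd, n)


rng = np.random.default_rng(0)
mu1, mu2, var = 0.0, 7.5, 2.0  # so 2*mu1 = 0, 2*mu2 = 15, 2*var = 4, as in Figure 5a
n = 200_000

pos1 = pair_sum_samples(mu1, mu1, var, n, rng)  # positive pairs, class 1
pos2 = pair_sum_samples(mu2, mu2, var, n, rng)  # positive pairs, class 2
neg = pair_sum_samples(mu1, mu2, var, n, rng)   # negative pairs mixing the two classes

for name, s in (("positive, class 1", pos1), ("positive, class 2", pos2), ("negative", neg)):
    print(f"{name}: mean={s.mean():.2f}, var={s.var():.2f}")
```

The sample means come out close to 0, 15, and 7.5, and all three variances are close to 4. The negative distribution therefore sits midway between the two positive clusters, which is the qualitative picture shown by the grey curve.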