Contrastive Forward-Forward: A Training Algorithm of Vision Transformer
Hossein Aghagolzadeh, Mehdi Ezoji
TL;DR
This work addresses the training-efficiency and biological-plausibility limitations of backpropagation by extending Forward-Forward (FF) to Vision Transformers (ViT) through Contrastive Forward-Forward (CFF). CFF replaces FF's local losses with a supervised contrastive objective and adopts a two-branch data flow akin to contrastive learning. Compared with baseline FF, it achieves higher accuracy and much faster convergence, while remaining competitive with backpropagation under various conditions, including inaccurate supervision. A Marginal Contrastive Loss is introduced to progressively tighten same-class representations across layers, and the approach is validated on ViT architectures across multiple datasets, with notable gains in convergence speed and inference efficiency. The results demonstrate the practical potential of brain-inspired, layer-wise contrastive training for large-scale vision models and highlight opportunities for parallelization, robustness, and applicability beyond simple architectures.
Abstract
Although backpropagation is the standard training algorithm for artificial neural networks, researchers continue to look to the brain for inspiration in the search for alternatives with potentially better performance. Forward-Forward is a recent training algorithm that more closely resembles what occurs in the brain, although a significant performance gap remains compared to backpropagation. In the Forward-Forward algorithm, a loss function is placed after each layer, and each layer is updated using two local forward passes and one local backward pass. Forward-Forward is still in its early stages and has been designed and evaluated on simple multi-layer perceptron networks for image classification tasks. In this work, we extend the algorithm to a more complex and modern network, namely the Vision Transformer. Drawing on insights from contrastive learning, we revise the algorithm and introduce Contrastive Forward-Forward. Experimental results show that the proposed algorithm performs significantly better than the baseline Forward-Forward, increasing accuracy by up to 10% and accelerating convergence by 5 to 20 times. Furthermore, taking cross-entropy as the baseline loss function for backpropagation, we show that the proposed modifications reduce the performance gap between Forward-Forward and backpropagation on the Vision Transformer, and that the proposed algorithm even outperforms backpropagation under certain conditions, such as inaccurate supervision.
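To make the idea of a layer-local supervised contrastive objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes a standard supervised contrastive (SupCon-style) loss on the outputs of a single layer for two branches of the same batch. The function names, the temperature value, and the assumption that each layer's output has already been pooled to one vector per sample are illustrative choices that the abstract does not specify. The Marginal Contrastive Loss mentioned in the TL;DR is not reproduced here.

```python
import numpy as np


def supervised_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style loss for embeddings z of shape (n, d) and integer labels of shape (n,).

    Assumes n >= 2 and that at least one sample shares its label with another.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize each embedding
    sim = (z @ z.T) / temperature                      # pairwise scaled cosine similarity
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude each sample's pair with itself

    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    shifted = sim - sim_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives are other samples with the same label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)

    valid = pos_count > 0                              # skip anchors that have no positive
    return float(-(sum_pos[valid] / pos_count[valid]).mean())


def two_branch_layer_loss(h_a, h_b, labels, temperature=0.1):
    """Local loss for one layer: both branches' outputs are contrasted jointly.

    h_a and h_b are hypothetical pooled layer outputs, each of shape (n, d).
    """
    z = np.concatenate([h_a, h_b], axis=0)
    y = np.concatenate([labels, labels], axis=0)
    return supervised_contrastive_loss(z, y, temperature)


# Toy check: same-class embeddings that are aligned versus orthogonal.
labels = np.array([0, 0, 1, 1])
aligned = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [0.01, 1.0]])
loss_aligned = supervised_contrastive_loss(aligned, labels)
loss_mixed = supervised_contrastive_loss(mixed, labels)
```

In a layer-wise scheme of the kind the abstract describes, a loss like this would be evaluated after each layer and its gradient used to update only that layer, so no error signal is propagated to earlier layers. The gradient step itself is omitted here because it depends on the framework and on design details not given in this section.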
