Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning

Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li

TL;DR

This work reframes Forward-Forward learning as distance-based metric learning, introducing distance-forward (DF) to enhance on-chip training. DF uses a centroid-like distance representation with a goodness-based margin loss and N-pair negative mining, and it adds two local-update schemes (DF-O and DF-R) to balance accuracy and hardware efficiency. Empirical results across six image datasets show that DF, especially the block-wise DF-O variant, achieves state-of-the-art performance among local learning methods, with memory costs below 40% of backpropagation and robustness to hardware-induced noise. The approach offers a practical path toward energy-efficient, online learning on neuromorphic hardware by preserving local computation while improving discriminative power.

Abstract

The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computation. However, it suffers from suboptimal performance and poor generalization, largely due to inadequate theoretical support and a lack of effective learning strategies. In this work, we reformulate FF using distance metric learning and propose a distance-forward algorithm (DF) to improve FF performance in supervised vision tasks while preserving its local computational properties, making it competitive for efficient on-chip learning. To achieve this, we reinterpret FF through the lens of centroid-based metric learning and develop a goodness-based N-pair margin loss to facilitate the learning of discriminative features. Furthermore, we integrate layer-collaboration local update strategies to reduce information loss caused by greedy local parameter updates. Our method surpasses existing FF models and other advanced local learning approaches, with accuracies of 99.7% on MNIST, 88.2% on CIFAR-10, 59% on CIFAR-100, 95.9% on SVHN, and 82.5% on ImageNette. Moreover, it achieves comparable performance with less than 40% of the memory cost of BP training, while exhibiting stronger robustness to multiple types of hardware-related noise, demonstrating its potential for online learning and energy-efficient computation on neuromorphic chips.

Paper Structure (14 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Illustration of different forward-forward-based methods in a distance metric space. (A) The goodness function, $g^{pos/neg}$, as proposed by Hinton (2022), can be formalized as measuring an $L_2$ distance between $\bm{Wx}$ and $\bm{-W}_y\bm{y}^{pos/neg}$. (B) FF essentially operates on two absolute distances, $g^{pos}$ and $g^{neg}$. $\theta$ in the goodness-function equation is set to zero and therefore omitted here for illustration, and $\sigma(x)$ denotes the negative log-sigmoid function. Optimizing the goodness function can thereby be interpreted as adjusting the distance between the projections of input patterns and anchor patterns (i.e., label vectors). (C) The SymBa loss is proposed to balance the positive and negative losses. It can be interpreted as manipulating the relative discrepancy between $g^{pos}$ and $g^{neg}$ in the goodness-constructed metric space. (D) The proposed DF combines both relative and absolute distances and mines the distances among several positive and negative samples to facilitate the learning of discriminative features. It uses support data points (see solid boxes), i.e., the most representative data points whose distance exceeds a given margin $m$, to calculate a margin loss. This ensures that positive samples are closer to the anchor pattern than negative samples by $m$. Additionally, it includes a regularization term to further reduce the absolute distance represented by the maximum goodness function of negative samples (see solid green boxes), effectively ensuring all negative samples are kept far away from any anchors. $\lambda$: the weighting coefficient.
  • Figure 2: Scheme of different gradient update strategies. Unlike end-to-end approaches, DF-O calculates the loss for every layer and blocks information backpropagation across multiple layers (see red dashed line). DF-R further integrates random feedback to replace the indirect backward circuit, achieving higher computational parallelism and better bio-plausibility.
  • Figure 3: Test classification accuracy and average goodness separation on Fashion-MNIST using DF-O. The separation is measured as the difference in goodness between positive and negative training samples.
  • Figure 4: DF inherits the memory-efficient (A) and parallelized computational benefits (B) of FF and exhibits strong robustness to three types of hardware-related noise (C-E) compared with BP methods. L1-L5: specific noise levels.
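
The loss variants described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of a squared $L_2$ distance for the goodness, the scale `alpha` in the SymBa-style loss, and the summation over negatives in the margin loss are all choices made here for illustration. The hinge is written on distances to the anchor, as panel (D) phrases it; how that maps onto the sign of the goodness, and the exact form of the $\lambda$-weighted regularizer, are not specified in this excerpt, so the regularizer is left out.

```python
import numpy as np


def neg_log_sigmoid(z):
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed in a numerically stable way.
    return np.logaddexp(0.0, -z)


def goodness(W, x, W_y, y):
    # Panel (A): distance between W x and -W_y y.
    # Assumption: the squared L2 distance is used, i.e. the sum of squares of W x + W_y y.
    diff = W @ x - (-(W_y @ y))
    return float(np.sum(diff ** 2))


def ff_loss(g_pos, g_neg, theta=0.0):
    # Panel (B): two absolute terms, one per goodness value.
    return neg_log_sigmoid(g_pos - theta) + neg_log_sigmoid(theta - g_neg)


def symba_loss(g_pos, g_neg, alpha=1.0):
    # Panel (C): depends only on the relative gap g_pos - g_neg.
    # alpha is a hypothetical scale factor, not taken from this excerpt.
    return neg_log_sigmoid(alpha * (g_pos - g_neg))


def n_pair_margin_loss(d_pos, d_negs, m=1.0):
    # Panel (D), margin part only: one positive distance against N negative distances.
    # A negative is a "support point" when d_pos + m > d_neg, i.e. the margin is violated.
    d_negs = np.asarray(d_negs, dtype=float)
    hinge = np.maximum(0.0, d_pos - d_negs + m)
    support = hinge > 0
    return float(hinge.sum()), support


# Usage: one positive and three negatives; only margin-violating negatives contribute.
loss, support = n_pair_margin_loss(1.0, [3.0, 1.5, 0.5], m=1.0)
```

In this sketch, negatives that already sit farther from the anchor than the positive by at least `m` contribute zero loss, which is one way to read the caption's statement that only support data points enter the margin loss.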