Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning
Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li
TL;DR
This work reframes Forward-Forward learning as distance-based metric learning, introducing distance-forward (DF) to enhance on-chip training. DF uses a centroid-like distance representation with a goodness-based margin loss and N-pair negative mining, and it adds two local-update schemes (DF-O and DF-R) to balance accuracy and hardware efficiency. Empirical results across six image datasets show that DF, especially the block-wise DF-O variant, achieves state-of-the-art performance among local learning methods, with memory costs below 40% of those of backpropagation and robustness to hardware-induced noise. The approach offers a practical path toward energy-efficient, online learning on neuromorphic hardware by preserving local computation while improving discriminative power.
Abstract
The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computation. However, it suffers from suboptimal performance and poor generalization, largely due to inadequate theoretical support and a lack of effective learning strategies. In this work, we reformulate FF using distance metric learning and propose a distance-forward algorithm (DF) to improve FF performance in supervised vision tasks while preserving its local computational properties, making it competitive for efficient on-chip learning. To achieve this, we reinterpret FF through the lens of centroid-based metric learning and develop a goodness-based N-pair margin loss to facilitate the learning of discriminative features. Furthermore, we integrate layer-collaboration local update strategies to reduce the information loss caused by greedy local parameter updates. Our method surpasses existing FF models and other advanced local learning approaches, with accuracies of 99.7% on MNIST, 88.2% on CIFAR-10, 59% on CIFAR-100, 95.9% on SVHN, and 82.5% on ImageNette. Moreover, it achieves performance comparable to BP training at less than 40% of the memory cost, while exhibiting stronger robustness to multiple types of hardware-related noise, demonstrating its potential for online learning and energy-efficient computation on neuromorphic chips.
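To make the idea of a goodness-based N-pair margin loss more concrete, the following is a minimal sketch under stated assumptions, not the paper's formulation. It takes goodness to be the sum of squared activations of a layer, as in the original FF algorithm, and assumes a hinge-style penalty averaged over all negatives. The function names, the averaging, and the margin value are hypothetical choices made for illustration only.

```python
def goodness(activations):
    """Goodness of one layer: the sum of its squared activations."""
    return sum(a * a for a in activations)


def n_pair_margin_loss(pos_goodness, neg_goodnesses, margin=1.0):
    """Hinge penalty averaged over all negatives (assumed form).

    A negative contributes zero once the positive goodness exceeds
    its goodness by at least `margin`.
    """
    if not neg_goodnesses:
        raise ValueError("at least one negative is required")
    penalties = [max(0.0, margin - (pos_goodness - g)) for g in neg_goodnesses]
    return sum(penalties) / len(penalties)


# One positive (input paired with the correct label) and two negatives
# (the same input paired with wrong labels); activations are made up.
pos = goodness([1.0, 2.0])
negs = [goodness([1.0, 1.0]), goodness([2.0, 1.5])]
loss = n_pair_margin_loss(pos, negs, margin=2.0)
```

In this sketch, the first negative already sits more than the margin below the positive and contributes nothing, while the second negative has higher goodness than the positive and is the only term that is penalized.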
