Local learning for stable backpropagation-free neural network training towards physical learning

Yaqi Guo, Fabian Braun, Bastiaan Ketelaar, Stephanie Tan, Richard Norte, Siddhant Kumar

Abstract

While backpropagation and automatic differentiation have driven deep learning's success, the physical limits of chip manufacturing and the rising environmental costs of deep learning motivate alternative learning paradigms such as physical neural networks. However, most existing physical neural networks still rely on digital computing for training, largely because backpropagation and automatic differentiation are difficult to realize in physical systems. We introduce FFzero, a forward-only learning framework that enables stable neural network training without backpropagation or automatic differentiation. FFzero combines layer-wise local learning, prototype-based representations, and directional-derivative-based optimization through forward evaluations only. We show that local learning remains effective under forward-only optimization, where backpropagation fails. FFzero generalizes to multilayer perceptron and convolutional neural network architectures across classification and regression tasks. Using a simulated photonic neural network as an example, we demonstrate that FFzero provides a viable path toward backpropagation-free in-situ physical learning.
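
The forward-only optimization at the heart of FFzero can be illustrated with a short sketch. The Python code below estimates a layer's gradient purely from forward evaluations by averaging directional derivatives (central finite differences of a layer-wise goodness) over random unit directions, then takes one local ascent step; the names and hyperparameters (goodness_fn, n_directions, eps, lr) are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def dd_gradient_estimate(weights, goodness_fn, n_directions=32, eps=1e-3):
        """Forward-only gradient estimate: average directional derivatives of the
        layer-wise goodness, measured along random unit directions."""
        grad_est = np.zeros_like(weights)
        for _ in range(n_directions):
            u = np.random.randn(*weights.shape)
            u /= np.linalg.norm(u)                                  # random unit direction
            dd = (goodness_fn(weights + eps * u)
                  - goodness_fn(weights - eps * u)) / (2.0 * eps)   # directional derivative along u
            grad_est += dd * u
        return grad_est / n_directions

    def local_dd_step(weights, goodness_fn, lr=1e-2):
        """One layer-local ascent step on the goodness, using forward passes only."""
        return weights + lr * dd_gradient_estimate(weights, goodness_fn)

In expectation, the averaged directional-derivative update is proportional to the true gradient update (cf. Figure 3), and because each layer is trained independently, estimation errors do not compound across layers.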

Paper Structure

This paper contains 22 sections, 16 equations, and 7 figures.

Figures (7)

  • Figure 1: Schematic comparison of FFzero and backpropagation performance when trained with directional derivatives. FFzero employs a local learning strategy in which each layer is updated independently, preventing gradient-estimation errors from compounding across layers. In contrast, backpropagation propagates directional-derivative estimation errors globally across layers, causing performance to degrade with increasing model size. FFzero maintains stable performance at scale while remaining compatible with in-situ physical implementation.
  • Figure 2: Schematic of the FFzero algorithm, illustrated for (a) a 3-class classification problem and (b) a regression problem. (a) Each linear layer is trained locally and independently by maximizing layer-wise goodness. The cosine similarities of the layer output w.r.t. the fixed prototype vectors (denoted by $\xi[\cdot]$) are used to compute the goodness. Directional derivatives are used to perform gradient-based optimization for each layer locally. (b) In regression tasks, the antipodal prototypes represent the upper ($y=1$) and lower ($y=-1$) prediction bounds, with intermediate ground-truth values encoded as interpolations between them. Each layer’s output is normalized onto a unit sphere and compared against the fixed antipodal vectors via cosine similarity, yielding layer-wise goodness for weight optimization via directional derivatives. (A minimal sketch of this prototype-based goodness follows the figure list.)
  • Figure 3: Schematic of directional derivative optimization in FFzero. (a) Illustration of directional derivative (DD) estimation on a representative goodness landscape. (b) Gradient estimation by averaging directional derivatives over multiple random directions. (c) Comparison of optimization trajectories obtained using DD-based updates and standard gradient ascent. While gradient ascent follows the true gradient direction, the expectation of the DD-based update matches the true gradient update (Supplementary Information 2). (d) Schematic illustration of the goodness-based DD update for a linear layer.
  • Figure 4: Classification accuracy of MLP models across varying hidden-layer widths, with all hidden layers constrained to equal size. Results are shown for (a) MNIST and (b) FashionMNIST datasets using backpropagation (BP) and forward–forward (FF) training paradigms, each combined with either automatic differentiation (AD) or directional-derivative (DD) optimization. The numbers above each plot indicate the total number of parameters for the corresponding model.
  • Figure 5: Classification accuracy of CNN models across varying numbers of channels per convolutional layer. Results are shown for MNIST and FashionMNIST datasets using backpropagation (BP) and forward–forward (FF) training paradigms, each combined with either automatic differentiation (AD) or directional-derivative (DD) optimization. The numbers above each plot indicate the total number of parameters for the corresponding model.
  • ...and 2 more figures
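
To make the prototype-based goodness from Figure 2 concrete, here is a minimal Python sketch, assuming cosine similarity of a layer's output against fixed class prototypes for classification, and antipodal prototypes with linear interpolation for regression targets. The exact goodness formulation (in particular, how non-target classes are contrasted) is an assumption here, not taken from the paper.

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def classification_goodness(layer_output, prototypes, label):
        """Similarity to the target-class prototype, contrasted with the other classes
        (one plausible form of the layer-wise goodness)."""
        sims = np.array([cosine_similarity(layer_output, p) for p in prototypes])
        return sims[label] - np.mean(np.delete(sims, label))

    def regression_target(y, proto_upper, proto_lower):
        """Encode a ground-truth value y in [-1, 1] as an interpolation between the
        antipodal prototypes representing the upper and lower prediction bounds."""
        t = (y + 1.0) / 2.0
        return t * proto_upper + (1.0 - t) * proto_lower

A goodness defined this way could be plugged into the directional-derivative step sketched after the Abstract, so that each layer is updated locally from forward evaluations alone.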