Harnessing uncertainty when learning through Equilibrium Propagation in neural networks

Jonathan Peters, Philippe Talatchian

TL;DR

The paper investigates training deep networks with Equilibrium Propagation (EP) on hardware subject to measurement uncertainty. It introduces a stochastic EP framework for nonlinear resistive networks, modeling post-activation noise with $V^{\text{samp}} = V^{\text{att}} + \sigma\,dB_t$, and shows a dataset-independent critical limit near $\sigma_c \approx 5\times 10^{-5}$; crucially, sampling per attractor state raises this limit according to $\sigma^{\text{act}} = \sigma/\sqrt{N}$ via the Central Limit Theorem, enabling reliable learning on noisier hardware. Empirical results on MNIST, KMNIST, and FashionMNIST reveal that optimal noise levels improve convergence and testing accuracy (e.g., KMNIST from ~77% to ~97%, FashionMNIST from ~26% to ~93%), while MNIST remains reliably learnable even without noise. These findings offer a concrete path toward energy-efficient, self-learning hardware that leverages EP under realistic uncertainties.
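The scaling $\sigma^{\text{act}} = \sigma/\sqrt{N}$ can be checked numerically. The sketch below is not the paper's code: it assumes each reading of an attractor state is corrupted by independent zero-mean Gaussian noise of standard deviation $\sigma$, and the names `effective_sigma`, `sample_attractor` and `v_att` are hypothetical.

```python
import numpy as np

def effective_sigma(sigma: float, n_samples: int) -> float:
    """Standard deviation of the mean of n_samples i.i.d. noisy readings."""
    return sigma / np.sqrt(n_samples)

def sample_attractor(v_att, sigma, n_samples, rng):
    """Average n_samples noisy readings of the attractor state v_att.

    Assumption: noise is i.i.d. Gaussian with standard deviation sigma.
    """
    noise = rng.normal(0.0, sigma, size=(n_samples,) + v_att.shape)
    return (v_att + noise).mean(axis=0)

rng = np.random.default_rng(0)
v_att = np.zeros(10_000)  # hypothetical attractor state (all node voltages zero)
sigma = 1e-3

for n in (1, 4, 16, 64):
    measured = sample_attractor(v_att, sigma, n, rng)
    # Empirical spread of the averaged reading vs. the sigma / sqrt(N) prediction
    print(n, measured.std(), effective_sigma(sigma, n))
```

Under the same relation, readings with $\sigma = 10^{-3}$ averaged over $N = 400$ samples have an effective uncertainty of $10^{-3}/\sqrt{400} = 5\times 10^{-5}$, the critical value quoted above. Whether the relation holds that far out on real hardware is not something this sketch can show.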

Abstract

Equilibrium Propagation (EP) is a supervised learning algorithm that trains network parameters using local neuronal activity. This is in stark contrast to backpropagation, where updating the parameters of the network requires significant data shuffling. Avoiding data movement makes EP particularly compelling as a learning framework for energy-efficient training on neuromorphic systems. In this work, we assess the ability of EP to learn on hardware that contains physical uncertainties. This is particularly important for researchers concerned with hardware implementations of self-learning systems that utilize EP. Our results demonstrate that deep, multi-layer neural network architectures can be trained successfully using EP in the presence of finite uncertainties, up to a critical limit. This limit is independent of the training dataset, and can be scaled through sampling the network according to the central limit theorem. Additionally, we demonstrate improved model convergence and performance for finite levels of uncertainty on the MNIST, KMNIST and FashionMNIST datasets. Optimal performance is found for networks trained with uncertainties close to the critical limit. Our research supports future work to build self-learning hardware in situ with EP.

Paper Structure

This paper contains 8 sections, 12 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Depiction of Equilibrium Propagation (EP) applied to a layered neural network architecture. In both phases the network relaxes into an equilibrium state defined by its energy function, from \ref{energies}, with the input layer $x$ clamped throughout. The additional force in the nudged phase acts solely on the output nodes $y$, because the nudging depends on the loss function of the free-state equilibrium, which is defined only in terms of the network outputs. During the nudged phase, error information propagates implicitly through the network via the node dynamics, and the resulting node states are used for parameter updates.
  • Figure 2: Example data from a) MNIST, b) KMNIST and c) FashionMNIST datasets. Each dataset contains 10 different classes. MNIST classifies handwritten digits. KMNIST replaces digits with 10 different types of handwritten Japanese characters taken from hiragana. FashionMNIST classifies greyscale images of 10 types of clothing.
  • Figure 3: Average and maximum testing accuracies for different measurement uncertainty variances. For each dataset, 30 different trials were used to find the maximum and average accuracies at each uncertainty level.
  • Figure 4: Main: Average testing accuracy for training on the MNIST dataset, for different numbers of samples $N$ of each attractor state (as required in \ref{param_update_expectation}). Increasing the number of samples per attractor state raises the measurement uncertainty variance at which accurate training still takes place. For $N=1$ sample, we can verify the critical uncertainty found in Fig. 3 at $\sigma=5\times 10^{-5}$. Inset: Maximum uncertainty at which the average training accuracy reaches the $90\%$ threshold, showing explicitly the relation in \ref{central_limit_theorem} with the number of samples $N$.
  • Figure 5: Heat maps showing the convergence regions for training on the MNIST dataset when varying hyperparameters $\beta$ and $\eta^{\text{eff}}$. The black boxed tile represents the hyperparameter choice used during training for the results presented in Fig. 3 and Fig. 4. We can verify the critical uncertainty for training convergence found in Fig. 3 by observing that the chosen hyperparameter values lie within the convergence region for zero and low uncertainty, and that this region disappears as the uncertainty increases. When $\sigma=10^{-3}$, training fails to converge for the range of hyperparameters tested.
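
The free and nudged phases described in the Figure 1 caption can be illustrated on a toy model. The sketch below is an assumption-laden illustration, not the paper's nonlinear resistive network: it uses a hypothetical single neuron with energy $E(s) = \tfrac{1}{2}s^2 - wxs$ and cost $C(s) = \tfrac{1}{2}(s-y)^2$, for which both equilibria have closed forms. The gradient estimate is the difference of $\partial E/\partial w$ at the nudged and free equilibria, divided by the nudging strength $\beta$.

```python
def free_state(w, x):
    # Free phase: minimiser of E(s) = 0.5*s**2 - w*x*s with input x clamped
    return w * x

def nudged_state(w, x, y, beta):
    # Nudged phase: minimiser of E(s) + beta * 0.5*(s - y)**2
    return (w * x + beta * y) / (1.0 + beta)

def ep_gradient(w, x, y, beta):
    # dE/dw = -x*s, evaluated at both equilibria; only local quantities are used
    s_free = free_state(w, x)
    s_nudged = nudged_state(w, x, y, beta)
    return ((-x * s_nudged) - (-x * s_free)) / beta

w, x, y = 0.5, 2.0, 3.0           # hypothetical weight, input and target
true_grad = x * (w * x - y)       # gradient of 0.5*(w*x - y)**2 with respect to w

for beta in (1.0, 0.1, 0.01):
    print(beta, ep_gradient(w, x, y, beta), true_grad)
```

In this toy model the estimate equals $x(wx - y)/(1+\beta)$, so it approaches the true loss gradient as $\beta \to 0$. Because the two equilibria differ only at order $\beta$, a small $\beta$ also means any error in the measured states is divided by a small number, which is one plausible reason measurement uncertainty matters for EP; the source summary does not state this mechanism.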