Equilibrium Propagation Without Limits

Elon Litman

TL;DR

The paper removes the infinitesimal-nudge limitation in Equilibrium Propagation by modeling network states with Gibbs–Boltzmann distributions and defining a stochastic contrastive objective $J(\theta)$. It derives two exact gradient representations: the difference of expected local energy derivatives and a path-integral covariance form, establishing that finite nudging yields exact gradient descent on $J(\theta)$. A Gibbs variational and information-theoretic interpretation shows $J$ is a tight proxy for the supervised loss with KL regularization, connecting to variational inference and the information bottleneck. Empirical results on Fashion–MNIST demonstrate that finite-nudge EP can match backpropagation performance, overcoming the shortcomings of infinitesimal nudges and offering a local, biologically plausible learning mechanism grounded in statistical physics.
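The Gibbs variational reading of $J$ can be checked by brute force on a model small enough to enumerate. The sketch below is a hypothetical toy, not the paper's model. It uses three binary units $s_i \in \{-1,+1\}$ with an Ising-style energy, a squared-error loss on the last unit, temperature $T = 1$, nudged distribution $p_\beta(s) \propto \exp(-(E(s)+\beta L(s))/T)$, free energy $F_\beta = -T \log Z_\beta$, and $J = F_1 - F_0$ (our reading of the contrastive objective). Under these assumptions the Gibbs variational principle gives $J = \min_q \{\mathbb{E}_q[L] + T\,\mathrm{KL}(q \,\|\, p_0)\}$, with the minimum attained at $q = p_1$. Taking $q = p_0$ then shows $J \le \mathbb{E}_{p_0}[L]$.

```python
import itertools
import math

# Hypothetical toy (not the paper's model): three binary units s_i in {-1, +1},
# temperature T, nudged Gibbs distribution p_beta(s) ∝ exp(-(E(s) + beta * L(s)) / T).
T = 1.0
STATES = list(itertools.product((-1, 1), repeat=3))


def energy(s, theta):
    """Ising-style energy: two pairwise couplings and three biases."""
    w01, w12, b0, b1, b2 = theta
    return -(w01 * s[0] * s[1] + w12 * s[1] * s[2]
             + b0 * s[0] + b1 * s[1] + b2 * s[2])


def loss(s, y=1):
    """Squared error on the output unit s[2] against target y."""
    return 0.5 * (s[2] - y) ** 2


def gibbs(theta, beta):
    """Return (probabilities over STATES, Helmholtz free energy -T log Z)."""
    w = [math.exp(-(energy(s, theta) + beta * loss(s)) / T) for s in STATES]
    z = sum(w)
    return [wi / z for wi in w], -T * math.log(z)


def objective(theta):
    """Assumed contrastive objective J = F_1 - F_0."""
    return gibbs(theta, 1.0)[1] - gibbs(theta, 0.0)[1]


def kl(q, p):
    """KL(q || p) for distributions over STATES."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def variational_value(q, theta):
    """E_q[L] + T * KL(q || p_0); J is the minimum of this over all q."""
    p0, _ = gibbs(theta, 0.0)
    expected_loss = sum(qi * loss(s) for qi, s in zip(q, STATES))
    return expected_loss + T * kl(q, p0)


theta = (0.5, -0.3, 0.2, 0.1, -0.4)
p0, _ = gibbs(theta, 0.0)
p1, _ = gibbs(theta, 1.0)
uniform = [1 / len(STATES)] * len(STATES)
print("J                =", objective(theta))
print("value at p_1     =", variational_value(p1, theta))       # equals J
print("value at p_0     =", variational_value(p0, theta))       # E_{p_0}[L] >= J
print("value at uniform =", variational_value(uniform, theta))  # >= J
```

Any distribution passed to `variational_value` gives an upper bound on $J$. The free-phase distribution recovers the plain expected loss, which is one way to see $J$ as a KL-regularized surrogate for that loss.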

Abstract

We liberate Equilibrium Propagation (EP) from the limit of infinitesimal perturbations by establishing a finite-nudge foundation for local credit assignment. By modeling network states as Gibbs-Boltzmann distributions rather than deterministic points, we prove that the gradient of the difference in Helmholtz free energy between the nudged and free phases is exactly the difference in expected local energy derivatives. This validates the classic Contrastive Hebbian Learning (CHL) update as an exact gradient estimator for arbitrary finite nudging, requiring neither infinitesimal approximations nor convexity. Furthermore, we derive a generalized EP algorithm based on the path integral of loss-energy covariances, enabling learning with strong error signals that standard infinitesimal approximations cannot support.
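The path-integral form can be illustrated on a tiny enumerable model. The setup is hypothetical: $p_\beta(s) \propto \exp(-(E(s;\theta)+\beta L(s))/T)$ with a loss that does not depend on $\theta$. Under it, two standard identities follow:

- $\partial F_\beta / \partial \beta = \mathbb{E}_{p_\beta}[L]$
- $\frac{d}{d\beta}\mathbb{E}_{p_\beta}[\partial_\theta E] = -\tfrac{1}{T}\mathrm{Cov}_{p_\beta}(L, \partial_\theta E)$

Integrating the second identity over $\beta \in [0,1]$ reproduces the endpoint contrast $\mathbb{E}_{p_1}[\partial_\theta E] - \mathbb{E}_{p_0}[\partial_\theta E]$. The sketch checks both identities numerically, using Simpson's rule for the integral. The three-unit binary network, the parameter values, and all helper names are our own. The paper's exact statement may use different sign or temperature conventions.

```python
import itertools
import math

# Hypothetical toy (not the paper's model): three binary units, temperature T,
# p_beta(s) ∝ exp(-(E(s; theta) + beta * L(s)) / T).
T = 1.0
STATES = list(itertools.product((-1, 1), repeat=3))


def energy(s, theta):
    w01, w12, b0, b1, b2 = theta
    return -(w01 * s[0] * s[1] + w12 * s[1] * s[2]
             + b0 * s[0] + b1 * s[1] + b2 * s[2])


def energy_grad(s):
    """dE/dtheta; the toy energy is linear in theta."""
    return [-s[0] * s[1], -s[1] * s[2], -s[0], -s[1], -s[2]]


def loss(s, y=1):
    """Squared error on the output unit s[2]; independent of theta."""
    return 0.5 * (s[2] - y) ** 2


def gibbs(theta, beta):
    """Exact Gibbs probabilities over STATES and free energy F = -T log Z."""
    w = [math.exp(-(energy(s, theta) + beta * loss(s)) / T) for s in STATES]
    z = sum(w)
    return [wi / z for wi in w], -T * math.log(z)


def covariance_integrand(theta, beta):
    """-(1/T) * Cov_{p_beta}(L, dE/dtheta), one entry per parameter."""
    p, _ = gibbs(theta, beta)
    mean_l = sum(pi * loss(s) for pi, s in zip(p, STATES))
    out = []
    for k in range(len(theta)):
        mean_g = sum(pi * energy_grad(s)[k] for pi, s in zip(p, STATES))
        cov = sum(pi * (loss(s) - mean_l) * (energy_grad(s)[k] - mean_g)
                  for pi, s in zip(p, STATES))
        out.append(-cov / T)
    return out


def path_integral_gradient(theta, n=200):
    """Composite Simpson's rule over beta in [0, 1]; n must be even."""
    h = 1.0 / n
    total = [0.0] * len(theta)
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        for k, v in enumerate(covariance_integrand(theta, i * h)):
            total[k] += weight * v
    return [t * h / 3 for t in total]


def endpoint_gradient(theta):
    """E_{p_1}[dE/dtheta] - E_{p_0}[dE/dtheta]."""
    p1, _ = gibbs(theta, 1.0)
    p0, _ = gibbs(theta, 0.0)
    return [sum((a - b) * energy_grad(s)[k] for a, b, s in zip(p1, p0, STATES))
            for k in range(len(theta))]


def free_energy_slope(theta, beta, h=1e-5):
    """Central difference of F_beta in beta; should match E_{p_beta}[L]."""
    return (gibbs(theta, beta + h)[1] - gibbs(theta, beta - h)[1]) / (2 * h)


theta = (0.5, -0.3, 0.2, 0.1, -0.4)
print("path integral:", path_integral_gradient(theta))
print("endpoint contrast:", endpoint_gradient(theta))
```

The covariance form needs samples along the whole nudging path rather than only at its endpoints. In exchange, it exposes how the loss signal enters the gradient at each intermediate nudge strength.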

Paper Structure

This paper contains 13 sections, 7 theorems, 38 equations, and 1 figure.

Key Result

Theorem 3.1 (Gradient as Expectation Contrast)

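Under the paper's regularity assumption, the gradient of the stochastic contrastive objective $J(\theta)$ is given exactly by the difference between the expected partial derivative of the energy under the nudged ($\beta=1$) and free ($\beta=0$) Gibbs distributions:

$$\nabla_\theta J(\theta) = \mathbb{E}_{s \sim p_1}\!\left[\frac{\partial E(s;\theta)}{\partial \theta}\right] - \mathbb{E}_{s \sim p_0}\!\left[\frac{\partial E(s;\theta)}{\partial \theta}\right],$$

where $p_\beta$ denotes the Gibbs-Boltzmann distribution at nudging strength $\beta$.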
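One way to sanity-check the expectation-contrast formula is to compare it with central finite differences of $J$ on a model small enough to enumerate exactly. The sketch assumes a hypothetical three-unit binary network with $p_\beta(s) \propto \exp(-(E(s;\theta)+\beta L(s))/T)$, $F_\beta = -T\log Z_\beta$, and $J = F_1 - F_0$. None of these specifics come from the paper: not the energy form, the loss, $T=1$, or the parameter values. The nudge is a full $\beta = 1$, not an infinitesimal one.

```python
import itertools
import math

# Hypothetical toy (not the paper's model): three binary units, temperature T,
# nudged Gibbs distribution p_beta(s) ∝ exp(-(E(s; theta) + beta * L(s)) / T).
T = 1.0
STATES = list(itertools.product((-1, 1), repeat=3))


def energy(s, theta):
    """Ising-style energy with couplings w01, w12 and biases b0, b1, b2."""
    w01, w12, b0, b1, b2 = theta
    return -(w01 * s[0] * s[1] + w12 * s[1] * s[2]
             + b0 * s[0] + b1 * s[1] + b2 * s[2])


def energy_grad(s):
    """Partial derivatives dE/dtheta; the toy energy is linear in theta."""
    return [-s[0] * s[1], -s[1] * s[2], -s[0], -s[1], -s[2]]


def loss(s, y=1):
    """Squared error on the output unit s[2]; independent of theta."""
    return 0.5 * (s[2] - y) ** 2


def gibbs(theta, beta):
    """Exact Gibbs probabilities over STATES and free energy F = -T log Z."""
    w = [math.exp(-(energy(s, theta) + beta * loss(s)) / T) for s in STATES]
    z = sum(w)
    return [wi / z for wi in w], -T * math.log(z)


def objective(theta):
    """Assumed contrastive objective J = F_1 - F_0."""
    return gibbs(theta, 1.0)[1] - gibbs(theta, 0.0)[1]


def contrast_gradient(theta):
    """Expectation contrast E_{p_1}[dE/dtheta] - E_{p_0}[dE/dtheta]."""
    p1, _ = gibbs(theta, 1.0)
    p0, _ = gibbs(theta, 0.0)
    grad = [0.0] * len(theta)
    for a, b, s in zip(p1, p0, STATES):
        for k, d in enumerate(energy_grad(s)):
            grad[k] += (a - b) * d
    return grad


def finite_difference_gradient(theta, h=1e-5):
    """Central differences of J, one parameter at a time."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((objective(up) - objective(down)) / (2 * h))
    return grad


theta = (0.5, -0.3, 0.2, 0.1, -0.4)
print("contrast:", contrast_gradient(theta))
print("finite difference:", finite_difference_gradient(theta))
```

The contrast only needs expectations of local terms such as $-s_i s_j$ and $-s_i$ in each phase. This is the CHL-style update, here evaluated exactly rather than from samples.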

Figures (1)

  • Figure 1: Thermodynamic validation on Fashion–MNIST. Experiments utilize a single hidden-layer energy-based network with $\texttt{tanh}$ units. (A) Gradient Alignment: Cosine similarity between the practical contrastive update $\hat{g}(\beta)$ and two references: the supervised backprop gradient $\nabla \mathcal{L}_{\mathrm{sup}}$ and the true free-energy gradient $\nabla J_\beta$. Alignment improves monotonically with $\beta$, confirming that large nudges remain gradient-like. (B) Signal-to-Noise Ratio: SNR of the state perturbation $\Delta s = s_\beta - s_0$. Finite nudging ($\beta \to 1$) yields high SNR, whereas infinitesimal nudges ($\beta \lesssim 10^{-2}$) are dominated by sampling noise. (C) Test Accuracy: Finite-nudge ($\beta=1.0$) and path-integral EP achieve $\sim 80\%$ accuracy, closely tracking standard backprop. Classical infinitesimal EP ($\beta=0.01$) fails to learn.
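The SNR contrast in panel (B) has a simple mechanism. To first order, the mean state shift $\mathbb{E}_{p_\beta}[s] - \mathbb{E}_{p_0}[s]$ equals $-(\beta/T)\,\mathrm{Cov}_{p_0}(s, L)$, so it vanishes linearly as $\beta \to 0$. The Monte Carlo error of estimating each phase's mean from $n$ samples does not shrink with $\beta$. The sketch computes this exactly for a hypothetical three-unit binary (±1) network rather than the paper's tanh network. As an assumption, it defines SNR as the mean shift divided by the standard error of the difference of two independent $n$-sample means.

```python
import itertools
import math

# Hypothetical toy (not the paper's tanh network): three binary units s_i in {-1, +1},
# temperature T, p_beta(s) ∝ exp(-(E(s; theta) + beta * L(s)) / T).
T = 1.0
STATES = list(itertools.product((-1, 1), repeat=3))


def energy(s, theta):
    w01, w12, b0, b1, b2 = theta
    return -(w01 * s[0] * s[1] + w12 * s[1] * s[2]
             + b0 * s[0] + b1 * s[1] + b2 * s[2])


def loss(s, y=1):
    """Squared error on the output unit s[2]."""
    return 0.5 * (s[2] - y) ** 2


def gibbs_probs(theta, beta):
    """Exact Gibbs probabilities over STATES at nudge strength beta."""
    w = [math.exp(-(energy(s, theta) + beta * loss(s)) / T) for s in STATES]
    z = sum(w)
    return [wi / z for wi in w]


def state_snr(theta, beta, unit, n_samples):
    """Return (signal, noise, SNR) for the mean shift of one unit.

    Signal: exact |E_{p_beta}[s_unit] - E_{p_0}[s_unit]|.
    Noise (assumed definition): standard error of the difference of two
    independent n-sample Monte Carlo estimates of those means.
    """
    mean_b = sum(p * s[unit] for p, s in zip(gibbs_probs(theta, beta), STATES))
    mean_0 = sum(p * s[unit] for p, s in zip(gibbs_probs(theta, 0.0), STATES))
    # For s in {-1, +1}, E[s^2] = 1, so Var[s] = 1 - E[s]^2.
    noise = math.sqrt(((1 - mean_b ** 2) + (1 - mean_0 ** 2)) / n_samples)
    signal = abs(mean_b - mean_0)
    return signal, noise, signal / noise


theta = (0.5, -0.3, 0.2, 0.1, -0.4)
for beta in (0.001, 0.01, 0.1, 1.0):
    signal, noise, snr = state_snr(theta, beta, unit=1, n_samples=100)
    print(f"beta={beta:<6} signal={signal:.2e} noise={noise:.2e} SNR={snr:.2f}")
```

With $\theta = 0$, the output unit's nudged mean is exactly $\tanh(\beta/T)$. The signal at $\beta = 0.01$ is therefore about 1.3% of the signal at $\beta = 1$, while the noise term stays of the same order.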

Theorems & Definitions (20)

  • definition 2.1: Energy, Loss, and Objective Kernel
  • definition 2.2: Gibbs-Boltzmann Distribution
  • definition 2.3: Helmholtz Free Energy
  • definition 2.4: Stochastic Contrastive Objective
  • theorem 3.1: Gradient as Expectation Contrast
  • proof
  • remark 3.2: Connection to CHL and EP
  • lemma 3.3: Derivative of Free Energy w.r.t. Nudging
  • proof
  • theorem 3.4: Gradient as Integrated Covariance
  • ...and 10 more