Convergence of energy-based learning in linear resistive networks

Anne-Men Huijzer, Thomas Chaffey, Bart Besselink, Henk J. van Waarde

TL;DR

This work analyzes energy-based learning in a network of linear resistors by studying Contrastive Learning (CL) for adjusting conductances to match target output potentials. It proves that CL updates are equivalent to projected gradient descent on a convex potential $H$ (with gradient $h$) for any step size, ensuring convergence via averaged-operator theory. The authors derive explicit expressions for the network state, show that $h$ is Lipschitz continuous and is the gradient of a convex function, and establish convergence of the deterministic CL algorithm. They further extend the analysis to a stochastic setting over multiple input-output pairs, proving almost-sure convergence under standard step-size conditions. This work provides a rigorous convergence framework for distributed, hardware-friendly energy-based learning in resistive networks, bridging learning theory, circuit dynamics, and distributed convex optimization.

Abstract

Energy-based learning algorithms are alternatives to backpropagation and are well-suited to distributed implementations in analog electronic devices. However, a rigorous theory of convergence is lacking. We make a first step in this direction by analysing a particular energy-based learning algorithm, Contrastive Learning, applied to a network of linear adjustable resistors. It is shown that, in this setup, Contrastive Learning is equivalent to projected gradient descent on a convex function, for any step size, giving a guarantee of convergence for the algorithm.
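To make the setup concrete, the following is a minimal sketch of one way a contrastive update on a linear resistive network could look, written as a projected gradient step. It is not the authors' implementation. The update direction (squared free voltages minus squared clamped voltages), the factor 0.5, the lower bound `g_min` used as the projection set, the step size `alpha`, and the two-resistor voltage divider are all assumptions chosen for illustration; the function names are hypothetical.

```python
import numpy as np

def free_output_potentials(B, g, p_in):
    """Potentials of the free output nodes, from Kirchhoff's current law.

    B is the node-by-edge incidence matrix with input nodes listed first,
    g holds the edge conductances, p_in the fixed input potentials.
    """
    n_in = len(p_in)
    L = B @ np.diag(g) @ B.T          # conductance-weighted graph Laplacian
    L_oo = L[n_in:, n_in:]            # output-output block
    L_oi = L[n_in:, :n_in]            # output-input block
    return np.linalg.solve(L_oo, -L_oi @ p_in)

def contrastive_step(B, g, p_in, p_out_desired, alpha, g_min):
    """One assumed contrastive update, followed by projection onto g >= g_min."""
    p_free = np.concatenate([p_in, free_output_potentials(B, g, p_in)])
    p_clamped = np.concatenate([p_in, p_out_desired])
    v_free = B.T @ p_free             # voltage drops in the free state
    v_clamped = B.T @ p_clamped       # voltage drops in the clamped state
    g_new = g + 0.5 * alpha * (v_free**2 - v_clamped**2)
    return np.maximum(g_new, g_min)   # projection (assumed constraint set)

# Toy voltage divider: input nodes 0 and 1, output node 2,
# edge 0 joins nodes 0 and 2, edge 1 joins nodes 1 and 2.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
p_in = np.array([1.0, 0.0])
p_out_desired = np.array([0.75])
g = np.array([1.0, 1.0])
alpha, g_min = 0.5, 1e-2              # illustrative values, not from the paper

for _ in range(500):
    g = contrastive_step(B, g, p_in, p_out_desired, alpha, g_min)

p_out = free_output_potentials(B, g, p_in)
print("conductances:", g, "output potential:", p_out)
```

In this toy case the free output potential is $g_1/(g_1+g_2)$, so the iteration should drive the conductance ratio toward 3:1 and the output toward the target 0.75. Each edge update uses only that edge's own voltage drops, which is the locality that makes such schemes attractive for analog hardware.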

Paper Structure

This paper contains 11 sections, 9 theorems, 96 equations, 3 figures, 2 algorithms.

Key Result

Lemma 1

(Ryu2022) Let $f_1:\mathcal{C}\rightarrow \mathcal{C}$ and $f_2:\mathcal{C}\rightarrow \mathcal{C}$ be $\theta_1$- and $\theta_2$-averaged functions with $\theta_1, \theta_2 \in (0,1)$. Then the composition of $f_1$ and $f_2$, i.e., $f_1 \circ f_2$, is $\theta$-averaged with

Figures (3)

  • Figure 1: Resistive electrical network with three input and two output nodes connected to a source applying a current $j_I$.
  • Figure 2: Illustrations of a network of linear resistors with adjustable conductances at a time-step $t$. In both states, the vector of input voltage potentials equals $p_I$, whereas in (a) the output potentials are free and in (b) the output potentials are clamped to a desired value $p_O^D$ dictated by the training data.
  • Figure 3: The proposed algorithm at time-step $t$ expressed as a feedback system having as input the difference between the element-wise squared vector of desired voltages $v^D$ and the current voltages $v(g^{t})$ and as output the vector of conductances $g^{t+1}$.

Theorems & Definitions (17)

  • Remark 1
  • Lemma 1
  • Theorem 2
  • proof
  • Theorem 3
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Lemma 6
  • ...and 7 more