A Fast Algorithm to Simulate Nonlinear Resistive Networks

Benjamin Scellier

TL;DR

This work reframes the simulation of nonlinear resistive networks as a convex quadratic program with linear constraints, enabling exact, fast steady-state computation via coordinate descent. It then specializes the solver to Deep Resistive Networks (DRNs), exploiting their bipartite layer structure to perform exact block coordinate descent and scale simulations to MNIST-sized tasks. Empirical results show DRNs trained with equilibrium propagation reaching 1.33% test error on MNIST, with simulated networks up to 327× larger and epochs 160× faster than in prior SPICE-based simulations. The approach supports scalable research on energy-efficient, hardware-inspired ML, with paths to handling nonideal device behavior and integrating alternative learning paradigms.

Abstract

Analog electrical networks have long been investigated as energy-efficient computing platforms for machine learning, leveraging analog physics during inference. More recently, resistor networks have sparked particular interest due to their ability to learn using local rules (such as equilibrium propagation), enabling potentially important energy efficiency gains for training as well. Despite this potential advantage, the simulation of these resistor networks has been a significant bottleneck in assessing their scalability, with current methods either limited to linear networks or relying on realistic, yet slow, circuit simulators like SPICE. Assuming ideal circuit elements, we introduce a novel approach for the simulation of nonlinear resistive networks, which we frame as a quadratic programming problem with linear inequality constraints, and which we solve using a fast, exact coordinate descent algorithm. Our simulation methodology significantly outperforms existing SPICE-based simulations, enabling the training of networks up to 327 times larger at speeds 160 times faster, resulting in a 50,000-fold improvement in the ratio of network size to epoch duration. Our approach can foster more rapid progress in the simulation of nonlinear analog electrical networks.
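
As a quick check of the last figure, assuming the improvement in the ratio of network size to epoch duration is simply the product of the two factors quoted above:

$$327 \times 160 = 52{,}320 \approx 5 \times 10^{4}$$

which is consistent with the stated 50,000-fold improvement.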

Paper Structure (36 sections, 5 theorems, 76 equations, 5 figures, 3 tables)


Key Result

Theorem 1

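Consider a nonlinear resistive network with $N$ nodes, and denote $v = (v_1,v_2,\ldots,v_N)$ the vector of node electrical potentials. (The electrical potentials are defined up to a constant, so we may assume, for instance, $v_1=0$.) Under the assumption of ideality, the steady state configuration of node electrical potentials is a minimizer of $E$ over the feasible set $\mathcal{S}$, where $E: \mathbb{R}^N \to \mathbb{R}$ is a convex quadratic function and $\mathcal{S} \subseteq \mathbb{R}^N$ is defined by linear constraints. ($E$ and $\mathcal{S}$ are given explicitly in the paper.)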

Figures (5)

  • Figure 1: Ideal circuit elements and their current-voltage (i-v) characteristics. A linear resistor follows Ohm's law: $i = g v$, where $g$ is the conductance ($g=1/r$, with $r$ being the resistance). An ideal diode is characterized by $i=0$ for $v \leq 0$ ("off-state") and $v=0$ for $i>0$ ("on-state"). An ideal voltage source is characterized by $v=v_0$ for a constant voltage $v_0$ independent of the current $i$. An ideal current source is characterized by $i = i_0$ for a constant current $i_0$ independent of the voltage $v$.
  • Figure 2: A nonlinear resistive network. By assumption, the voltage sources form a tree (in blue), so if we set e.g. $v_1=0$, we can immediately infer $v_2 = -v_{12}^{\rm VS}$ and $v_3 = v_{13}^{\rm VS}$. Next, we compute the steady state of the network by performing exact coordinate descent (Theorem 2) on the set of internal node electrical potentials (in black). As an example, one step of exact coordinate descent on node $k=5$ proceeds as follows. First we look at the resistors and current sources connected to node $k=5$ and we calculate $p_5 = (g_{25} v_2 + g_{58} v_8 + i_{57}^{\rm CS}) / (g_{25} + g_{58})$. Then we look at the diodes connected to node $k=5$ and we calculate $v_5 = \max(v_3, \min(p_5, v_4))$. This is the value of $v_5$ that achieves the minimum of $E(v_5)$ with the other variables (node electrical potentials) held fixed. We repeat the process with other nodes until convergence.
  • Figure 3: Top. A deep resistive network (DRN) with $L=3$ layers. Input voltage sources are set to input values: $v_1^{(0)} = A x_1$, $v_2^{(0)} = - A x_1$, $v_3^{(0)} = A x_2$ and $v_4^{(0)} = - A x_2$, where $A$ is the input amplification factor. At inference, output switches are open. Equilibrium propagation learning requires nudging the output node voltages ($v_1^{(3)}$, $v_2^{(3)}$, $v_3^{(3)}$ and $v_4^{(3)}$) towards the target voltages ($y_1$, $y_2$, $y_3$ and $y_4$), which is achieved by closing the output switches. In the DRN architecture, the update rule for a given unit prescribed by exact coordinate descent depends only on the states of the units of the previous layer and the next layer. We can thus update all the even layers ($\ell=2$) simultaneously, and then update all the odd layers ($\ell=1$ and $\ell=3$) simultaneously. This is called exact block coordinate descent. Bottom. To form a nonlinear unit, we place a diode between the unit's node and ground. Depending on the orientation of the diode, the units come in two flavours: excitatory units and inhibitory units.
  • Figure 4: Training a nonlinear resistive network with equilibrium propagation (EP). Top. Input voltage $x$ is supplied to the network, and the output voltage $v_{\rm out}$ is measured. Diodes implement nonlinearities and variable resistors implement the 'trainable weights'. Bottom. Two methods to implement 'nudging' for EP learning. In the first method (left), a voltage source set to the desired output $y$, a resistor of conductance $\beta$ and a switch are in series in the output branch. Closing the switch injects a current proportional to the prediction error, $i_{\rm out} = \beta (y-v_{\rm out})$. One caveat of this method is that the nudging parameter $\beta$ (the conductance) is necessarily positive. In the second method (right), a current source injects a current $i_{\rm out} = \beta (y-v_{\rm out})$ in the output branch, allowing for the use of a negative $\beta$.
  • Figure 5: Top. Bidirectional amplifier with gain $a$. The right terminal voltage ($v_R$) is related to the left terminal voltage ($v_L$) by $v_R = a v_L$, where $a$ is a gain factor. The left terminal current ($i_L$) is related to the right terminal current ($i_R$) by $i_L = \frac{1}{a} i_R$. Middle. A unit consisting of a bidirectional amplifier, possibly followed by a diode between the unit's node and ground. Units come in two types: excitatory and inhibitory, depending on the diode's orientation. Bottom. A deep resistive network with bidirectional amplifiers.
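
The node update described in the Figure 2 caption can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names (`coordinate_step`, `solve`), the dictionary-based data layout, and all numerical values are hypothetical. It assumes every internal node touches at least one resistor, that current-source currents are counted as flowing into the node, and that a node with several diodes is bounded by the tightest of them.

```python
from math import inf


def coordinate_step(k, v, resistors, injected, lower, upper):
    """One coordinate update of node k, following the Figure 2 recipe.

    v         : dict node -> potential (updated in place)
    resistors : dict node -> list of (neighbor, conductance)
    injected  : dict node -> net current-source current into the node (assumed sign)
    lower     : dict node -> neighbors j whose diodes enforce v[k] >= v[j]
    upper     : dict node -> neighbors j whose diodes enforce v[k] <= v[j]
    """
    # Step 1: conductance-weighted average of neighbors, shifted by source current.
    g_total = sum(g for _, g in resistors[k])
    p = (sum(g * v[j] for j, g in resistors[k]) + injected.get(k, 0.0)) / g_total
    # Step 2: clip p to the interval allowed by the diodes.
    lo = max((v[j] for j in lower.get(k, ())), default=-inf)
    hi = min((v[j] for j in upper.get(k, ())), default=inf)
    v[k] = max(lo, min(p, hi))
    return v[k]


def solve(v, internal, resistors, injected, lower, upper, tol=1e-9, max_sweeps=10_000):
    """Sweep over the internal nodes until no potential moves by more than tol."""
    for _ in range(max_sweeps):
        delta = 0.0
        for k in internal:
            old = v[k]
            delta = max(delta, abs(coordinate_step(k, v, resistors, injected, lower, upper) - old))
        if delta < tol:
            break
    return v


# Node 5 of Figure 2 with made-up values (not taken from the paper).
v = {2: -1.0, 3: 0.5, 4: 1.0, 5: 0.0, 8: 2.0}
resistors = {5: [(2, 1.0), (8, 3.0)]}  # g_25 = 1, g_58 = 3
injected = {5: 1.0}                    # i_57^CS = 1
lower = {5: [3]}                       # diode enforcing v_5 >= v_3
upper = {5: [4]}                       # diode enforcing v_5 <= v_4
v5 = coordinate_step(5, v, resistors, injected, lower, upper)
print(v5)  # p_5 = 1.5 is clipped to v_4, so this prints 1.0

# A purely linear chain a - m1 - m2 - b with unit conductances and fixed ends.
chain_v = {"a": 0.0, "m1": 0.0, "m2": 0.0, "b": 1.0}
chain_r = {"m1": [("a", 1.0), ("m2", 1.0)], "m2": [("m1", 1.0), ("b", 1.0)]}
solve(chain_v, ["m1", "m2"], chain_r, {}, {}, {})
print(chain_v["m1"], chain_v["m2"])  # approaches 1/3 and 2/3
```

In the first example the unclipped value $p_5 = 1.5$ exceeds $v_4 = 1$, so the diode toward node 4 is active and $v_5$ is set to $v_4$. The second example has no diodes or current sources, so the sweep reduces to repeatedly averaging neighboring potentials.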

Theorems & Definitions (8)

  • Theorem 1: Convex QP formulation
  • Theorem 2: Exact coordinate descent
  • Theorem 1: Convex QP formulation
  • proof : Proof of Theorem 1
  • Theorem 2: Exact coordinate descent
  • proof : Proof of Theorem 2
  • Theorem 3: Equilibrium propagation formulas
  • proof : Proof of Theorem 3