Machine Learning Without a Processor: Emergent Learning in a Nonlinear Electronic Metamaterial

Sam Dillavou, Benjamin D Beyer, Menachem Stern, Andrea J Liu, Marc Z Miskin, Douglas J Durian

TL;DR

Experiments demonstrate that nonlinearity enhances the machine-learning capabilities of an analog contrastive local learning network (CLLN), establishing a paradigm for scalable learning systems.

Abstract

Standard deep learning algorithms require differentiating large nonlinear networks, a process that is slow and power-hungry. Electronic learning metamaterials offer potentially fast, efficient, and fault-tolerant hardware for analog machine learning, but existing implementations are linear, severely limiting their capabilities. These systems differ significantly from artificial neural networks as well as the brain, so the feasibility and utility of incorporating nonlinear elements have not been explored. Here we introduce a nonlinear learning metamaterial -- an analog electronic network made of self-adjusting nonlinear resistive elements based on transistors. We demonstrate that the system learns tasks unachievable in linear systems, including XOR and nonlinear regression, without a computer. We find our nonlinear learning metamaterial reduces modes of training error in order (mean, slope, curvature), similar to spectral bias in artificial neural networks. The circuitry is robust to damage, retrainable in seconds, and performs learned tasks in microseconds while dissipating only picojoules of energy across each transistor. This suggests enormous potential for fast, low-power computing in edge systems like sensors, robotic controllers, and medical devices, as well as manufacturability at scale for performing and studying emergent learning.
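
One way to see why XOR is out of reach for a linear system is the sketch below. It assumes that "linear" means the output is an affine function of the two inputs, which is a simplification made for this example and not the paper's circuit model. Any affine map satisfies f(0,0) + f(1,1) = f(0,1) + f(1,0), whereas the XOR labels do not. The helper names `affine` and `diagonal_gap` are hypothetical.

```python
import random


def affine(w1, w2, b, x1, x2):
    """Output of an affine (linear plus offset) map of two inputs."""
    return w1 * x1 + w2 * x2 + b


def diagonal_gap(f):
    """f(0,0) + f(1,1) - f(0,1) - f(1,0); this is zero for every affine map."""
    return f(0, 0) + f(1, 1) - f(0, 1) - f(1, 0)


random.seed(0)
gaps = []
for _ in range(1000):
    # Draw arbitrary weights and offset; the identity holds for all of them.
    w1, w2, b = (random.uniform(-5.0, 5.0) for _ in range(3))
    gaps.append(abs(diagonal_gap(lambda x1, x2: affine(w1, w2, b, x1, x2))))
max_affine_gap = max(gaps)

# The XOR truth table violates the identity, so no affine map reproduces it.
xor_gap = diagonal_gap(lambda x1, x2: x1 ^ x2)

print("largest gap over random affine maps:", max_affine_gap)
print("gap for XOR labels:", xor_gap)
```

The gap for affine maps is zero up to floating-point rounding, while the XOR labels give a nonzero gap, so some nonlinearity is required to fit all four rows.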

Paper Structure (13 sections, 24 equations, 9 figures)

Figures (9)

  • Figure 1: Design of the Learning Metamaterial (a) Physical response We consider a physical system, an analog resistor network, as a function. Imposed voltages $V_1$ (input) and $V_-$ (ground) create a response throughout the network, and we choose a node to measure voltage output $O$. Our network is constructed using N-channel enhancement MOSFET transistors as variable nonlinear resistors. The conductance of each edge of the network thus depends on its connected node voltages $V_A$ and $V_B$, as well as the voltage on its gate $G$. $\vec{G}$ are the learning degrees of freedom that change during the course of training, and are each stored on local capacitors with capacitance $C_0$. (b) Learning Response We duplicate this network, and overlay the copies such that they do not interact directly, but the transistors on commensurate edges draw their gate voltages $G$ from the same capacitor. This ensures two copies of the same electronic network. We designate one network as 'Free' and impose only inputs ($V_1$, $V_-$), and the other as 'Clamped' and impose inputs as well as the label (desired output) $L$. During training, circuitry on each twin edge continuously updates $G$ by charging or discharging the local capacitor, depending on the local difference between the electronic states in the two networks. (c) Image of the system The highlighted 'twin edge' is a transistor pair like the one shown schematically in (b), along with the circuitry to update $G$. This twin edge is repeated 32 times to create our system. Nodes in the network contain no circuitry and are simply connections between twin edges, and the only interaction a computer has with the network is through the custom input/output (I/O) hardware (described in Supervisor and Measurement Circuitry in the appendix), used to impose boundary conditions ($V_1$, $V_-$, $L$), take measurements, and turn learning ($\dot G$) on and off. Note that the network has periodic connections, which are not pictured in the schematics in (a) and (b) for convenience. Custom I/O hardware, optical table, and all other aspects of the image except for the learning metamaterial itself are faded.
  • Figure 2: Emergent Supervised Learning (a) Physical Response. When inputs are applied, nodes in the system find electronic equilibrium on a timescale of $\tau_V \sim 1~\mu$s. In this way, the network 'calculates' the output (orange) naturally from the inputs. Note that the voltage change does affect the conductance of some edges (Eq. \ref{conductance}), but not the learning degrees of freedom ($\vec{G}$) which are frozen. (b) Learning Response. Enforcing an output value in the clamped network (here $O^C=L=0.31$ V) and unfreezing the evolution of the gate voltages $\vec{G}$ will allow the system to continuously change these learning degrees of freedom. They will evolve with timescale $\tau_0 =18$ ms until frozen, or until the system reaches a state where the two networks have identical voltages, that is, the labels are naturally generated from the inputs, and the task has been learned.
  • Figure 3: Learning XOR a) Schematic of the network showing input and output node locations and average rate of change in $G$ (edge color) for three time spans during training. Edge widths correspond to average conductance. The boxed region in the first panel highlights changes associated with altering the average output (reducing $|\mathcal{E}_{00}|$). b) Network output $O$ plotted as a function of inputs $V_1$ and $V_2$ in a truth-table format for four points during training. Color corresponds to output value, with $L_0 = -0.087$ V. c) Mean squared error (black) and error contributions broken down by mode $|\mathcal{E}_{jk}|$ (green) over time. Error modes are depicted graphically next to lines. Time points indicated in (b) are denoted by vertical gray bars.
  • Figure 4: Nonlinear Regression a) Initial output (black line) and outputs at four points throughout training (blue lines) are shown versus variable input $V_1$ value. Training labels $L$ (black circles) are overlaid. b) Mean squared error of all datapoints (black) and error contributions broken down by mode $|\mathcal{E}_m|$ (green) over time. Time points indicated in (a) are shown as vertical blue bars. c) Schematic of the network showing input and output node locations and average rate of change in $G$ (color) for the three time spans during training defined by the blue lines in (a) and (b). Edge widths of the network correspond to average conductance of the edges. The boxed regions in the first two panels indicate interpretable changes discussed in the main text.
  • Figure 5: Error Mode Evolution for Many Tasks a) Signed error contributions from the lowest two modes $\mathcal{E}_0$ and $\mathcal{E}_1$ for 29 unique regression tasks during training. Input values and network setup are identical to Fig. \ref{fig:3}. Color indicates training time, and traces end at hollow black circles. Gray diagonal lines represent $|\mathcal{E}_0| = |\mathcal{E}_1|$. Green line corresponds to the experiment shown in Fig. \ref{fig:3}. b) Same as (a) but for linear $\mathcal{E}_1$ and curvature $\mathcal{E}_2$ modes. Inset: training data for each of these 29 tasks is 8 datapoints in a piece-wise linear form like the one shown here. Parameters $a,b,c$ are varied (see text) to produce a range of initial error modes.
  • ...and 4 more figures
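
The free/clamped scheme described in the captions of Figures 1 and 2 can be illustrated with a toy model. The sketch below is not the transistor circuit: it assumes a two-resistor voltage divider with ordinary linear conductances `k1` and `k2` in place of gate voltages, and a contrastive update that compares the squared voltage drop on each edge in the free and clamped copies. That update rule, the input value of 0.45 V, the learning rate, and all function names are assumptions made for illustration. Only the label of 0.31 V is taken from the Figure 2 caption.

```python
def free_output(k1, k2, v_in):
    """Middle-node voltage of a divider: v_in --k1-- output --k2-- ground (0 V)."""
    return v_in * k1 / (k1 + k2)


def train(v_in=0.45, label=0.31, k1=1.0, k2=1.0, rate=0.5, steps=2000):
    """Adjust k1 and k2 using only per-edge differences between the two copies."""
    for _ in range(steps):
        o_free = free_output(k1, k2, v_in)  # free copy: inputs only
        o_clamped = label                   # clamped copy: output held at the label

        # Voltage drop across each edge in the free and clamped copies.
        d1_free, d2_free = v_in - o_free, o_free
        d1_clamped, d2_clamped = v_in - o_clamped, o_clamped

        # Assumed local rule: raise a conductance when its free drop exceeds
        # its clamped drop, lower it otherwise. No global gradient is computed.
        k1 += rate * (d1_free ** 2 - d1_clamped ** 2)
        k2 += rate * (d2_free ** 2 - d2_clamped ** 2)

        # Keep conductances positive so the divider stays physical.
        k1, k2 = max(k1, 1e-3), max(k2, 1e-3)
    return k1, k2, free_output(k1, k2, v_in)


k1, k2, out = train()
print("learned conductances:", k1, k2)
print("free output after training:", out)
```

In this toy model the updates shrink as the free output approaches the label and vanish when the two copies agree, which mirrors the stopping condition stated in the Figure 2 caption.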