Understanding and Embracing Imperfection in Physical Learning Networks

Sam Dillavou, Marcelo Guzman, Andrea J. Liu, Douglas J. Durian

Abstract

Performing machine learning with analog signals offers advantages in speed and energy efficiency, but sensitivity to component and measurement imperfections often foils training without a system-specific companion digital model. Here we take a different perspective, accepting and characterizing these inherent imperfections and ultimately overcoming them without digital models. We train an analog network of self-adjusting resistors (a contrastive local learning network) for multiple tasks, and observe limit cycles and scaling behaviors that limit precision, erase memory of previous tasks, and are absent in 'perfect' systems. We develop an analytical model capturing these phenomena as a consequence of an uncontrolled learning bias continuously modifying the underlying representation of learned tasks, reminiscent of representational drift in the brain. Finally, we introduce and demonstrate a system-agnostic training method that greatly suppresses these effects. Our work points to a new, scalable analog approach that eschews precise modeling and instead thrives in the mess of real systems.
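The mechanism described above, a learning bias that keeps moving the learned representation while two tasks are trained in alternation, can be illustrated with a two-parameter toy model. This is a minimal sketch under stated assumptions, not the analytical model developed in the paper: each task k is assumed to be a single linear constraint $O_k = \mathbf{a}_k \cdot \mathbf{G} = L_k$, training task k is assumed to follow $d\mathbf{G}/dt = \gamma\,\delta_k\,\mathbf{a}_k + \mathbf{B}$ with error $\delta_k = L_k - O_k$ and a constant bias $\mathbf{B}$, and the function name, vectors, and labels are hypothetical. The tasks alternate every $\tau/2$, and the sketch reports each task's error at the end of its own half-cycle together with the cycle span $D = \lVert \mathbf{G}(\tau/2) - \mathbf{G}(\tau) \rVert$ of the final cycle.

```python
import math


def simulate_two_tasks(tau, bias, gamma=1.0, dt=1e-3, n_cycles=100,
                       a_alpha=(1.0, 0.5), label_alpha=1.0,
                       a_beta=(0.3, 1.0), label_beta=0.8):
    """Alternate two single-constraint tasks every tau/2 under a constant bias.

    Toy assumptions (hypothetical, not the paper's model): two parameters
    G = (G1, G2), task output O_k = a_k . G, error delta_k = L_k - O_k, and
    while task k is trained dG/dt = gamma * delta_k * a_k + bias, integrated
    with forward Euler. Nonlinearity and the bound G_max are ignored.
    Returns (E_alpha, E_beta, D) for the final cycle.
    """

    def abs_error(a, label, g):
        return abs(label - (a[0] * g[0] + a[1] * g[1]))

    tasks = [(a_alpha, label_alpha), (a_beta, label_beta)]
    steps = max(1, round(tau / 2 / dt))  # Euler steps per half-cycle
    g = [0.0, 0.0]
    g_half, g_full = g, g
    for _ in range(n_cycles):
        for k, (a, label) in enumerate(tasks):
            for _ in range(steps):
                delta = label - (a[0] * g[0] + a[1] * g[1])
                g = [g[i] + dt * (gamma * delta * a[i] + bias[i]) for i in range(2)]
            if k == 0:
                g_half = g  # G(tau/2): end of the alpha half-cycle
            else:
                g_full = g  # G(tau): end of the beta half-cycle
    e_alpha = abs_error(a_alpha, label_alpha, g_half)
    e_beta = abs_error(a_beta, label_beta, g_full)
    span = math.dist(g_half, g_full)  # cycle span D = ||G(tau/2) - G(tau)||
    return e_alpha, e_beta, span
```

In this toy, `simulate_two_tasks(tau=2.0, bias=(0.0, 0.0))` converges to the joint solution, so both errors and the span fall to numerical zero, while `simulate_two_tasks(tau=2.0, bias=(0.2, 0.0))` settles into a limit cycle with finite errors and a finite span: with a bias present, the error does not vanish no matter how many cycles are run.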

Paper Structure

This paper contains 14 sections, 34 equations, 6 figures.

Figures (6)

  • Figure 1: Training for Two Tasks. (A) Image of the Contrastive Local Learning Network (CLLN) decorated with chosen input (yellow) and output (white) node locations. The network is a grid of self-adjusting nonlinear resistors (specifically, n-channel enhancement-mode MOSFETs), and nodes are simply connections between edges. Two copies of the network are constructed (they coexist on the same breadboards) to run the Free and Clamped states (see text). The system output is the difference $O = O_+ - O_-$ between two chosen nodes. Variable inputs $V_1$ and $V_2$, their inverses $V_i' = V_+ - V_i$, and two constant values $V_+ = 0.435$ V and $V_- = 0.018$ V are imposed as voltage boundary conditions. The differential output and inverted inputs both give the system access to negative relationships between input and output, permitting more complex functionality. (B) Training protocol. Two tasks, $\alpha$ (blue) and $\beta$ (red), are trained alternately, each for time $\tau/2$, until the system reaches a steady state. (C) Binary classification, with each class treated as a separate task. Background color is system output, labels are $L_\pm = \pm 63$ mV, and circles are training data. Steady-state results are shown at $t = \tau/2$ and $t = \tau$ for $\tau = 0.2$ s and $3$ s. (D) Average error for each task (class), $E_\alpha$ and $E_\beta$, over time in steady state. (E) Regression with two data points, separated into two tasks; $V_2$ is held at $V_+/2$ for this task. Labels are squares, output is the black line, and panels are paired as in C. (F) Limit-cycle error, as in D. (G) Combined steady-state error $\overline E$ vs $\tau$ for classification (circles) and regression (squares) experiments. Example times from C-F are indicated. (H) Same as G but for the cycle span $D = \lVert \mathbf{G}(\tau/2) - \mathbf{G}(\tau) \rVert$, that is, the straight-line distance the system travels in a half-cycle.
  • Figure 2: Learning Dynamics in a Small Experimental Network. (A) Schematic of the 2+1 edge experiment. (B, C) Pairs of single-data-point tasks, differing only in the desired output for task $\beta$. Solid and dashed gray lines connect the data points in B and C, respectively. (D) Combined steady-state error $\overline E$ as a function of $\tau$ for the task pairs in B (circles) and C (squares). Vertical lines indicate the $\tau$ values shown in F and G. Model predictions with no free parameters are shown as solid black lines. (E) Same as D but for the cycle span $D$. (F, G) Visualization of limit cycles in $G$-space for the task pairs in B and C, respectively. Colored points are experimental data for periods $\tau = 20$ ms (lower left) and $800$ ms (large bow-tie). Faded red and blue arrows indicate cycle direction. Colored dotted lines are individual task solution lines, and their intersection is the joint solution $\mathbf{G}^*$. Gray arrows indicate the measured bias $\mathbf{B}$. Thick black lines indicate the parameter maximum $G_{\text{max}}$. Note that edge $G_1$ has been modified to induce a different bias for the task pair in C; see Materials and Methods for details.
  • Figure 3: Incompatible Tasks in a Small Experimental Network. (A, inset) A single-data-point task ($\beta$, red) and a two-data-point task ($\alpha$, blue). (A) Combined steady-state error $\overline E$ as a function of $\tau$. The model prediction with no free parameters is shown as a solid black line; the dashed black line indicates the model prediction for zero-bias ('perfect') evolution. Vertical lines highlight the $\tau$ values shown in C. (B) Cycle span $D$ as a function of $\tau$, with model predictions as in A. (C) Visualization of four limit cycles in $G$-space, for $\tau = 25$, $100$, $400$, and $1600$ ms. The gray arrow indicates the measured bias $\mathbf{B}$. Faded red and blue arrows indicate cycle direction. The red dotted line indicates the solution space of task $\beta$, the blue dotted lines indicate the solution spaces of each data point of task $\alpha$, and the intersection of the blue lines is the solution for the full $\alpha$ task. The X and Y axes of all panels have equal scale, and all Y axes have the same limits.
  • Figure 4: Standard vs Overclamping. Schematic example of standard clamping (left, gray) vs overclamping (right, yellow) for a single data point. Standard clamping uses $\eta = 1$, which results in the clamped output $O^C = L$ (the label). This technique (with any $\eta$) does not bring the free output to the label ($O^F \rightarrow L$), because the signal $O^C - O^F$ decays with the error $\delta = L - O^F$, and eventually the bias dominates. In contrast, overclamping holds the key difference $O^C - O^F$ approximately constant and reduces the training step time $t_h$ in proportion to the error. The bias accumulated per step therefore shrinks along with the error signal, suppressing its effects.
  • Figure 5: Overclamping with the Full Experimental Network. (A) Classification results for the two methods. Background color is output after training; dots are training data. Both methods use cycle time $\tau = 200~\mu$s; further details are in Materials and Methods. The data is divided differently in each row, and columns are in order of increasing input variation $\epsilon$. (B) Classification accuracy for each method as a function of input variation $\epsilon$. Each data point is an average over the 8 experiments of the same task with rotated labels. Error bars are standard error. (C) Hinge loss as a function of input variation, averaged as in B. The log axis is broken to include 0 (black line).
  • ...and 1 more figure
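The contrast drawn in Figure 4 can be illustrated with a scalar toy model. This is a hypothetical sketch, not the authors' implementation: during a training step of duration $t_h$, the free output $O^F$ is assumed to move at rate $\gamma(O^C - O^F) + b$, where $b$ is a constant bias. Standard clamping is modeled as $O^C - O^F = \eta\,\delta$ with a fixed $t_h$ (so $\eta = 1$ reproduces $O^C = L$), and overclamping as $O^C - O^F = S\,\mathrm{sign}(\delta)$ with $t_h = k\,\lvert\delta\rvert$. The function `train`, the parameters `overclamp` ($S$) and `k`, and all numerical values are assumptions chosen for illustration.

```python
import math


def train(mode, label=1.0, bias=-0.05, gamma=1.0, eta=1.0,
          t_step=0.1, overclamp=0.5, k=0.2, n_steps=2000):
    """Hypothetical scalar model: final |L - O^F| after training one data point.

    Assumption: during a training step of duration t_h the free output moves
    at rate gamma * (O^C - O^F) + bias, where bias is constant.
    """
    out = 0.0  # free output O^F
    for _ in range(n_steps):
        delta = label - out  # error delta = L - O^F
        if mode == "standard":
            signal = eta * delta  # O^C - O^F shrinks with the error
            t_h = t_step  # fixed training-step time
        elif mode == "overclamp":
            signal = math.copysign(overclamp, delta)  # O^C - O^F held near constant
            t_h = k * abs(delta)  # step time proportional to the error
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        out += t_h * (gamma * signal + bias)
    return abs(label - out)
```

In this toy, standard clamping stalls where $\gamma\eta\delta = -b$, leaving an error of $\lvert b \rvert/(\gamma\eta)$ for any $\eta$, so `train("standard")` ends near 0.05. With overclamping, both the drive and the accumulated bias scale with $\lvert\delta\rvert$, so `train("overclamp")` keeps shrinking the error as long as $\gamma S$ exceeds $\lvert b \rvert$ and the steps are small enough.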