Remembrance of Tasks Past in Tunable Physical Networks

Purba Chatterjee, Marcelo Guzman, Andrea J. Liu

TL;DR

Introducing a hard threshold in the learning rule, so that only edges with sufficiently large training signals are altered, significantly enhances memory of previous tasks in sequentially trained physical networks.

Abstract

Sequential learning in physical networks is hindered by catastrophic forgetting, where training a new task erases solutions to earlier ones. We show that we can significantly enhance memory of previous tasks by introducing a hard threshold in the learning rule, allowing only edges with sufficiently large training signals to be altered. Thresholding confines tuning to the spatial vicinity of inputs and outputs for each task, effectively partitioning the network into weakly overlapping functional regions. Using simulations of tunable resistor networks, we demonstrate that this strategy enables robust memory of multiple sequential tasks while reducing the number of altered edges and the overall tuning cost. Our results hint at constrained training as a simple, local, and scalable mechanism to overcome catastrophic forgetting in tunable matter.
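
The thresholding idea in the abstract can be illustrated with a minimal sketch. It assumes each edge carries a scalar training signal and that conductances follow a simple gradient-like update; this summary does not give the paper's actual learning rule or its definition of the training signal. The names `thresholded_update`, `signals`, `learning_rate`, and `k_min` are hypothetical and chosen only for illustration.

```python
import numpy as np


def thresholded_update(conductances, signals, lam, learning_rate=0.1, k_min=1e-6):
    """Alter only edges whose training-signal magnitude exceeds the threshold lam.

    Assumption: a plain gradient-like step, k <- k - learning_rate * signal,
    stands in for the actual learning rule, which is not specified here.
    """
    conductances = np.asarray(conductances, dtype=float)
    signals = np.asarray(signals, dtype=float)

    # Hard threshold: edges with |signal| <= lam are left untouched.
    mask = np.abs(signals) > lam

    updated = conductances.copy()
    # Keep updated conductances positive (k_min is an illustrative floor).
    updated[mask] = np.maximum(
        conductances[mask] - learning_rate * signals[mask], k_min
    )
    return updated, mask


# Toy example: five edges with unit conductance and made-up signals.
k = np.ones(5)
s = np.array([0.0005, -0.003, 0.0019, 0.0021, -0.0001])

new_k, altered = thresholded_update(k, s, lam=0.002)
print("altered edges:", int(altered.sum()), "of", altered.size)
```

With `lam=0` every edge with a nonzero signal is updated. Raising `lam` shrinks the set of altered edges, which is the quantity the figure captions denote $\Delta N_E$.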

Paper Structure

This paper contains 3 sections, 5 equations, 9 figures.

Figures (9)

  • Figure 1: Sequential learning of two edge-coupling tasks. (a) Network of $N=256$ nodes and $N_E=704$ edges, trained for two sequential edge-coupling tasks $A$ (orange) and $B$ (blue). Source nodes are hollow circles and target nodes are solid circles. (b-d) From top to bottom: Errors $\mathcal{E}_A$, $\mathcal{E}_B$, and joint error $\mathcal{E}$ as a function of training steps $T$ for different thresholds $\lambda$ (color). The lowest final joint error is found for $\lambda =0.0014$.
  • Figure 2: Increasing the threshold localizes effects of training. (a) Network with $N=1024$ nodes and $N_E=2824$ edges trained for an edge-coupling task. The width of each edge is proportional to its final conductance. A low-conductance boundary separates sectors of positive (blue nodes, $V=0.5$) and negative (yellow nodes, $V=-0.5$) voltages. (b-c) Snapshots of the network at the end of training for different thresholds. Edges that changed their conductance during training are shown in maroon. For $\lambda=0$ every edge changes. The number of altered edges decreases with increasing threshold $\lambda$.
  • Figure 3: Effects of threshold on a single edge-coupling task. Median values for the total number of altered edges $\Delta N_E$ (a) and error $\mathcal{E}$ (b) as a function of $\lambda$, for a network of $N=1024$ nodes and $N_E=2824$ edges. Color encodes the distance $D$ between source and target nodes. Training fails for thresholds higher than a maximum value $\lambda_{max}$, indicated by dashed lines in (b) for $D=14.5$ (blue) and $D=18.5$ (green). (c) Maximum threshold $\lambda_{max}$ versus distance $D$. Each dot corresponds to a trained network. (d) $\Delta N_E$ versus $D$ for different $\lambda$, as indicated by color. Error bars in (a,b,d) show first and third quartiles over $100$ realizations with randomly selected source-target pairs.
  • Figure 4: A closer look at the tasks shown in Fig. 1. (a,b) Final networks after sequential training of tasks A and B for $\lambda=0$ and $\lambda = 0.002$. Edges are colored blue if altered during the training of B, orange if altered only during A, and gray if unchanged. (c,d) Joint error $\mathcal{E}$ and the total number of altered edges $\Delta N_E$ versus $\lambda$. For an intermediate range of $\lambda$ (shaded) the joint error is strongly reduced while adjusting only $15$-$20\%$ of the edges. (e) Median voltage response $\mathcal{R}_A$ (desired value $\Delta =1$) and (f) joint error $\mathcal{E}$ after full sequential training as a function of $\lambda$ for different distances $D_B$ (color) and fixed $D_A=4$. Error bars show first and third quartiles over $100$ realizations with randomly selected sources and targets for task $B$ and fixed task $A$.
  • Figure 5: (a) Median joint error $\mathcal{E}$ as a function of $\lambda$ after training for three sequential edge-coupling tasks $A$, $B$, and $C$ in a network with $N=256$ nodes and $N_E=704$ edges (inset). The joint error, $\mathcal{E}=(\mathcal{E}_A+\mathcal{E}_B+\mathcal{E}_C)/3$, reaches a minimum at $\lambda=0.002$. Error bars represent the first and third quartiles computed from $100$ realizations with randomly chosen source-target pairs for each task. (b) Median joint error $\mathcal{E}$ as a function of $\lambda$ after training sequentially for two linear regression tasks $A$ and $B$ (inset) in a network with $N=512$ nodes and $N_E=1405$ edges. The lowest error is found at $\lambda = 0.0001$. Error bars represent the first and third quartiles computed from $100$ realizations, each with initial conductances drawn from a normal distribution with mean $1$ and standard deviation $0.05$.
  • ...and 4 more figures