The Coupling Strength Is a Scale Parameter in Threshold Power-Law Reservoirs and Does Not Influence Training Accuracy

Wilten Nicola

TL;DR

It is shown that for reservoirs constructed with threshold power-law transfer functions, if the reservoir can be trained for one single positive value of the initial reservoir coupling strength, then there exist networks with identical accuracy for all positive coupling strengths, implying that the chaotic dynamics can always be tamed or never be tamed.

Abstract

In reservoir computing, the coupling strength of the initial untrained recurrent neural network (the reservoir) is an important hyperparameter that can be varied for accurate training. A common heuristic is to set this parameter near the "edge of chaos", where the untrained reservoir is near the transition to chaotic dynamics, and the chaos can be "tamed". Here, we investigate how the overall connectivity strength should be varied in threshold power-law recurrent neural networks, where the firing rate is 0 below some threshold of the current and is a power function of the current above this threshold. These networks have been previously shown to exhibit chaotic solutions for very small coupling strengths, which may imply that the chaos cannot be tamed at all. We show that for reservoirs constructed with threshold power-law transfer functions, if the reservoir can be trained for one single positive value of the initial reservoir coupling strength, then there exist networks with identical accuracy for all positive coupling strengths, implying that the chaotic dynamics can always be tamed or never be tamed. This is a direct consequence of the coupling strength of threshold power-law RNNs acting as a scale parameter that does not qualitatively influence the dynamics of the system, but only scales all system solutions in magnitude. This is independent of the power of the transfer function, with the exception of Rectified Linear Unit (ReLU) networks. This is in contrast with conventional RNNs/reservoirs employing sigmoidal firing rates, where the strength of the recurrent coupling in the initial reservoir determines the performance on different tasks during training and also influences the network dynamics explicitly.
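
The scale-parameter claim can be illustrated with a short worked computation. This is only a sketch under an assumption: the reservoir equation is not reproduced in this summary, so the rate dynamics $\dot{z}_i = -z_i + g\sum_j \omega_{ij}[z_j]_+^k$ are assumed here, and the symbols $\alpha$, $\hat{z}$ and $\hat{g}$ are introduced purely for illustration. Substituting $z = \alpha\hat{z}$ with $\alpha > 0$ and using $[\alpha\hat{z}_j]_+^k = \alpha^k[\hat{z}_j]_+^k$ gives

$$\alpha\,\dot{\hat{z}}_i = -\alpha\,\hat{z}_i + g\,\alpha^{k}\sum_j \omega_{ij}[\hat{z}_j]_+^k \quad\Longrightarrow\quad \dot{\hat{z}}_i = -\hat{z}_i + \hat{g}\sum_j \omega_{ij}[\hat{z}_j]_+^k, \qquad \hat{g} = g\,\alpha^{k-1}.$$

For $k \neq 1$, any target $\hat{g} > 0$ is reached by choosing $\alpha = (\hat{g}/g)^{1/(k-1)}$, so that $\hat{z}(t) = z(t)\,(g/\hat{g})^{1/(k-1)}$, which has the same form as the rescaling quoted in the caption of Figure 2. For $k = 1$ (ReLU), $\alpha^{k-1} = 1$ and hence $\hat{g} = g$ for every $\alpha$: under this assumed model, rescaling cannot change the coupling strength, in line with the ReLU exception stated in the abstract.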

Paper Structure

This paper contains 8 sections, 2 theorems, 20 equations, 5 figures.

Key Result

Theorem 1

Suppose that the initial reservoir (r1) is trained on some $m$-dimensional supervisor $\bm x(t)$ with some encoder/decoder pair $\bm \eta$, $\bm \phi$ for $g = g^*$, achieving a test-loss of $L^* = L(\bm \phi,\bm \eta,g^*,\bm \omega)$. Further, assume that the threshold power-law considered is $k\ne

Figures (5)

  • Figure 1: Threshold Power-Law Recurrent Neural Networks. (A) Various transfer functions for threshold power-law neural networks, where $x_+ = \max\{x,0\}$. (B) A simulation of a threshold power-law RNN with $N=1000$ neurons and $k=\frac{1}{2}$, where $\omega_{ij}$ is drawn from a standard normal distribution with a coupling strength of $g = \sqrt{N}^{-1}$. The system displays irregular dynamics indicative of a chaotic solution, as predicted from dynamic mean field theories kadmonomri for large $N$. (C) Decreasing $g$ to $10^{-3}\sqrt{N}^{-1}$ leads to an initial decay to a neighbourhood near the origin (left), but a zoom (right) reveals small firing rate fluctuations. All other parameters are as in (B). (D) Lemma 1 shows that solutions for one value of $g$ are rescaled solutions for any other value of $g$. This is shown numerically with $g_1 = 0.8\sqrt{N}^{-1}$ and $g_2 = 1.2\sqrt{N}^{-1}$, with the transforms indicated by the red and green arrows used to rescale the respective solutions for the two values of $g$. The initial conditions are also rescaled. All weights were drawn from a standard normal distribution.
  • Figure 2: Threshold Power-Law RNNs with Refractory Periods. (A) The firing rates for a threshold power-law RNN with $k=\frac{1}{2}$ (blue) and with a refractory period (red). The firing rate with a refractory period $\tau$ converges to $\tau^{-1}$ for high input currents ($z$). (B) The firing rate for a simulated network without a refractory period (red, $N=2000$, $g = \sqrt{N}^{-1}$) and with a refractory period for $g_\tau =10^{-1}\sqrt{N}^{-1}$ (blue-dashed) and $g_\tau=10^{-5} \sqrt{N}^{-1}$ (green-dashed). As $g_\tau\rightarrow 0$, the solutions of the network with a refractory period converge to solutions of the network without one. Note that the green and blue solutions were rescaled with $\hat{z}(t) = z(t)(\frac{g_{\tau}}{g})^{1/(k-1)}$, where $g_\tau$ denotes the network coupling strength for the network with a refractory period $\tau$. (C) Identical to (B), only with oscillatory solutions as $g_\tau\rightarrow 0$ and with $N=50$ neurons. (D) Identical to (C), only with an equilibrium point solution as $g_\tau \rightarrow 0$. All weights were drawn from a standard normal distribution.
  • Figure 3: Threshold Power-Law RNNs can be Trained with FORCE. (A) The supervisor (black), a random sum of oscillators (Materials and Methods), vs the network approximation (red). RLS was turned on after 50 time units and turned off for the last 50 time units in a 300 time unit simulation. The network consisted of $N=2000$ neurons with $k=\frac{1}{2}$ and $g = 1$. (B) The firing rates for 4 neurons, time-aligned with (A). (C) The supervisor (as in (A), but with a different random sum, black) vs networks with increasingly larger powers $k$ (reds). RLS was turned on after 50 time units and off for the last 50 time units in a 300 time unit simulation. A zoom of the last 10 time units is shown on the right. (D) The firing rates for four neurons, time-aligned with (C). Note that all 4 networks used the same initial state and the same initial reservoir weight matrix with $g=1.5\sqrt{N}^{-1}$ in all cases. A zoom of the last 10 time units is shown on the right. All weights were drawn from a standard normal distribution.
  • Figure 4: Test Error vs. Power in Trained Threshold Power-Law RNNs. The power $k$ was varied from $1/100$ to $100/100$ in increments of $1/100$. For each value of $k$, 50 random networks consisting of $N=2000$ neurons were generated and FORCE trained with randomly generated oscillatory supervisors as in Figure 3. The log of the test MSE is shown as individual points (black dots) with the mean (red) and mean $\pm$ standard deviation in blue. The test error monotonically decreases until $k\approx 1$, where the networks start becoming unstable. Each network was simulated for 500 time units with RLS turned on after the first 50 time units and off for the last 50 time units (testing). All networks used a constant $g =1.5 \sqrt{N}^{-1}$. All weights were drawn from a standard normal distribution.
  • Figure 5: Trained Threshold-Power-Law Recurrent Neural Networks can be Rescaled to Any Reservoir Strength. (A) A network of $2000$ neurons with $g = \frac{1.1}{\sqrt{N}}$ was trained on a complex oscillator task with FORCE training. RLS was turned on at $50$ time units and turned off at $200$ time units. The decoded output (red) vs the target supervisor (black) is shown on top, while the firing rates for 5 neurons are shown on the bottom. (B) A network of $2000$ neurons with $g = \frac{1.1}{\sqrt{N}}$ was trained to learn the Rossler dynamical system with FORCE training (network, red; Rossler system, black). A zoom of the last 50 time units with RLS off is shown on the right, while the full simulation is shown on the left. (C) The firing rates for 5 randomly selected neurons. The zoom on the right is aligned with the zoom in (B). (D) The firing rates for the reservoir with $g^* = \frac{1.1}{\sqrt{N}}$ during testing. (E) The firing rates for the reservoir with $g = \frac{1.0}{\sqrt{N}}$ during testing, using the encoder and decoder pair $\hat{\bm\eta}$ and $\hat{\bm \phi}$ as described in the text. (F) The phase portrait of the $x_1$ vs $x_2$ variable for the network trained with $g^*$ (black) and $g$ (green). The Rossler inset is shown in red.
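
The rescaling shown in Figure 1(D) can be checked numerically with a small sketch. It rests on an assumption: the reservoir equation is not reproduced in this summary, so the rate dynamics $\dot{z} = -z + g\,\omega\,[z]_+^k$ are assumed, integrated with forward Euler. The function name `simulate`, the network size, the step size and the seed are illustrative choices, not the author's settings. The scale factor follows the form of the rescaling quoted in the Figure 2 caption.

```python
import numpy as np

def simulate(g, omega, z0, k=0.5, dt=0.01, steps=300):
    """Forward-Euler integration of the ASSUMED dynamics
    dz/dt = -z + g * omega @ max(z, 0)**k (not taken from the paper)."""
    z = z0.copy()
    traj = np.empty((steps + 1, z.size))
    traj[0] = z
    for n in range(steps):
        rate = np.maximum(z, 0.0) ** k           # threshold power-law firing rate
        z = z + dt * (-z + g * (omega @ rate))   # one Euler step
        traj[n + 1] = z
    return traj

rng = np.random.default_rng(0)                   # illustrative seed
N, k = 200, 0.5                                  # illustrative size; k as in Figure 1
omega = rng.standard_normal((N, N))              # weights from a standard normal
g1, g2 = 0.8 / np.sqrt(N), 1.2 / np.sqrt(N)      # the two couplings of Figure 1(D)
z0 = rng.standard_normal(N)

# Scale factor mapping solutions at g1 onto solutions at g2 (requires k != 1).
alpha = (g1 / g2) ** (1.0 / (k - 1.0))

traj1 = simulate(g1, omega, z0, k)               # solution at g1
traj2 = simulate(g2, omega, alpha * z0, k)       # solution at g2, rescaled initial state

# Relative mismatch between the rescaled g1 solution and the g2 solution.
max_rel_err = np.max(np.abs(alpha * traj1 - traj2)) / np.max(np.abs(traj2))
print(f"alpha = {alpha:.4f}, max relative mismatch = {max_rel_err:.2e}")
```

Under these assumptions the two trajectories should agree up to floating-point error over the short horizon simulated. Over long horizons a chaotic solution would amplify round-off, so the comparison is deliberately kept short.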

Theorems & Definitions (2)

  • Theorem 1
  • Lemma 1