Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Claus Metzner, Achim Schilling, Andreas Maier, Patrick Krauss

TL;DR

The accuracy of a reservoir computer in artificial classification tasks of varying complexity is studied, finding that the accuracy peaks both at the oscillatory/chaotic and at the chaotic/fixpoint phase boundaries, thus supporting the 'edge of chaos' hypothesis.

Abstract

Reservoir computing - information processing based on untrained recurrent neural networks with random connections - is expected to depend on the nonlinear properties of the neurons and the resulting oscillatory, chaotic, or fixpoint dynamics of the network. However, the required degree of nonlinearity and the range of suitable dynamical regimes for a given task are not fully understood. To clarify these questions, we study the accuracy of a reservoir computer in artificial classification tasks of varying complexity, while tuning the neurons' degree of nonlinearity and the reservoir's dynamical regime. We find that, even for activation functions with extremely reduced nonlinearity, weak recurrent interactions and small input signals, the reservoir is able to compute useful representations, detectable only in higher-order principal components, that render complex classification tasks linearly separable for the readout layer. When increasing the recurrent coupling, the reservoir develops spontaneous dynamical behavior. Nevertheless, the input-related computations can 'ride on top' of oscillatory or fixpoint attractors without much loss of accuracy, whereas chaotic dynamics reduces task performance more drastically. By tuning the system through the full range of dynamical phases, we find that the accuracy peaks both at the oscillatory/chaotic and at the chaotic/fixpoint phase boundaries, thus supporting the 'edge of chaos' hypothesis. Our results, in particular the robust weakly nonlinear operating regime, may offer new perspectives both for technical and biological neural networks with random connectivity.

Paper Structure

This paper contains 30 sections, 7 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Reservoir computer and free reservoir dynamics. (a) In every time step $t$ of an ongoing computation, an input matrix $\mathbf{I}$ of size $N\!\times\!M$ couples $M$ input signals $x_m$ into the reservoir. The $N$ neurons of the reservoir, which are recurrently connected by a matrix $\mathbf{W}$ of size $N\!\times\!N$, produce neural activations $y_n$. An output matrix $\mathbf{O}$ of size $K\!\times\!N$ reads the reservoir states $y_n$ and linearly extracts from them $K$ output features $z_k$. These features are finally passed through a nonlinear argmax function (not shown) to produce a one-hot output for classification tasks. (b) Examples of the dynamics in a free-running reservoir with $N\!=\!10$ neurons, for seven different values $b$ of the excitatory/inhibitory balance and five different recurrent coupling strengths $w$ in the reservoir's connection matrix $\mathbf{W}$. For each parameter combination $(w,b)$ we show the momentary activations $y_n^{(t)}\in\left[ -1,+1\right]$ of the 10 neurons (vertical) over time (horizontal) in color coding, where blue indicates negative and red positive activations (cf. Fig. 2(g)). (c),(d),(e) Three measures of reservoir dynamics as a function of the excitatory/inhibitory balance $b$, for three different recurrent coupling strengths $w=0.1$ (left), $w=0.3$ (middle) and $w=0.5$ (right). The fluctuation $F\in\left[0,1\right]$ measures how much, averaged over all 10 reservoir neurons, the individual neural activations $y_n^{(t)}$ change over time. It usually has the largest values in the oscillatory regime, medium values in the chaotic regime, and the smallest values in the fixpoint regime (blue lines). The correlation $C\in\left[-1,+1\right]$ measures the average degree of similarity between the neural activation $y_i^{(t)}$ of neuron $i$ at time $t$ and the activation $y_j^{(t\!+\!1)}$ of neuron $j$ at the following time step $t\!+\!1$, averaged over all pairs $(i,j)$. It is usually negative in the oscillatory regime, around zero in the chaotic regime, and positive in the fixpoint regime (orange lines). The nonlinearity parameter $\alpha\in\left[-1,+1\right]$ measures how strongly the reservoir neurons are driven into the saturation of the $\tanh$ activation functions and therefore produce 'digital' (-1 or +1) instead of 'analog' (continuous) outputs. The parameter $\alpha$ is close to -1 (linear, analog regime) when the distribution $p(y)$ of activations has a single peak at $y\!\approx\!0$; it is close to 0 (weakly nonlinear regime) if the activations are distributed uniformly in the range $\left[-1,+1\right]$; and it is close to +1 (nonlinear, digital regime) if $p(y)$ has its peaks at the borders $y\!\approx\!-1$ and/or $y\!\approx\!+1$ (green lines).
  • Figure 2: Model classification tasks and states of the running RC. The 'purely spatial' discrimination tasks (a-d) have only $M\!=\!2$ simultaneous input signals $x_0\in\left[-1,+1\right]$ and $x_1\in\left[-1,+1\right]$ per episode, and thus each input vector $(x_0,x_1)$ can be represented as a point in a two-dimensional plane, with point colors representing the classes. By contrast, task (e) is a 'spatio-temporal' pattern recognition task. (a): 'Line task': Two point classes that can be separated by a line in input space. (b): 'Circle task': Two point classes that can be separated by a circle. (c): 'XOR task': Two point classes that cannot be separated by any single curve. (d): 'Patches task': Three point classes, each distributed over several patches in input space. (e): A task where, in each episode, $M\!=\!2$ simultaneous input signals are presented successively over $T\!=\!6$ time steps. The task requires discriminating $C\!=\!3$ classes, each represented by a prototypical pattern of $M\!\cdot\!T$ ternary values $x_m^{(t)}\in\left\{-1,0,+1\right\}$ (color-coded by blue, gray and red). From each prototypical pattern, variants can be produced by random gradual modification of its characteristic values (only two variants per prototype are shown). These modifications are done in a way that preserves the range $x_m^{(t)}\in\left[-1,+1\right]$ of input signals. Note that in task (e), the final input values are zero (gray) in each of the prototypical patterns, so that the readout layer cannot simply use the terminal pair of input values to discriminate the classes. The number of 'terminal zeros' in each pattern can be used to test the memory capacity of the reservoir. (f): Detailed time-dependent states of the RC during 5 consecutive input episodes. The horizontal axis represents the time steps, the vertical axis represents the input signals $x$ (top 2 rows), the activations $y$ of the reservoir neurons (middle 10 rows), and the linear outputs $z$ before argmax (bottom 3 rows). Episode boundaries are marked by vertical white lines, and the correct class label of each input episode is written on top. Predicted labels appear in one-hot coding directly after each episode end (framed in white). (g): General color bar for matrix plots in this paper.
  • Figure 3: Reservoir dynamics with input. The top panels (a-c) show the three quantitative statistical measures $F$ (blue), $C$ (orange) and $\alpha$ (green) of the reservoir dynamics as functions of the balance $b$, for three different recurrent couplings $w$, both without external input signals (solid lines) and with continuously fed-in spatio-temporal input signals (dashed lines). (a): For weak coupling $w\!=\!0.1$, there is no significant statistical difference between the free-running and input-driven reservoir. (b): For medium coupling $w\!=\!0.3$, the correlation is not affected by the input, but the fluctuation and nonlinearity are slightly enhanced by the input around the chaotic regime. (c): For strong coupling $w\!=\!0.5$, only the fluctuation is slightly increased by the input. The bottom panels (d-f) show the time-dependent neuron activations for the strongly coupled reservoir with $w\!=\!0.5$, for three different values of the balance $b$. First row: activations without input. Second row: activations at the same times, but with spatio-temporal input signals. Third row: difference between the input-driven and free-running activations, for all reservoir neurons. Fourth row: difference for the neurons 2-9 that do not receive direct input signals. (d): In the oscillatory regime $b=-0.9$, the input causes activation differences in the neurons 2-9 that are only of order $10^{-5}$, compared to the activations themselves, which are of order unity. (e): In the chaotic regime $b=0$, the input-induced differences are considerable and of the same order as the activations themselves. (f): In the fixpoint regime $b=+0.9$, just like in the oscillatory regime, the input-induced differences are extremely small. Taken together, this figure demonstrates that, with the exception of the sensitive chaotic regime, the input signals represent only a negligible perturbation of the ongoing reservoir dynamics.
  • Figure 4: Classification with readout, but no reservoir. The performance of the isolated readout layer is tested with four purely spatial, binary classification tasks. Top row: input data distributions $p(\mathbf{x})$; Middle row: distribution of the readout layer's linear output $p(\mathbf{z})$, before the application of the argmax function. Due to the training of one-hot outputs, the predictions $\mathbf{z}=(z_0,z_1)$ always lie on a straight line with $z_0\!+\!z_1\!=\!1$. The argmax function separates the points on this line into two discrete classes. The separating boundary is shown as a dashed black line; Bottom row: learned data distributions with achieved accuracy. (a): The 'line' task is linearly separable. Consequently, even the isolated readout layer achieves a classification accuracy of $0.97$ in this case. (b,c): The 'circle' and 'XOR' tasks are not linearly separable. Consequently, the accuracy of the isolated readout layer drops to chance level $\approx 0.5$ in these cases. (d): Although 'patches' tasks are not in general linearly separable, the readout achieves an accuracy slightly above chance level in some cases.
  • Figure 5: Effect of neural nonlinearity in the RC. (a): Modified neural activation function with a tunable linearity parameter $s$. For $s\!>\!1$ (red curve), the range of arguments $u$ where the neuron response is approximately linear is extended, compared to the regular $\tanh$ function (black curve) that corresponds to $s\!=\!1$. (b): Accuracy of a balanced ($b\!=\!0$), weakly coupled ($w\!=\!0.1$) reservoir computer (blue curve), the statistical measures $F$, $C$ and $\alpha$, as well as the root-mean-square of all neural activations (magenta curve), plotted as functions of the linearity parameter $s$ over ten orders of magnitude. The accuracy is well beyond 0.9 for a remarkably large range of $s$. Quantities $F$, $C$, $\alpha$ and the RMS activation indicate a 'calm' reservoir, with small neural signal amplitudes, due to the small recurrent coupling. (c): Time-dependent activations in the weakly coupled, balanced reservoir during the circle task. Top: Case of standard $\tanh$ activation functions with $s\!=\!1$. Middle: Case of linear neurons. Bottom: Difference between the two, which is only of order $10^{-3}$. The panels (d-g) show the distributions of different PCA components of the reservoir activations, with classes in color coding, both for linear neurons (left of each pair) and $\tanh$ neurons (right of each pair). The PCA components 0-2 do not allow for linear separation of the classes, and the distributions of these components are very similar for linear and $\tanh$ neurons (d,e). Small differences become visible in PCA component 3. For the $\tanh$ neurons, the distributions of the two classes are slightly shifted along this axis (f, right plot of the pair). Drastic differences between the neuron types appear in the combination of PCA components 3 and 4. Here, the classes are linearly separable for $\tanh$ neurons, while they completely overlap for linear neurons. Together, these results indicate that the effect of nonlinearity is crucial for RC performance, but very subtle.
  • ...and 1 more figures
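
Two points from the captions above can be illustrated with a short sketch: the matrix shapes of Figure 1(a), and the behavior of an isolated readout layer as described for Figure 4. Everything concrete here is an assumption made for illustration and is not taken from the paper: the update rule $y \leftarrow \tanh(\mathbf{W}y + \mathbf{I}x)$, the Gaussian random weights, the way the 'line' and 'circle' tasks are sampled, the least-squares training of the readout with a bias term, and the evaluation on the training data itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def reservoir_step(y, x, W, I):
    """One assumed reservoir update y <- tanh(W y + I x); y has N entries, x has M."""
    return np.tanh(W @ y + I @ x)

def make_task(kind, n=2000):
    """Sample n points uniformly in [-1, 1]^2 and label them (hypothetical task definitions)."""
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    if kind == "line":
        labels = (x[:, 0] + x[:, 1] > 0).astype(int)
    else:
        # "circle": squared radius 2/pi makes both classes equally likely
        labels = (np.sum(x**2, axis=1) > 2.0 / np.pi).astype(int)
    return x, labels

def fit_readout(features, labels, n_classes=2):
    """Least-squares fit of a linear readout (with bias column) to one-hot targets."""
    design = np.hstack([features, np.ones((len(features), 1))])
    targets = np.eye(n_classes)[labels]
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def apply_readout(features, weights):
    """Linear outputs z before the argmax."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ weights

def accuracy(z, labels):
    """Fraction of samples whose argmax over z matches the label."""
    return float(np.mean(np.argmax(z, axis=1) == labels))

# Shapes as in Figure 1(a): I is N x M, W is N x N, O is K x N.
N, M, K = 10, 2, 2
I = rng.normal(size=(N, M))
W = 0.1 * rng.normal(size=(N, N))
O = rng.normal(size=(K, N))  # untrained placeholder readout, for shapes only
y = reservoir_step(np.zeros(N), np.array([0.5, -0.5]), W, I)
z = O @ y
predicted_class = int(np.argmax(z))

# Readout applied directly to the inputs, with no reservoir in between.
results = {}
max_sum_error = 0.0
for kind in ("line", "circle"):
    x_data, labels = make_task(kind)
    weights = fit_readout(x_data, labels)
    z_all = apply_readout(x_data, weights)
    results[kind] = accuracy(z_all, labels)
    # With a bias column and one-hot targets, least-squares outputs sum to 1.
    max_sum_error = max(max_sum_error, float(np.max(np.abs(z_all.sum(axis=1) - 1.0))))
    print(kind, round(results[kind], 3))
```

In this setup the two linear outputs sum to one up to rounding error, because the one-hot targets sum to one and the bias column lets the least-squares fit reproduce that constant exactly. Note also that a reservoir started from $y=0$ with no bias terms produces only odd functions of the input under the assumed update rule, so this sketch deliberately does not claim anything about reservoir accuracy on the circle task.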