Improving the Performance of Echo State Networks Through State Feedback

Peter J. Ehlers; Hendra I. Nurdin; Daniel Soh

TL;DR

This work introduces a simple yet powerful state-feedback mechanism for Echo State Networks by feeding a linear function of the reservoir state back into the input, effectively altering the reservoir dynamics without modifying the reservoir itself. The authors prove that, for almost all reservoirs and data, this feedback decreases the training cost and, on average across the ESN class, yields superior performance compared to traditional ESNs. They develop a gradient-based procedure to optimize the feedback coefficients under a convergence constraint, and demonstrate substantial empirical improvements across Mackey-Glass, nonlinear channel equalization, and Coupled Electric Drives benchmarks, often achieving results comparable to larger reservoirs. The approach has practical appeal for physical reservoir computing and offers a principled pathway to leverage simple external feedback to boost modeling power for sequential tasks, while highlighting nonconvex optimization as an area for further methodological advancement.
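The gradient-based optimization of the feedback coefficients mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the `tanh` nonlinearity, the zero initial state, the finite-difference gradient, the backtracking step size, and the convergence check "largest singular value of $A + BV^\top$ below 1" are all choices made here for the sake of a short runnable example. All function names are hypothetical.

```python
import numpy as np

def run_states(A, B, V, u):
    """States of x_{k+1} = tanh(A x_k + B (u_k + V.x_k)), starting from x = 0 (assumed model)."""
    x = np.zeros(A.shape[0])
    out = np.empty((len(u), A.shape[0]))
    for k, u_k in enumerate(u):
        x = np.tanh(A @ x + B * (u_k + V @ x))
        out[k] = x
    return out

def training_cost(A, B, V, u, y, washout):
    """Mean squared residual after the best affine readout (W, C) for this V."""
    X = run_states(A, B, V, u)[washout:]
    Phi = np.hstack([X, np.ones((len(X), 1))])
    target = y[washout:]
    coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return float(np.mean((Phi @ coef - target) ** 2))

def is_admissible(A, B, V, a=1.0):
    """Assumed convergence check: largest singular value of A + B V^T is below a."""
    return bool(np.linalg.norm(A + np.outer(B, V), 2) < a)

def optimize_feedback(A, B, u, y, washout=50, steps=20, lr=1.0, eps=1e-6):
    """Descend on V, accepting only admissible steps that lower the training cost."""
    V = np.zeros(A.shape[0])
    cost = training_cost(A, B, V, u, y, washout)
    for _ in range(steps):
        # Central finite-difference gradient of the cost with respect to V.
        grad = np.zeros_like(V)
        for i in range(len(V)):
            e = np.zeros_like(V)
            e[i] = eps
            up = training_cost(A, B, V + e, u, y, washout)
            down = training_cost(A, B, V - e, u, y, washout)
            grad[i] = (up - down) / (2 * eps)
        # Halve the step until it is admissible and improves the cost.
        step = lr
        while step > 1e-8:
            V_new = V - step * grad
            if is_admissible(A, B, V_new):
                new_cost = training_cost(A, B, V_new, u, y, washout)
                if new_cost < cost:
                    V, cost = V_new, new_cost
                    break
            step *= 0.5
        else:
            break  # no acceptable step found; stop early
    return V, cost
```

Because the search starts at $V = 0$ and only accepts improving steps, the returned cost can never exceed that of the base ESN on the training data. The figure captions indicate that the paper instead uses batch gradient descent with a fixed learning rate.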

Abstract

Reservoir computing, using nonlinear dynamical systems, offers a cost-effective alternative to neural networks for complex tasks involving processing of sequential data, time series modeling, and system identification. Echo state networks (ESNs), a type of reservoir computer, mirror neural networks but simplify training. They apply fixed, random linear transformations to the internal state, followed by nonlinear changes. This process, guided by input signals and linear regression, adapts the system to match target characteristics, reducing computational demands. A potential drawback of ESNs is that the fixed reservoir may not offer the complexity needed for specific problems. While directly altering (training) the internal ESN would reintroduce the computational burden, an indirect modification can be achieved by redirecting some output as input. This feedback can influence the internal reservoir state, yielding ESNs with enhanced complexity suitable for broader challenges. In this paper, we demonstrate that by feeding some component of the reservoir state back into the network through the input, we can drastically improve upon the performance of a given ESN. We rigorously prove that, for any given ESN, feedback will almost always improve the accuracy of the output. For a set of three tasks, each representing different problem classes, we find that with feedback the average error measures are reduced by $30\%-60\%$. Remarkably, feedback provides at least an equivalent performance boost to doubling the initial number of computational nodes, a computationally expensive and technologically challenging alternative. These results demonstrate the broad applicability and substantial usefulness of this feedback scheme.
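To make the feedback idea concrete, here is a minimal sketch of an ESN whose input is augmented by a linear function of its own state. The model form $x_{k+1} = \tanh(A x_k + B(u_k + V^\top x_k))$, the scalar input, the `tanh` nonlinearity, and the function names are assumptions made for illustration; the ESN model equation itself is not reproduced on this page. The sketch also fits the affine readout $\hat{y}_k = W^\top x_k + C$ by ordinary least squares.

```python
import numpy as np

def run_esn(A, B, u, V=None, washout=0):
    """Run x_{k+1} = tanh(A x_k + B (u_k + V.x_k)) from x = 0 and return the states.

    With V=None (or all zeros) this is the base ESN without feedback.
    """
    n = A.shape[0]
    V = np.zeros(n) if V is None else np.asarray(V, dtype=float)
    x = np.zeros(n)
    states = []
    for u_k in u:
        # The feedback term V.x is added to the input before it enters the reservoir.
        x = np.tanh(A @ x + B * (u_k + V @ x))
        states.append(x)
    return np.array(states)[washout:]

def fit_readout(X, y):
    """Least-squares fit of y_k ~ W.x_k + C; returns (W, C)."""
    Phi = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef[:-1], coef[-1]
```

Under these assumptions, running with feedback vector `V` produces the same states as running a feedback-free ESN whose reservoir matrix is `A + np.outer(B, V)`, which is how feedback changes the effective dynamics while `A` itself stays fixed.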

Paper Structure (19 sections, 6 theorems, 59 equations, 6 figures, 2 tables)

Introduction
Theory of Reservoir Computing with Feedback
Reservoir Computing and Echo State Networks
ESNs with Feedback
Universal Superiority of ESN with Feedback over ESN without Feedback
Preliminary Definitions and Relations
Lemmas for Proving the Lower Dimensionality of Cases where $\nabla_V S_{\mathrm{min}} = \mathbf{0}$
Proving the Universal Superiority of ESNs with Feedback
Superiority of ESNs with Feedback for the Whole Class of ESNs
Optimization of ESN with Feedback
Benchmark Test Results
Results on the Mackey-Glass task
Results on the Channel Equalization Task
Node Dependence
Gradient Descent Step Dependence
...and 4 more sections

Key Result

Theorem 1

For any given matrix $A$ and vector $B$ in Eq. (eq:ESN-model), and given sets of training inputs $\{u_k\}=\{u_k\}_{k=1,\ldots,N}$ and outputs $\{y_k\}=\{y_k\}_{k=1,\ldots,N}$ of finite length, define an optimized cost function $S_\mathrm{min} (A,B, \{u_k\}, \{y_k\})$ with appropriate optimal $W$ and [...] Moreover, if $A$ is such that $A^{\top}A < a^2 \mathbb{I}_{n}$, where $a$ is a constant that guaran[...]
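The theorem statement is truncated here, so the following sketch illustrates only the matrix condition $A^{\top}A < a^2 \mathbb{I}_{n}$ that appears in it. Read as a positive-definite ordering, the condition is equivalent to the largest singular value of $A$ being strictly below $a$. The function names are hypothetical, and the value of $a$ depends on details of the model that are not shown on this page.

```python
import numpy as np

def satisfies_bound(A, a):
    """True if the largest singular value of A is strictly below a."""
    return bool(np.linalg.norm(A, 2) < a)

def satisfies_bound_eig(A, a):
    """Same condition, checked directly: a^2 I - A^T A must be positive definite."""
    gap = a**2 * np.eye(A.shape[1]) - A.T @ A
    return bool(np.linalg.eigvalsh(gap).min() > 0)
```

The two checks agree away from the boundary; the singular-value form is usually the cheaper and more numerically robust one.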

Figures (6)

Figure 1: 3D plot of the non-convex dependence of the NMSE on $V$ for an ESN with 2 computational nodes. Here we optimize $C$ and $W$ for the Mackey-Glass task and use 1000 training data points after 500 initial steps in our ESN. This plot shows only a portion of the full space of convergent feedback vectors to better illustrate the non-convexity.
Figure 2: Histograms for the NMSE values for the Mackey-Glass task. The left plot shows the NMSE values for ESNs with 10 computational nodes, with 48000 randomly chosen ESNs without feedback (the base ESN) and 9600 choices with feedback. For the feedback optimization, we used 100 steps of batch gradient descent with a learning rate of $25.0$. The right plot uses 48000 randomly chosen ESNs with 100 nodes. In all cases we used 1000 training data points taken after 500 steps of startup, and we show the NMSE values for 500 test data steps after training ends.
Figure 3: Histograms for the errors in the Channel Equalization task. The left plot shows the number of errors for ESNs with 10 computational nodes, with 48000 randomly chosen ESNs without feedback (the base ESN) and 9600 choices with feedback. For the feedback optimization, we used 100 steps of batch gradient descent with a learning rate of $10.0$. The right plot uses 48000 randomly chosen ESNs with 100 nodes. In all cases we used 1000 training data points taken after 500 steps of startup, and we show the number of errors for 500 test data steps after training ends. The errors are counted using $\mathrm{abs}(d_k-\hat{y}_k) / 2$, where $d_k$ is the actual value of the signal at time step $k$, while $\hat{y}_k$ is the prediction by the ESN.
Figure 4: Plots of the average NMSE and error values for the Mackey-Glass and Channel Equalization tasks, respectively, as a function of computational nodes. The average is taken over 9600 randomly chosen ESNs, and the error bars represent the standard deviation of the distribution for each number of nodes. In all cases we used 1000 training data points taken after 500 steps of startup, and we show the NMSE and total error values for 500 test data steps after training ends. We also include a line showing the average NMSE and total errors for 10 nodes with feedback for reference.
Figure 5: Plots of the average NMSE and error values for the Mackey-Glass and Channel Equalization tasks, respectively, as a function of batch gradient descent steps for optimizing $V$, with a learning rate of $25.0$ for Mackey-Glass and $10.0$ for Channel Equalization. The average is taken over 9600 randomly chosen ESNs, and the error bars represent the standard deviation of the distribution for each number of nodes. In all cases we used 1000 training data points taken after 500 steps of startup. These plots show the number of errors for the training data. The asterisk in the second plot indicates that we have rescaled the average number of errors by a factor of $1/2$ to be directly comparable with the other figures in this work.
...and 1 more figures
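The two error measures named in the captions can be sketched as follows. Both definitions rest on assumptions: the NMSE is taken here as the mean squared error divided by the variance of the target, and the channel-equalization count assumes the prediction is first snapped to the nearest symbol of a four-level alphabet $\{-3, -1, 1, 3\}$ before $\mathrm{abs}(d_k-\hat{y}_k)/2$ is summed. Neither the normalization nor the alphabet is stated on this page, and the names below are hypothetical.

```python
import numpy as np

# Assumed symbol alphabet for the channel equalization task (not given in the captions).
SYMBOLS = np.array([-3.0, -1.0, 1.0, 3.0])

def nmse(y_true, y_pred):
    """Mean squared error normalized by the variance of the target (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def snap_to_symbols(y_pred):
    """Replace each prediction by the nearest value in SYMBOLS."""
    y_pred = np.asarray(y_pred, dtype=float)
    idx = np.argmin(np.abs(y_pred[:, None] - SYMBOLS[None, :]), axis=1)
    return SYMBOLS[idx]

def symbol_error_count(d, y_pred):
    """Sum of abs(d_k - yhat_k) / 2 after snapping yhat_k to the assumed alphabet."""
    d = np.asarray(d, dtype=float)
    return float(np.sum(np.abs(d - snap_to_symbols(y_pred)) / 2))
```

With symbols spaced two apart, dividing by 2 makes each term count how many symbol levels the prediction is off, so a miss by two levels contributes 2 rather than 1.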

Theorems & Definitions (12)

Theorem 1: Superiority of feedback for a given ESN and training data
Lemma 1: Categorization of cases where a derivative of $S_{\mathrm{min}}$ w.r.t. a general reservoir parameter $\theta$ vanishes
proof
Lemma 2: Lower dimensionality of cases where $\nabla_V S_{\mathrm{min}} = \mathbf{0}$ while $\nabla_V \Pi_x \neq \mathbf{0}$
proof
Lemma 3: Lower dimensionality of cases where $\nabla_V \Pi_x = \mathbf{0}$
proof
Lemma 4: Lower dimensionality of the subdomain of $S_{\mathrm{min}}(A+B V^\top,B,\{u_k\}, \{y_k\})$ for which $\nabla_V S_{\mathrm{min}} = \mathbf{0}$
proof
proof: Proof of Theorem 1
...and 2 more
