Reservoir Computing Benchmarks: a tutorial review and critique

Chester Wringe, Martin Trefzer, Susan Stepney

TL;DR

The paper addresses the lack of standardized RC benchmarking by clarifying benchmark definitions and evaluating how benchmarks are used to compare RC systems. It provides a taxonomy of benchmark tasks across imitation, prediction, computation, classification, and property measures, and critiques representative tasks like NARMA, channel equalisation, Mackey-Glass, Lorenz, and MNIST within RC. It also proposes best practices for benchmarking, including dataset choices, experimental methodology, evaluation metrics, and reporting standards, and discusses pitfalls such as overfitting to specific sequences and the absence of a unifying benchmark suite. By linking task-based benchmarks to substrate behaviour spaces through concepts like CHARC, the paper offers a path toward more rigorous, comparable RC evaluations with broader interpretability and cross-study reproducibility.

Abstract

Reservoir Computing is an Unconventional Computation model to perform computation on various different substrates, such as recurrent neural networks or physical materials. The method takes a 'black-box' approach, training only the outputs of the system it is built on. As such, evaluating the computational capacity of these systems can be challenging. We review and critique the evaluation methods used in the field of reservoir computing. We introduce a categorisation of benchmark tasks. We review multiple examples of benchmarks from the literature as applied to reservoir computing, and note their strengths and shortcomings. We suggest ways in which benchmarks and their uses may be improved to the benefit of the reservoir computing community.

Paper Structure (74 sections, 20 equations, 14 figures, 9 tables, 3 algorithms)

Figures (14)

  • Figure 1: (a) An example classical ESN with $7$ nodes. This ESN takes a 3-d vector of inputs $\mathbf{u}$, which are sent to the inner state $\mathbf{x}$ through weighted edges $\mathbf{W}_u$. The weights within the inner state, $\mathbf{W}$, are recurrent and randomly set. The 2-d output vector $\mathbf{v}$ receives the inner state through trained edges $\mathbf{W}_v$. (b) An abstract representation of the different components of a general ESN.
  • Figure 2: Training and testing stages of an imitation benchmark. The reservoir RC is fed the same time series input $\mathbf{u}$ as the target dynamical system DS. In the training stage, the output weights $\mathbf{W}_v$ of the reservoir are trained such that the resulting reservoir output $\mathbf{v}$ resembles the target dynamical system output $\hat{\mathbf{v}}$; the Normalised Root Mean Square Error (NRMSE, see section \ref{sec:NRMSE}) can be used to evaluate the training. In the testing stage, the trained reservoir output weights are used, and we evaluate the reservoir using the error between the observed reservoir output $\mathbf{v}$ and the target dynamical system output $\hat{\mathbf{v}}$.
  • Figure 3: Training and testing stages of a prediction benchmark. During the training stage (a), the reservoir is given as inputs the target outputs of the dynamical system. The reservoir output weights are trained such that the reservoir outputs resemble the target outputs of the dynamical system. There are two cases for testing. (a) Driven: During the testing stage, the reservoir is again given the target outputs of the dynamical system. The reservoir outputs are compared to the target outputs of the dynamical system. (b) Free-running: During the testing stage, the reservoir is fed back its own output. The reservoir outputs are compared to the target outputs of the dynamical system.
  • Figure 4: 200 timesteps of NARMA-5, NARMA-10, NARMA-20 (with tanh), and NARMA-30, using the starred parameter values from table \ref{table:narma-equations}, and the same input stream $u(t)$ drawn from $U[0,0.5]$.
  • Figure 5: A channel signal (bottom, blue), and the signal with nonlinear noise applied (top, orange).
  • ...and 9 more figures
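
The captions for Figures 1, 2 and 4 describe an ESN with fixed random weights, a trained readout, and an imitation benchmark scored by NRMSE. The following is a minimal sketch of that pipeline in Python, not the authors' implementation. Several details are assumptions that the captions do not fix: the NARMA-10 coefficients (0.3, 0.05, 1.5, 0.1) are the commonly cited values and may differ from the paper's starred parameters, the NRMSE is normalised by the target variance (one of several conventions), and the reservoir size, spectral radius, washout length, ridge parameter and all function names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def narma10(u):
    """NARMA-10 target series for input u (commonly cited coefficients; assumed)."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9:t + 1].sum()  # sum over the last 10 outputs
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return y

def nrmse(v, v_hat):
    """RMSE between output v and target v_hat, normalised by the target variance (assumed convention)."""
    return np.sqrt(np.mean((v - v_hat) ** 2) / np.var(v_hat))

def collect_states(u, W_u, W):
    """Drive the reservoir with scalar input u; return states with a constant bias column appended."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0] + 1))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + W_u * u_t)   # fixed random input and recurrent weights
        states[t] = np.append(x, 1.0)
    return states

# Illustrative sizes, not taken from the paper.
n_nodes, washout, n_train, n_test = 100, 200, 1500, 500
W_u = rng.uniform(-1, 1, n_nodes)
W = rng.uniform(-1, 1, (n_nodes, n_nodes))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9

# The reservoir and the target system receive the same input stream, as in Figure 2.
u = rng.uniform(0, 0.5, washout + n_train + n_test)
target = narma10(u)
X = collect_states(u, W_u, W)

train = slice(washout, washout + n_train)
test = slice(washout + n_train, None)

# Only the readout W_v is trained, here by ridge regression on the training states.
ridge = 1e-4
A = X[train].T @ X[train] + ridge * np.eye(X.shape[1])
W_v = np.linalg.solve(A, X[train].T @ target[train])

train_error = nrmse(X[train] @ W_v, target[train])
test_error = nrmse(X[test] @ W_v, target[test])
print(f"train NRMSE = {train_error:.3f}, test NRMSE = {test_error:.3f}")
```

With this normalisation, a readout that always outputs the target mean scores an NRMSE of 1, so values below 1 indicate that the reservoir has captured some of the target dynamics. Because the input stream, weights and split are all drawn from a single seed, the reported error describes one sequence only; repeating the run over several seeds gives a fairer picture.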