Reservoir Computing Benchmarks: a tutorial review and critique
Chester Wringe, Martin Trefzer, Susan Stepney
TL;DR
The paper addresses the lack of standardised benchmarking in reservoir computing (RC) by clarifying what benchmarks are and examining how they are used to compare RC systems. It provides a taxonomy of benchmark tasks (imitation, prediction, computation, classification, and property measures) and critiques representative tasks as they are used in RC: NARMA, channel equalisation, Mackey-Glass, Lorenz, and MNIST. It also proposes best practices for benchmarking, covering dataset choice, experimental methodology, evaluation metrics, and reporting standards. It discusses pitfalls such as overfitting to specific sequences and the absence of a unifying benchmark suite. By linking task-based benchmarks to the behaviour spaces of substrates through frameworks such as CHARC (CHAracterisation of Reservoir Computers), the paper outlines a path toward more rigorous RC evaluations that are easier to interpret, compare, and reproduce across studies.
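NARMA-10 is one of the tasks critiqued in the paper. As a concrete reference point, here is a minimal sketch of a generator for it, assuming the commonly cited tenth-order form with inputs drawn uniformly from [0, 0.5]. It comes with an NRMSE metric defined as RMSE divided by the target's standard deviation. That is only one of several normalisations in use; others divide by the range, for example. The function names `narma10` and `nrmse` are hypothetical and not taken from the paper.

```python
import math
import random


def narma10(length, seed=0):
    """Generate a NARMA-10 input/target pair (assumed commonly cited form):

        y(t+1) = 0.3 y(t) + 0.05 y(t) * sum_{i=0}^{9} y(t-i)
                 + 1.5 u(t-9) u(t) + 0.1,   u(t) ~ Uniform[0, 0.5].

    The first ten targets are left at zero because the recursion needs
    ten past values. A fixed seed makes the sequence reproducible.
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(length)]
    y = [0.0] * length
    for t in range(9, length - 1):
        window = sum(y[t - 9 : t + 1])  # y(t-9) .. y(t), ten values
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * window + 1.5 * u[t - 9] * u[t] + 0.1
    return u, y


def nrmse(prediction, target):
    """RMSE divided by the target's standard deviation (one convention among several)."""
    n = len(target)
    mean = sum(target) / n
    var = sum((x - mean) ** 2 for x in target) / n
    mse = sum((p - x) ** 2 for p, x in zip(prediction, target)) / n
    return math.sqrt(mse / var)
```

Under this definition, predicting the target mean at every step gives an NRMSE of exactly 1. That makes it a useful sanity baseline when reading reported scores, provided the normalisation used is also reported.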
Abstract
Reservoir Computing is an Unconventional Computation model for performing computation on a variety of substrates, such as recurrent neural networks or physical materials. The method takes a 'black-box' approach, training only the outputs of the system it is built on. As a result, evaluating the computational capacity of these systems can be challenging. We review and critique the evaluation methods used in the field of reservoir computing, and introduce a categorisation of benchmark tasks. We review multiple examples of benchmarks from the literature as applied to reservoir computing, noting their strengths and shortcomings. Finally, we suggest ways in which benchmarks and their use may be improved to the benefit of the reservoir computing community.
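The point that only the outputs are trained can be made concrete with a minimal echo state network, a common software reservoir. The sketch assumes a tanh reservoir whose random weights are rescaled to a chosen spectral radius, plus a ridge-regression readout. The function names (`make_reservoir`, `run_reservoir`, `train_readout`), the parameter values, and the delay-recall task are hypothetical illustrations, not anything specified in the paper.

```python
import numpy as np


def make_reservoir(n_units, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Create a fixed random reservoir; none of these weights is ever trained."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    # Rescale so that the largest absolute eigenvalue equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    w_in = rng.uniform(-input_scale, input_scale, size=n_units)
    return w, w_in


def run_reservoir(w, w_in, inputs):
    """Drive the reservoir with a 1-D input sequence and record every state."""
    x = np.zeros(w.shape[0])
    states = np.empty((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(w @ x + w_in * u)
        states[t] = x
    return states


def train_readout(states, targets, ridge=1e-6):
    """Fit the linear output weights by ridge regression: the only trained part."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)


# Illustrative task (an assumption, not from the paper): recall the input from 2 steps ago.
rng = np.random.default_rng(1)
u = rng.uniform(-0.5, 0.5, size=1200)
target = np.roll(u, 2)  # target[t] = u[t-2]; the two wrapped entries fall inside the washout

w, w_in = make_reservoir(100)
states = run_reservoir(w, w_in, u)

washout, split = 100, 1000  # discard the initial transient, then split train/test
w_out = train_readout(states[washout:split], target[washout:split])
pred = states[split:] @ w_out

# NRMSE on held-out data: RMSE divided by the target's standard deviation.
error = np.sqrt(np.mean((pred - target[split:]) ** 2) / np.var(target[split:]))
print(f"test NRMSE: {error:.3f}")
```

Only `w_out` is learned, while `w` and `w_in` stay fixed. Replacing the simulated reservoir with a physical substrate would change how the states are obtained (measured rather than simulated), but not how the readout is trained.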
