Separation capacity of linear reservoirs with random connectivity matrix
Youness Boutaib
TL;DR
The paper addresses how well random linear reservoirs separate distinct input time series by linking separation to the spectral properties of generalized moment matrices derived from the random connectivity matrix $W$. It develops a rigorous framework showing that, for Gaussian $W$, the separation capacity is governed by the eigenstructure of $B_{T}$ in one dimension and of $B_{T,N}$ in higher dimensions, with detailed results for the symmetric and i.i.d. cases and precise scaling laws (notably $\sigma \sim 1/\sqrt{N}$). It provides both asymptotic spectral guarantees (via connections to the semicircle law in the symmetric case and related limits in the i.i.d. case) and probabilistic separation bounds (root-based and concentration-based). It then discusses implications for reservoir design and task performance, including how to balance separation against robustness as input length grows. The findings offer theoretical justification for empirical scaling heuristics and guide practical choices of reservoir size $N$, time horizon $T$, and connectivity scaling, while outlining open problems around eigenvector dynamics, nonlinearity effects, and hyperparameter optimization.
Abstract
A natural hypothesis for the success of reservoir computing in generic tasks is the ability of the untrained reservoir to map different input time series to separable reservoir states, a property we term separation capacity. We provide a rigorous mathematical framework to quantify this capacity for random linear reservoirs, showing that it is fully characterised by the spectral properties of the generalised matrix of moments of the random reservoir connectivity matrix. Our analysis focuses on reservoirs with Gaussian connectivity matrices, both symmetric and i.i.d., although the techniques extend naturally to broader classes of random matrices. In the symmetric case, the generalised matrix of moments is a Hankel matrix. Using classical estimates from random matrix theory, we establish that separation capacity deteriorates over time and that, for short inputs, optimal separation in large reservoirs is achieved when the matrix entries are scaled with a factor $\rho_T/\sqrt{N}$, where $N$ is the reservoir dimension and $\rho_T$ depends on the maximum input length. In the i.i.d. case, we establish that optimal separation with large reservoirs is consistently achieved when the entries of the reservoir matrix are scaled with the exact factor $1/\sqrt{N}$, which aligns with common implementations of reservoir computing. We further give upper bounds on the quality of separation as a function of the length of the time series. We complement this analysis with an investigation of the likelihood of this separation and its consistency under different architectural choices.
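As a rough illustration of the $1/\sqrt{N}$ scaling mentioned above, the following Python sketch draws a Gaussian connectivity matrix, runs a linear reservoir on two scalar input series, and measures the distance between the final states. The update rule $x_{t+1} = W x_t + u\,a_t$, the input vector `u`, the zero initial state, and the names `random_reservoir` and `final_state` are assumptions made for this example and are not taken from the paper; it is a minimal sketch, not the authors' construction.

```python
import numpy as np

def random_reservoir(n, rng, symmetric=False):
    """Gaussian connectivity matrix with entries of standard deviation 1/sqrt(n)."""
    w = rng.standard_normal((n, n)) / np.sqrt(n)
    if symmetric:
        # Symmetrise while keeping the off-diagonal variance equal to 1/n.
        w = (w + w.T) / np.sqrt(2)
    return w

def final_state(w, u, inputs):
    """Run the assumed update x_{t+1} = W x_t + u * a_t from x_0 = 0."""
    x = np.zeros(w.shape[0])
    for a in inputs:
        x = w @ x + u * a
    return x

rng = np.random.default_rng(0)
n, t = 200, 10                      # reservoir size N and input length T
w = random_reservoir(n, rng)        # i.i.d. Gaussian case
u = rng.standard_normal(n) / np.sqrt(n)  # hypothetical input weights
a = rng.standard_normal(t)          # first input time series
b = rng.standard_normal(t)          # second input time series

# Distance between the reservoir states reached by the two inputs.
separation = np.linalg.norm(final_state(w, u, a) - final_state(w, u, b))

# With 1/sqrt(N) scaling the spectral radius should sit close to 1.
spectral_radius = np.max(np.abs(np.linalg.eigvals(w)))
print(separation, spectral_radius)
```

Because the assumed reservoir is linear, the distance between the two final states equals the norm of the state reached by the difference series `a - b`, which is why separation can be studied through a single quadratic form in the input difference. Passing `symmetric=True` to `random_reservoir` gives the symmetric Gaussian variant for comparison.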
