Linear Reservoir: A Diagonalization-Based Optimization

Romain de Coudenhove; Yannis Bendi-Ouis; Anthony Strock; Xavier Hinaut

TL;DR

The proposed diagonalization-based methods preserve predictive accuracy while offering significant computational speedups, making them a replacement for standard Linear ESN computation and training, and suggesting a paradigm shift in Linear ESNs towards the direct selection of eigenvalues.

Abstract

We introduce a diagonalization-based optimization for Linear Echo State Networks (ESNs) that reduces the per-step computational complexity of reservoir state updates from O(N^2) to O(N). By reformulating the reservoir dynamics in the eigenbasis of the recurrent matrix, the recurrent update becomes a set of independent element-wise operations, eliminating the matrix multiplication. We further propose three methods for applying this optimization, depending on the situation: (i) Eigenbasis Weight Transformation (EWT), which preserves the dynamics of standard and trained Linear ESNs; (ii) End-to-End Eigenbasis Training (EET), which directly optimizes readout weights in the transformed space; and (iii) Direct Parameter Generation (DPG), which bypasses matrix diagonalization by directly sampling eigenvalues and eigenvectors, achieving performance comparable to that of standard Linear ESNs. Across all experiments, all three methods preserve predictive accuracy while offering significant computational speedups, making them a replacement for standard Linear ESN computation and training, and suggesting a paradigm shift in Linear ESNs towards the direct selection of eigenvalues.
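The reformulation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the simplest linear reservoir update, x(t+1) = W x(t) + W_in u(t+1), with no leaking rate and no feedback, and a random stand-in for a trained readout; the function names and sizes are hypothetical and are not taken from the paper. The sketch transforms the weights into the eigenbasis of W, in the spirit of EWT, and checks that the outputs agree with the standard computation.

```python
import numpy as np

def run_standard(W, W_in, U):
    """Standard linear reservoir: x(t+1) = W x(t) + W_in u(t+1). O(N^2) per step."""
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        x = W @ x + W_in @ u
        states.append(x)
    return np.array(states)

def run_diagonal(eigvals, W_in_P, U):
    """Same dynamics in the eigenbasis: the recurrent term is element-wise, O(N) per step."""
    z = np.zeros(eigvals.shape[0], dtype=complex)
    states = []
    for u in U:
        z = eigvals * z + W_in_P @ u
        states.append(z)
    return np.array(states)

rng = np.random.default_rng(0)
N, D, T = 50, 1, 200                      # reservoir size, input dim, time steps (assumed)
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius 0.9
W_in = rng.uniform(-1, 1, size=(N, D))
W_out = rng.normal(size=(1, N))           # stand-in for a trained readout
U = rng.normal(size=(T, D))

# Diagonalize once: W = P diag(eigvals) P^{-1}
eigvals, P = np.linalg.eig(W)
W_in_P = np.linalg.solve(P, W_in)         # P^{-1} W_in
W_out_P = W_out @ P                       # W_out P

X = run_standard(W, W_in, U)
Z = run_diagonal(eigvals, W_in_P, U)

y_std = X @ W_out.T
y_diag = (Z @ W_out_P.T).real             # imaginary part is numerical noise only
max_err = np.max(np.abs(y_std - y_diag))
print(f"max output difference: {max_err:.2e}")
```

The diagonalization is a one-time O(N^3) cost, after which each recurrent update only needs the element-wise product `eigvals * z`. The input projection `W_in_P @ u` still costs O(N D) per step in this sketch.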

Paper Structure (30 sections, 6 theorems, 49 equations, 7 figures, 2 tables, 3 algorithms)

This paper contains 30 sections, 6 theorems, 49 equations, 7 figures, 2 tables, 3 algorithms.

Introduction
Linear Echo State Networks: Definition
Reservoir step
Readout step
Leaking Rate
Learning $W_\text{out}$
Computational Complexity
Diagonalization-Based Optimization
Core Transformation
A special case of a diagonalizable matrix
Apply $W_{\text{in}}$ after Reservoir Steps
Application: Diagonal Linear ESN
Computational Advantages
Methods
Training Data Preparation
...and 15 more sections

Key Result

Theorem 1

Change-of-basis of the ESN dynamics. (i) Let $W$ be the reservoir matrix and $P \in \text{GL}_n(\mathbb{C})$ a basis. We denote by $[\cdot]_P$ the transformation into the basis $P$, and express both the weights and the state of the ESN in this new basis. (ii) Then the reservoir step becomes: (iii) And the readout step becomes: (iv) Using the notation of $X(t) = $, we define: It is then possibl
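As a sketch of the kind of identity the theorem refers to, the change of basis can be written out under an assumed minimal notation: state $x(t)$, input $u(t)$, update $x(t+1) = W x(t) + W_\text{in} u(t+1)$, and readout $y(t) = W_\text{out} x(t)$, with no leaking rate or feedback. These definitions and the explicit forms of $[\cdot]_P$ below are assumptions made for illustration and may differ from the paper's own statement.

```latex
% Assumed definitions of the transformed quantities
[x]_P(t) = P^{-1} x(t), \qquad
[W]_P = P^{-1} W P, \qquad
[W_\text{in}]_P = P^{-1} W_\text{in}, \qquad
[W_\text{out}]_P = W_\text{out} P

% Reservoir step: multiply the original update by P^{-1} and insert P P^{-1}
[x]_P(t+1) = P^{-1} W P \, P^{-1} x(t) + P^{-1} W_\text{in} u(t+1)
           = [W]_P \, [x]_P(t) + [W_\text{in}]_P \, u(t+1)

% Readout step: the output is unchanged
y(t) = W_\text{out} P \, P^{-1} x(t) = [W_\text{out}]_P \, [x]_P(t)

% If P diagonalizes W, then [W]_P = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N)
[x]_P(t+1) = \lambda \odot [x]_P(t) + [W_\text{in}]_P \, u(t+1)
```

In the diagonal case the recurrent term is an element-wise product with the eigenvalue vector $\lambda$, which is where the O(N) per-step cost of the recurrent update comes from.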

Figures (7)

Figure 1: Standard architecture of an Echo State Network (ESN). In the Reservoir Computing paradigm, the input weights $W_{in}$, the internal recurrent weights $W$, and the optional feedback weights $W_{fb}$ are randomly initialized and kept fixed. The input is projected into a high-dimensional space within the reservoir, which utilizes its recurrent connections to maintain a deterministic state representation over time. Only the readout weights $W_{out}$ are trained to decode this internal state and produce the final output.
Figure 2: Comparison of standard computation and proposed optimizations for different reservoir sizes. (i) Generation Step: Evaluates three initialization methods. Normal generates a standard linear reservoir with a weight matrix $W$. Diagonalization generates a standard $W$ and then diagonalizes it (applicable to EWT/EET; see the Methods section). DPG directly generates a diagonal matrix (see the Methods section). (ii) Reservoir Step: Compares the standard computation with the diagonal one. Because EWT, EET, and DPG share an identical diagonal structure after the generation step, they are represented together as Diagonal. (iii) Readout Step: Displays only a single curve because the computational cost is identical across all methods. With the implementation proposed in the Appendix, the readout can be trained with real matrices, so its cost matches that of the standard method. (Note) For the reservoir and readout steps, the duration displayed is for a single time step. To obtain the total duration over the whole sequence, these values must be multiplied by the total number of time steps.
Figure 3: Comparison of eigenvalue distributions in the complex plane. The first column (left) shows the spectrum of a standard random reservoir matrix $W$. The second column shows the spectrum generated via the Uniform Distribution (Algorithm \ref{alg:gen_eigenvalues}). The third column shows the spectrum generated via the deterministic Golden Distribution method (Algorithm \ref{alg:golden_eigenvalues}) without noise, and the fourth column (right) shows the spectrum generated via the Noisy Golden Distribution. The Noisy Golden Distribution achieves a significantly more homogeneous coverage of the unit disk than the Uniform Distribution or its noise-free counterpart, effectively matching the spectral density of the standard reservoir.
Figure 4: Illustration of the Multiple Superimposed Oscillators (MSO) time series for the $K=5$ task (MSO5). The target signal $U_5(t)$ is generated by summing five distinct sinusoidal components. The complete sequence of 1000 time steps is partitioned into a training part of 400 steps (blue), which includes an initial 100-step washout used to discard transient reservoir dynamics (dashed blue), a validation part of 300 steps (green) for hyperparameter tuning, and a final test part of 300 steps (pink) to evaluate predictive performance.
Figure 5: Visualization of spectral importance via readout weights on the MSO task. The reservoir eigenvalues are plotted in the complex plane, where the marker size is proportional to the absolute value of the corresponding weight in $W_{\text{out}}$ (normalized between 0 and 1). Larger points identify the specific eigenvalues that contribute most significantly to the model's prediction for this task. The points are the same in the left and right panels; only their size changes. On the right, points close to the unit circle are no longer visible, which may give the illusion that the point distribution has contracted, but this is not the case.
...and 2 more figures
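The idea of generating a spectrum directly, as in the Uniform Distribution column of Figure 3, could be sketched as follows. The paper's algorithms are not reproduced here, so this is only an illustrative assumption: the function name `sample_uniform_disk_eigenvalues`, its parameters, and the choice to draw complex-conjugate pairs (so that the spectrum is closed under conjugation, as for any real matrix $W$) are hypothetical.

```python
import numpy as np

def sample_uniform_disk_eigenvalues(n, spectral_radius=1.0, seed=0):
    """Hypothetical sampler: n eigenvalues spread uniformly (by area) over a disk,
    returned as complex-conjugate pairs, plus one real value when n is odd."""
    rng = np.random.default_rng(seed)
    n_pairs = n // 2
    # Taking the square root of a uniform variable gives a uniform area density.
    radius = spectral_radius * np.sqrt(rng.uniform(0.0, 1.0, size=n_pairs))
    # Angles in the upper half-plane; the lower half is filled by conjugation.
    angle = rng.uniform(0.0, np.pi, size=n_pairs)
    upper = radius * np.exp(1j * angle)
    eigvals = np.concatenate([upper, np.conj(upper)])
    if n % 2 == 1:
        # Odd n: complete the spectrum with one real eigenvalue on the diameter.
        real_val = rng.uniform(-spectral_radius, spectral_radius)
        eigvals = np.append(eigvals, real_val)
    return eigvals

eigvals = sample_uniform_disk_eigenvalues(101, spectral_radius=0.9)
print(eigvals.shape, np.max(np.abs(eigvals)))
```

Such a vector of eigenvalues can be used directly as the diagonal recurrent weights of a reservoir, with no matrix $W$ ever being built or diagonalized.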

Theorems & Definitions (6)

Theorem 1
Corollary 2
Lemma 3
Lemma 4
Theorem 5
Theorem 6
