
Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks

Matteo Pinna, Andrea Ceni, Claudio Gallicchio

TL;DR

DeepResESN tackles the challenge of long-term temporal modeling in untrained RNNs by stacking multiple reservoirs with temporal residual connections. The approach unifies DeepESN and ResESN concepts, introducing per-layer orthogonal residual mappings and leaky-like scaling to preserve signal propagation while enabling deep temporal hierarchies. The authors derive ESP-compatible stability and contractivity conditions, analyze the Jacobian spectrum, and validate the method across memory, forecasting, and classification tasks, reporting substantial performance gains over shallow and some deep RC baselines. The work advances practical untrained RNN design for time-series tasks and provides a rigorous theoretical framework for stable, expressive deep reservoir dynamics with potential impact on fast, robust temporal modeling systems.

Abstract

Echo State Networks (ESNs) are a particular type of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) framework, popular for their fast and efficient learning. However, traditional ESNs often struggle with long-term information processing. In this paper, we introduce a novel class of deep untrained RNNs based on temporal residual connections, called Deep Residual Echo State Networks (DeepResESNs). We show that leveraging a hierarchy of untrained residual recurrent layers significantly boosts memory capacity and long-term temporal modeling. For the temporal residual connections, we consider different orthogonal configurations, including randomly generated and fixed-structure configurations, and we study their effect on network dynamics. A thorough mathematical analysis outlines necessary and sufficient conditions to ensure stable dynamics within DeepResESN. Our experiments on a variety of time series tasks showcase the advantages of the proposed approach over traditional shallow and deep RC.

Paper Structure

This paper contains 17 sections, 3 theorems, 31 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Assume a DeepResESN whose inter-layer and global dynamics are defined by the paper's per-layer and global state-update equations, respectively. Furthermore, assume zero input and zero initial state. The global spectral radius of the system is expressed as: … where $\mathbf{0}_x \in \mathbb{R}^{N_x}$ and $\mathbf{0} \in \mathbb{R}^{N_L N_h}$ are the zero input and the zero state vectors, respectively. Then, a necessary …
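The exact expression for the global spectral radius is not reproduced above. As a hedged illustration only, assume each layer follows h(t) = α O h(t-1) + β tanh(W_h h(t-1) + W_x u(t) + b), which is one reading of Figure 1, and that layer l is driven by the current state of layer l-1. Under that assumption, the global Jacobian with respect to the previous global state is block lower-triangular, so its eigenvalues are those of the per-layer diagonal blocks. At zero input, zero state and zero bias, each block reduces to α O + β W_h. The sketch below computes those per-layer spectral radii and their maximum. The helper names (`layer_jacobian`, `spectral_radius`), the uniform weight initialisation and the QR-based random orthogonal matrix are assumptions, not the authors' code.

```python
import numpy as np


def layer_jacobian(h, u, O, w_h, w_x, b, alpha, beta):
    """Jacobian of the assumed update alpha*O@h + beta*tanh(w_h@h + w_x@u + b) w.r.t. h."""
    pre = w_h @ h + w_x @ u + b
    # d tanh(z)/dz = 1 - tanh(z)^2 scales each row of w_h
    return alpha * O + beta * (1.0 - np.tanh(pre) ** 2)[:, None] * w_h


def spectral_radius(m):
    return np.max(np.abs(np.linalg.eigvals(m)))


rng = np.random.default_rng(0)
n_h, n_layers = 100, 3
alpha, beta = 0.5, 1.0  # values used in Figure 4
radii = []
for l in range(n_layers):
    w_h = rng.uniform(-1, 1, (n_h, n_h))
    w_h *= 0.5 / spectral_radius(w_h)  # rescale recurrent weights to rho = 0.5 (assumed init)
    O, _ = np.linalg.qr(rng.standard_normal((n_h, n_h)))  # random orthogonal matrix
    w_x = rng.uniform(-1, 1, (n_h, n_h if l else 1))  # layer 0 sees a 1-d input
    zero_h, zero_u, zero_b = np.zeros(n_h), np.zeros(w_x.shape[1]), np.zeros(n_h)
    J0 = layer_jacobian(zero_h, zero_u, O, w_h, w_x, zero_b, alpha, beta)
    radii.append(spectral_radius(J0))

# Block lower-triangular global Jacobian: its spectrum is the union of the layer blocks' spectra.
global_radius = max(radii)
```

Evaluating `layer_jacobian` at a random state and input drawn from (-1, 1), instead of at zero, gives the kind of per-layer eigenvalue clouds that Figure 4 plots against the unit circle.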

Figures (5)

  • Figure 1: Architectural organization of the proposed DeepResESN. (a) Structure of a generic $l$-th reservoir layer in a DeepResESN. The reservoir structure (shown in blue) consists of an input weight matrix $\mathbf{W}_x^{(l)}$, a recurrent weight matrix $\mathbf{W}_h^{(l)}$, and a non-linear activation function $\phi$. The temporal residual connection (shown in purple) is modulated by an orthogonal matrix $\mathbf{O}$. The temporal residual and non-linear paths are scaled by positive coefficients $\alpha^{(l)}$ and $\beta^{(l)}$, respectively. (b) Complete illustration of a DeepResESN architecture with $N_{L}$ reservoir layers. The first layer acts as a residual reservoir in a traditional shallow architecture and is fed the external input $\mathbf{x}^{(1)}$. Subsequent layers receive as input the output of the previous reservoir, $\mathbf{h}^{(l-1)}$. The readout may be fed either the final layer states or the concatenation of states from all layers. See the DeepResESN section of the paper for details.
  • Figure 2: Structure of the three orthogonal matrices ($10 \times 10$) used in the temporal residual connections.
  • Figure 3: Spectral frequencies of (a) DeepResESN$_\mathrm{R}$, (b) DeepResESN$_\mathrm{C}$, and (c) DeepResESN$_\mathrm{I}$, in progressively deeper layers (columns). In each layer, and for all configurations, we consider $N_{h} = 100$ recurrent neurons, $\rho = 1$, $\alpha = 0.9$, $\beta = 0.1$, $\omega_x = 1$ and $\omega_b = 0$. Results are averaged over $10$ trials. Magnitudes have been normalized to ease visualization. Red arrows highlight the trend in spectral magnitudes.
  • Figure 4: Eigenvalues of the Jacobian of a DeepResESN$_{\mathrm{R}}$ for spectral radii (a) $\rho = 0.5$, (b) $\rho = 1$, and (c) $\rho = 2$, for progressively deeper layers (columns). In each layer, we consider $N_{h} = 100$ recurrent neurons, $\rho$ as specified in each subplot, $\alpha = 0.5$, $\beta = 1$, $\omega_x = 1$, and $\omega_b = 0$. Model dynamics are driven by a random input vector and a random hidden state, both uniformly distributed in $(-1, 1)$. The unit circle is shown in orange.
  • Figure 5: (left) Performance gain of each model class relative to LeakyESN, broken down by task and averaged across all task-specific datasets. For ResESNs and DeepResESNs, we consider the best-performing configuration for each dataset. (right) Critical difference plot computed via a Wilcoxon test (Demšar, 2006), summarizing the average rank (lower is better) of each model class across all tasks and datasets. Cliques, represented as solid lines, connect models with no statistically significant difference in performance.
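The architecture in Figures 1 and 2 can be sketched as a forward pass over a stack of residual reservoirs. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the layer update h(t) = α O h(t-1) + β tanh(W_h h(t-1) + W_x u(t) + b) with φ = tanh, read off Figure 1. It assumes the subscripts R, C and I in Figure 3 denote a random orthogonal, a cyclic (shift permutation) and an identity matrix. It also assumes uniform weight initialisation with W_h rescaled to spectral radius ρ, and that ω_x and ω_b scale the input (and inter-layer) and bias weights in every layer. The functions `orthogonal_matrix`, `init_layer` and `run_deepresesn` are hypothetical names.

```python
import numpy as np


def orthogonal_matrix(n, kind, rng):
    """Assumed residual matrices: "R" random orthogonal, "C" cyclic shift, "I" identity."""
    if kind == "R":
        q, r = np.linalg.qr(rng.standard_normal((n, n)))
        return q * np.sign(np.diag(r))  # sign fix so q is Haar-distributed
    if kind == "C":
        return np.roll(np.eye(n), 1, axis=0)  # permutation matrix: shifts entries by one
    if kind == "I":
        return np.eye(n)
    raise ValueError(f"unknown kind: {kind}")


def init_layer(n_in, n_h, rho, omega_x, omega_b, rng):
    """Hypothetical initialisation: uniform weights, W_h rescaled to spectral radius rho."""
    w_h = rng.uniform(-1, 1, (n_h, n_h))
    w_h *= rho / np.max(np.abs(np.linalg.eigvals(w_h)))
    w_x = rng.uniform(-omega_x, omega_x, (n_h, n_in))
    b = rng.uniform(-omega_b, omega_b, n_h)
    return w_h, w_x, b


def run_deepresesn(x, layers, alpha, beta):
    """x: (T, n_in) input sequence. Returns one (T, n_h) state array per layer."""
    states = []
    u = x
    for layer, a, bt in zip(layers, alpha, beta):
        n_h = layer["w_h"].shape[0]
        h = np.zeros(n_h)  # zero initial state
        hs = np.empty((u.shape[0], n_h))
        for t in range(u.shape[0]):
            pre = layer["w_h"] @ h + layer["w_x"] @ u[t] + layer["b"]
            h = a * layer["O"] @ h + bt * np.tanh(pre)  # residual path + non-linear path
            hs[t] = h
        states.append(hs)
        u = hs  # layer l+1 is driven by the states of layer l at the same time step
    return states


rng = np.random.default_rng(0)
n_in, n_h, n_layers = 1, 100, 3
layers = []
for l in range(n_layers):
    w_h, w_x, b = init_layer(n_in if l == 0 else n_h, n_h, rho=1.0,
                             omega_x=1.0, omega_b=0.0, rng=rng)
    layers.append({"O": orthogonal_matrix(n_h, "C", rng), "w_h": w_h, "w_x": w_x, "b": b})

x = rng.uniform(-1, 1, (200, n_in))
states = run_deepresesn(x, layers, alpha=[0.9] * n_layers, beta=[0.1] * n_layers)
# Readout features: all layers concatenated (or states[-1] for a last-layer readout).
readout_features = np.concatenate(states, axis=1)
```

Only the linear readout on `readout_features` would be trained (for example by ridge regression). The reservoir and residual weights stay fixed after initialisation, which is what keeps the model untrained in the RC sense.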

Theorems & Definitions (3)

  • Theorem 1: Necessary condition for the ESP of a DeepResESN
  • Lemma 1: Sufficient condition for contractivity of layer's dynamics
  • Theorem 2: Sufficient condition for the ESP of a DeepResESN
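To illustrate how a sufficient contractivity condition for a single layer (the subject of Lemma 1) can arise, the derivation below is a hedged sketch. It is not necessarily the exact statement of Lemma 1. It assumes the same layer update as Figure 1 with $\phi = \tanh$, drops the layer index $(l)$, and fixes the input so that $\mathbf{c} = \mathbf{W}_x \mathbf{x} + \mathbf{b}$ is constant. It uses only two facts: an orthogonal $\mathbf{O}$ preserves the Euclidean norm, and $\tanh$ is 1-Lipschitz component-wise.

```latex
\begin{aligned}
\|F(\mathbf{h}) - F(\mathbf{h}')\|_2
  &\le \alpha\,\|\mathbf{O}(\mathbf{h} - \mathbf{h}')\|_2
     + \beta\,\|\tanh(\mathbf{W}_h\mathbf{h} + \mathbf{c}) - \tanh(\mathbf{W}_h\mathbf{h}' + \mathbf{c})\|_2 \\
  &\le \alpha\,\|\mathbf{h} - \mathbf{h}'\|_2 + \beta\,\|\mathbf{W}_h\|_2\,\|\mathbf{h} - \mathbf{h}'\|_2 \\
  &= \big(\alpha + \beta\,\|\mathbf{W}_h\|_2\big)\,\|\mathbf{h} - \mathbf{h}'\|_2 .
\end{aligned}
```

Under these assumptions, the layer map is a contraction in $\mathbf{h}$ whenever $\alpha + \beta\,\|\mathbf{W}_h\|_2 < 1$. Because $\|\mathbf{O}\|_2 = 1$ for every orthogonal choice, this bound does not depend on which residual configuration is used.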