When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training

Alexander Morgan, Ummay Sumaya Khan, Lingjia Liu, Lizhong Zheng

TL;DR

Analysis shows that learning the recurrent poles makes the weight optimization problem highly non-convex, so gradient-based methods need significantly more training samples and iterations to converge to meaningful solutions. Numerical results show that fixed-pole networks achieve superior performance with lower training complexity, making them better suited to online real-time tasks.

Abstract

Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters including pole locations can be optimized via backpropagation through time (BPTT), such joint learning incurs substantial computational overhead and is often impractical for applications with limited training data. Echo state networks (ESNs) mitigate this limitation by fixing the recurrent dynamics and training only a linear readout, enabling efficient and stable online adaptation. In this work, we analytically and empirically examine why learning recurrent poles does not provide tangible benefits in data-constrained, real-time learning scenarios. Our analysis shows that pole learning renders the weight optimization problem highly non-convex, requiring significantly more training samples and iterations for gradient-based methods to converge to meaningful solutions. Empirically, we observe that for complex-valued data, gradient descent frequently exhibits prolonged plateaus, and advanced optimizers offer limited improvement. In contrast, fixed-pole architectures induce stable and well-conditioned state representations even with limited training data. Numerical results demonstrate that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
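
The fixed-pole idea described in the abstract can be illustrated with a minimal sketch. It assumes a bank of decoupled first-order IIR filters with hand-picked poles and a ridge-regularized least-squares readout. The function names, pole values, and synthetic data are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def fixed_pole_states(x, poles):
    """Run K decoupled first-order IIR filters: s_k[n] = p_k * s_k[n-1] + x[n]."""
    states = np.zeros((len(x), len(poles)), dtype=complex)
    s = np.zeros(len(poles), dtype=complex)
    for n, x_n in enumerate(x):
        s = poles * s + x_n          # poles stay fixed; nothing here is trained
        states[n] = s
    return states

def fit_readout(states, target, ridge=1e-8):
    """Solve the (convex) ridge-regularized least-squares problem for the readout."""
    gram = states.conj().T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.conj().T @ target)

rng = np.random.default_rng(0)
N, K = 200, 4
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # synthetic complex input

# Hypothetical fixed poles, all inside the unit circle for stability.
poles = 0.8 * np.exp(2j * np.pi * np.arange(K) / K)

# Synthetic target generated by known readout weights (assumed for this demo).
true_w = np.array([1.0, -0.5j, 0.25, 0.1 + 0.2j])
states = fixed_pole_states(x, poles)
target = states @ true_w

w_hat = fit_readout(states, target)
readout_error = np.max(np.abs(w_hat - true_w))
print("max readout weight error:", readout_error)
```

With the poles held fixed, only the linear readout is estimated, so the fit reduces to a single linear solve rather than an iterative search over pole locations.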

Paper Structure (5 sections, 6 theorems, 32 equations, 3 figures)

Key Result

Lemma 3.1

Fix any $K$ pairwise distinct but otherwise unrestricted $\beta_1, \dots, \beta_K \in \mathbb{C}$, and choose any $N \geq K$. Let $s_k[n]$ be the length-$N$ sequence defined by $s_k[n] = \beta_k^n$ for $0 \leq n < N$. Then, $s_k$ are linearly independent, i.e., if $b_1, \dots, b_K \in \mathbb{C}$ su
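
The linear-independence claim can be checked numerically for a particular choice of values. This is a sketch, not a proof: the $\beta_k$ values and $N$ below are arbitrary assumptions. Stacking the sequences $s_k[n] = \beta_k^n$ as rows gives a Vandermonde matrix, and full row rank corresponds to linear independence.

```python
import numpy as np

# Hypothetical pairwise distinct complex values; any distinct choice would do.
betas = np.array([0.5, -0.9, 0.3 + 0.4j, 1.2j])
N = 6  # sequence length, chosen so that N >= K

# S[k, n] = betas[k] ** n for 0 <= n < N (a K x N Vandermonde matrix).
S = np.vander(betas, N, increasing=True)
rank = np.linalg.matrix_rank(S)

# Repeating a value violates the distinctness assumption and drops the rank.
betas_repeated = np.array([0.5, 0.5, -0.9, 0.3 + 0.4j])
rank_repeated = np.linalg.matrix_rank(np.vander(betas_repeated, N, increasing=True))

print("rank with distinct values:", rank)
print("rank with a repeated value:", rank_repeated)
```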

Figures (3)

  • Figure 1: Surface of $f(x, y)$ when $c = 0.75$ and $d = 0.8$. Green and purple points represent $(c, d)$ and $(d, c)$, respectively. Note the "narrow-valley" behavior of the objective at the optimum points, reflective of a large condition number for the Hessian.
  • Figure 2: Training dynamics of a trainable RNN. Left: gradient norm decay. Middle: condition number of recurrent matrix $W_{\mathrm{rec}}$. Right: training loss.
  • Figure 3: Spectral density of eigenvalue magnitudes $|\lambda|$ over epochs.

Theorems & Definitions (12)

  • Lemma 3.1
  • proof
  • Theorem 3.2
  • proof
  • Lemma 3.3
  • proof
  • Corollary 3.4
  • proof
  • Lemma 3.5
  • proof
  • ...and 2 more