When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training

Alexander Morgan; Ummay Sumaya Khan; Lingjia Liu; Lizhong Zheng

TL;DR

Analysis shows that pole learning renders the weight optimization problem highly non-convex, requiring significantly more training samples and iterations for gradient-based methods to converge to meaningful solutions, and numerical results demonstrate that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.

Abstract

Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters including pole locations can be optimized via backpropagation through time (BPTT), such joint learning incurs substantial computational overhead and is often impractical for applications with limited training data. Echo state networks (ESNs) mitigate this limitation by fixing the recurrent dynamics and training only a linear readout, enabling efficient and stable online adaptation. In this work, we analytically and empirically examine why learning recurrent poles does not provide tangible benefits in data-constrained, real-time learning scenarios. Our analysis shows that pole learning renders the weight optimization problem highly non-convex, requiring significantly more training samples and iterations for gradient-based methods to converge to meaningful solutions. Empirically, we observe that for complex-valued data, gradient descent frequently exhibits prolonged plateaus, and advanced optimizers offer limited improvement. In contrast, fixed-pole architectures induce stable and well-conditioned state representations even with limited training data. Numerical results demonstrate that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
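The fixed-pole idea in the abstract can be illustrated with a minimal sketch. It assumes a diagonal linear recurrence, where each state is a first-order IIR filter with its own fixed complex pole, and a readout fitted by ridge-regularized least squares. The pole values, the ridge constant, the synthetic data, and the function names `fixed_pole_states` and `fit_readout` are illustrative assumptions, not the authors' setup. Because the poles are never updated, the only trained quantity is the readout weight vector, and fitting it is a convex problem.

```python
import numpy as np

def fixed_pole_states(x, poles):
    """Run one first-order IIR filter per pole: s_k[n] = p_k * s_k[n-1] + x[n]."""
    x = np.asarray(x, dtype=complex)
    poles = np.asarray(poles, dtype=complex)
    states = np.zeros((len(x), len(poles)), dtype=complex)
    s = np.zeros(len(poles), dtype=complex)
    for n, xn in enumerate(x):
        s = poles * s + xn          # poles stay fixed; nothing here is learned
        states[n] = s
    return states

def fit_readout(states, target, ridge=1e-8):
    """Ridge-regularized least squares for the readout weights (convex in w)."""
    gram = states.conj().T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.conj().T @ target)

rng = np.random.default_rng(0)

# Assumed poles: four points at radius 0.9, all inside the unit circle.
poles = 0.9 * np.exp(2j * np.pi * np.arange(4) / 4)

# Synthetic complex-valued input and a target generated by known readout weights.
x = rng.standard_normal(200) + 1j * rng.standard_normal(200)
w_true = np.array([1.0, -0.5j, 0.25, 0.1 + 0.2j])

states = fixed_pole_states(x, poles)
target = states @ w_true
w_hat = fit_readout(states, target)

print("max readout error:", np.max(np.abs(w_hat - w_true)))
```

In this toy setting the readout is recovered in a single linear solve, with no iterative gradient steps. A jointly trained network would additionally have to move the poles, which is the part the paper argues makes the optimization non-convex.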

Paper Structure (5 sections, 6 theorems, 32 equations, 3 figures)

Introduction
RNN pole learning via BPTT
Fundamental Limits of Pole Learning
Empirical Results
Conclusion

Key Result

Lemma 3.1

Fix any $K$ pairwise distinct but otherwise unrestricted $\beta_1, \dots, \beta_K \in \mathbb{C}$, and choose any $N \geq K$. Let $s_k[n]$ be the length-$N$ sequence defined by $s_k[n] = \beta_k^n$ for $0 \leq n < N$. Then, $s_k$ are linearly independent, i.e., if $b_1, \dots, b_K \in \mathbb{C}$ su
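A quick numerical check of the lemma, offered as a sketch rather than a proof: stacking the sequences $s_k$ as columns gives an $N \times K$ Vandermonde-type matrix, and linear independence of the $s_k$ is equivalent to that matrix having rank $K$. The specific $\beta_k$ values and the helper name `pole_sequences` below are arbitrary choices for illustration.

```python
import numpy as np

def pole_sequences(betas, N):
    """Return an N x K matrix whose k-th column is s_k[n] = beta_k**n, 0 <= n < N."""
    betas = np.asarray(betas, dtype=complex)
    # np.vander with increasing=True gives rows [1, b, b^2, ...]; transpose to columns.
    return np.vander(betas, N, increasing=True).T

# Pairwise distinct values, some inside and one outside the unit circle.
betas = [0.5, -0.5, 0.3 + 0.4j, 1.2j]
S = pole_sequences(betas, N=6)
rank_distinct = np.linalg.matrix_rank(S)

# Repeating a value violates the "pairwise distinct" hypothesis.
S_dup = pole_sequences([0.5, 0.5, 0.3 + 0.4j, 1.2j], N=6)
rank_repeated = np.linalg.matrix_rank(S_dup)

print("rank with distinct betas:", rank_distinct)
print("rank with a repeated beta:", rank_repeated)
```

With distinct values the rank should equal $K = 4$, as the lemma states. Once a value is repeated, two columns coincide and the rank drops, which shows why the distinctness hypothesis is needed.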

Figures (3)

Figure 1: Surface of $f(x, y)$ when $c = 0.75$ and $d = 0.8$. Green and purple points represent $(c, d)$ and $(d, c)$, respectively. Note the "narrow-valley" behavior of the objective at the optimum points, reflective of a large condition number for the Hessian.
Figure 2: Training dynamics of a trainable RNN. Left: gradient norm decay. Middle: condition number of recurrent matrix $W_{\mathrm{rec}}$. Right: training loss.
Figure 3: Spectral density of eigenvalue magnitudes $|\lambda|$ over epochs.

Theorems & Definitions (12)

Lemma 3.1
proof
Theorem 3.2
proof
Lemma 3.3
proof
Corollary 3.4
proof
Lemma 3.5
proof
...and 2 more
