Gradient-free training of recurrent neural networks

Erik Lien Bolager, Ana Cukarska, Iryna Burak, Zahra Monfared, Felix Dietrich

TL;DR

This work presents a gradient-free approach to training recurrent neural networks by randomly sampling hidden-layer parameters and solving a linear regression for the output mapping. By embedding the RNN dynamics into a Koopman operator–driven, high-dimensional linear state-space model and employing extended dynamic mode decomposition, the method yields a convergent, data-informed training procedure that avoids backpropagation through time. Empirical results across simple ODEs, chaotic systems, control tasks, and weather data show substantial reductions in training time while maintaining or improving forecasting accuracy relative to gradient-based methods. The framework offers interpretability via the data-driven sampling and a spectral perspective on stability, with theoretical convergence guarantees for the uncontrolled setting and practical extensions to controlled scenarios and real-world data.

Abstract

Recurrent neural networks are a successful neural architecture for many time-dependent problems, including time series analysis, forecasting, and modeling of dynamical systems. Training such networks with backpropagation through time is a notoriously difficult problem because their loss gradients tend to explode or vanish. In this contribution, we introduce a computational approach to construct all weights and biases of a recurrent neural network without using gradient-based methods. The approach is based on a combination of random feature networks and Koopman operator theory for dynamical systems. The hidden parameters of a single recurrent block are sampled at random, while the outer weights are constructed using extended dynamic mode decomposition. This approach alleviates all problems with backpropagation commonly related to recurrent networks. The connection to Koopman operator theory also allows us to start using results in this area to analyze recurrent neural networks. In computational experiments on time series, forecasting for chaotic dynamical systems, and control problems, as well as on weather data, we observe that the training time and forecasting accuracy of the recurrent neural networks we construct are improved when compared to commonly used gradient-based methods.
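The two-stage recipe described in the abstract (sample the hidden parameters, then solve linear least-squares problems for the outer weights) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian and uniform sampling distributions, the `tanh` activation, the row-vector convention, the `rcond` cutoff, and the names `fit_random_feature_edmd` and `predict_next` are all hypothetical choices made here. In particular, the paper describes a data-driven sampling scheme, which is replaced below by plain data-independent sampling.

```python
import numpy as np


def fit_random_feature_edmd(H, H_next, n_features=100, seed=0, rcond=1e-10):
    """Fit a random-feature model of a discrete-time map without gradients.

    H, H_next: arrays of shape (N, d), where H_next[i] is the successor of H[i].
    Returns the feature map, a Koopman matrix estimate K, and a read-out C.
    """
    rng = np.random.default_rng(seed)
    d = H.shape[1]

    # Hidden parameters are sampled once and never trained.
    # Assumption: data-independent sampling, unlike the paper's scheme.
    W = rng.normal(size=(d, n_features))
    b = rng.uniform(-1.0, 1.0, size=n_features)

    def features(X):
        return np.tanh(X @ W + b)

    Z, Z_next = features(H), features(H_next)

    # EDMD step: K minimises ||Z K - Z_next||_F (row-vector convention).
    K, *_ = np.linalg.lstsq(Z, Z_next, rcond=rcond)
    # Linear read-out from feature space back to the state.
    C, *_ = np.linalg.lstsq(Z, H, rcond=rcond)
    return features, K, C


def predict_next(H, features, K, C):
    """One-step forecast: lift, advance linearly with K, read out with C."""
    return features(H) @ K @ C
```

Both fits are ordinary least-squares solves, so no backpropagation through time is involved. A longer forecast would apply `predict_next` repeatedly to its own output.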

Paper Structure

This paper contains 51 sections, 5 theorems, 72 equations, 17 figures, 8 tables, 4 algorithms.

Key Result

Theorem 1

Let $f \in L^2_K$, let $H, H'$ be the dataset with $N$ data points used in \ref{eq:koopman_approx_main}, and suppose \ref{assum:main_paper} holds. For any $\epsilon > 0$ and $T\in \mathbb{N}$, there exist an $M\in\mathbb{N}$, hidden layers $\mathcal{F}_M$, and matrices $C$ such that for all $t \in [1,2,\dots,T]$.

Figures (17)

  • Figure 1: Illustration of the components of one recurrent block we construct in the paper. The state ${\bm{z}}_{t-1}$ enters on the left, and is processed through matrix $C$ and the neural network ${\mathcal{F}}=\sigma(W\cdot +{\bm{b}})$. We then advance in time to ${\bm{z}}_t$, using the Koopman matrix $K$ and the processed control inputs.
  • Figure 2: True and predicted trajectories for the Van der Pol experiments, shown for a test trajectory. Left: state-space representation. Right: the top two rows show the first and second coordinates of the fully observed system from \ref{sec:simple_ODE}, and the bottom row shows the partially observed system from \ref{sec:time_delay}.
  • Figure 3: Results from the Lorenz experiment, shown for a test trajectory. Left: state-space representation of true and predicted trajectories. Right: trajectories obtained from the Lorenz model described in \ref{sec:chaotic systems}.
  • Figure 4: Results from the Rössler experiment, shown for a test trajectory. Left: state-space representation of true and predicted trajectories. Right: trajectories obtained from the Rössler model described in \ref{sec:chaotic systems}.
  • Figure 5: Controlled (i.e., forced) Van der Pol experiment (\ref{sec:control inputs}) for initial condition ${\bm{h}}_0= [-1.5, -1]^{\mkern-1.5mu\mathsf{T}}$. Left: state-space representation of controlled and uncontrolled trajectories. Right: $L^2$ norm of the controlled trajectory for five different runs and the $L^2$ norm of the target state.
  • ...and 12 more figures
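
The recurrent block in the Figure 1 caption can be read as a single update rule: map ${\bm{z}}_{t-1}$ through $C$, apply the hidden layer ${\mathcal{F}}=\sigma(W\cdot +{\bm{b}})$, then advance with $K$. The sketch below implements that reading for the uncontrolled case only. It is an interpretation of the caption rather than the authors' code: the `tanh` activation, the array shapes, the omission of control inputs, and the function names `recurrent_block_step` and `rollout` are assumptions made for illustration.

```python
import numpy as np


def recurrent_block_step(z_prev, C, W, b, K):
    """One uncontrolled step of the block, read off the Figure 1 caption.

    Assumed shapes: z_prev (M,), C (d, M), W (M, d), b (M,), K (M, M).
    """
    h = C @ z_prev            # process the incoming state through C
    phi = np.tanh(W @ h + b)  # hidden layer F = sigma(W . + b), tanh assumed
    return K @ phi            # advance in time with the Koopman matrix


def rollout(z0, C, W, b, K, n_steps):
    """Iterate the block n_steps times; returns an array of shape (n_steps + 1, M)."""
    trajectory = [np.asarray(z0, dtype=float)]
    for _ in range(n_steps):
        trajectory.append(recurrent_block_step(trajectory[-1], C, W, b, K))
    return np.stack(trajectory)
```

Because the only learned quantities enter linearly, the long-term behaviour of such a rollout is governed largely by $K$, which is what makes a spectral view of stability natural.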

Theorems & Definitions (14)

  • Definition 1
  • Remark 1
  • Theorem 1
  • Definition 2
  • Remark 2
  • Lemma 1
  • proof
  • Remark 3
  • Lemma 2
  • proof
  • ...and 4 more