Gradient-free training of recurrent neural networks
Erik Lien Bolager, Ana Cukarska, Iryna Burak, Zahra Monfared, Felix Dietrich
TL;DR
This work presents a gradient-free approach to training recurrent neural networks: the hidden-layer parameters are sampled at random, and the output mapping is obtained by solving a linear regression problem. The RNN dynamics are embedded into a high-dimensional linear state-space model motivated by Koopman operator theory, and the model is fitted with extended dynamic mode decomposition (EDMD). The resulting training procedure is data-informed and avoids backpropagation through time. Empirical results on simple ODEs, chaotic systems, control tasks, and weather data show substantial reductions in training time while maintaining or improving forecasting accuracy relative to gradient-based methods. The framework offers interpretability through its data-driven sampling and a spectral perspective on stability. It comes with theoretical convergence guarantees for the uncontrolled setting and extends in practice to controlled scenarios and real-world data.
Abstract
Recurrent neural networks are a successful neural architecture for many time-dependent problems, including time series analysis, forecasting, and modeling of dynamical systems. Training such networks with backpropagation through time is notoriously difficult because the loss gradients tend to explode or vanish. In this contribution, we introduce a computational approach to construct all weights and biases of a recurrent neural network without using gradient-based methods. The approach combines random feature networks with Koopman operator theory for dynamical systems. The hidden parameters of a single recurrent block are sampled at random, while the outer weights are constructed using extended dynamic mode decomposition. Because no gradients are propagated through time, this approach avoids the problems commonly associated with backpropagation in recurrent networks, such as exploding and vanishing gradients. The connection to Koopman operator theory also makes results from that area available for analyzing recurrent neural networks. In computational experiments on time series, forecasting for chaotic dynamical systems, and control problems, as well as on weather data, we observe that the recurrent neural networks we construct train faster and forecast more accurately than networks trained with commonly used gradient-based methods.
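
The two-stage construction described in the abstract (random hidden parameters, then outer weights from extended dynamic mode decomposition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes plain Gaussian weights and uniform biases in place of any data-driven sampling scheme, a tanh activation, and a toy damped-rotation system as the data source. The names `sample_hidden`, `lift`, `fit_outer_weights`, and `forecast` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def sample_hidden(dim_in, n_hidden, rng):
    """Draw hidden weights and biases at random.

    Assumption: plain Gaussian weights and uniform biases, used here as a
    stand-in for whatever sampling scheme the paper actually uses.
    """
    W = rng.normal(size=(n_hidden, dim_in))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return W, b


def lift(X, W, b):
    """Map states of shape (n, dim_in) to hidden features of shape (n, n_hidden)."""
    return np.tanh(X @ W.T + b)


def fit_outer_weights(X, Y, W, b):
    """EDMD-style least squares on snapshot pairs (X[i], Y[i]).

    K advances the hidden features by one time step, C reads the state
    back out of the hidden features. No gradients are involved.
    """
    PX, PY = lift(X, W, b), lift(Y, W, b)
    K = np.linalg.lstsq(PX, PY, rcond=None)[0]  # PX @ K ~ PY
    C = np.linalg.lstsq(PX, X, rcond=None)[0]   # PX @ C ~ X
    return K, C


def forecast(x0, K, C, W, b, n_steps):
    """Roll the linear model forward in feature space and decode each step."""
    z = lift(x0[None, :], W, b)
    states = []
    for _ in range(n_steps):
        z = z @ K
        states.append((z @ C)[0])
    return np.array(states)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = 0.3
    # Toy system (assumption): a damped rotation x_{k+1} = A x_k.
    A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    X = rng.uniform(-1.0, 1.0, size=(2000, 2))  # sampled states
    Y = X @ A.T                                 # their one-step successors
    W, b = sample_hidden(2, 100, rng)
    K, C = fit_outer_weights(X, Y, W, b)
    path = forecast(np.array([0.5, -0.2]), K, C, W, b, n_steps=10)
    print(path.shape)
```

In this sketch the only fitted quantities are `K` and `C`, each obtained from a single least-squares solve, so training cost is dominated by two linear solves rather than by iterative gradient updates. The eigenvalues of `K` give a rough indication of whether long rollouts stay bounded, which is the kind of spectral view the Koopman connection suggests.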
