Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies

T. Konstantin Rusch, Siddhartha Mishra

TL;DR

The paper introduces coRNN, a gradient-stable recurrent architecture modeled on networks of coupled nonlinear oscillators and obtained by discretizing a system of second-order ODEs with an implicit-explicit (IMEX) scheme. By design, the hidden states and their gradients remain bounded, which mitigates the exploding and vanishing gradient problem while preserving expressivity across long sequences. The authors prove energy and gradient bounds and report strong empirical performance on long-term dependency tasks (the adding problem, sMNIST/psMNIST, noise-padded CIFAR-10, HAR-2, IMDB) with competitive efficiency. They also discuss robustness to hyperparameters, check the theoretical assumptions during training, and point to potential applications to oscillatory and real-valued sequential data, while noting limitations for chaotic dynamics and outlining avenues for future work.

Abstract

Circuits of biological neurons, such as in the functional parts of the brain, can be modeled as networks of coupled oscillators. Inspired by the ability of these systems to express a rich set of outputs while keeping (gradients of) state variables bounded, we propose a novel architecture for recurrent neural networks. Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations, modeling networks of controlled nonlinear oscillators. We prove precise bounds on the gradients of the hidden states, leading to the mitigation of the exploding and vanishing gradient problem for this RNN. Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks, demonstrating the potential of this architecture to provide stable and accurate RNNs for processing complex sequential data.
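
To make the "time-discretization of a system of second-order ordinary differential equations" concrete, the following is a minimal NumPy sketch of one possible update rule. It assumes the underlying ODE has the form $y'' = \tanh(W y + \mathcal{W} y' + V u + b) - \gamma y - \epsilon y'$, which is not spelled out in this summary, and it treats the damping term explicitly for simplicity. The names `cornn_step`, `run_cornn`, `W_z`, the weight initialization, and the default values of `dt`, `gamma`, and `eps` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cornn_step(y, z, u, W, W_z, V, b, dt, gamma, eps):
    """One step of an oscillator-style recurrent update (illustrative sketch).

    y is the hidden state, z its discrete "velocity", u the current input.
    """
    # Velocity update: bounded nonlinearity minus restoring force (gamma)
    # and damping (eps). All terms here use the previous step's values.
    z_new = z + dt * (np.tanh(W @ y + W_z @ z + V @ u + b) - gamma * y - eps * z)
    # Position update uses the freshly computed velocity.
    y_new = y + dt * z_new
    return y_new, z_new


def run_cornn(inputs, hidden_size, dt=0.05, gamma=1.0, eps=1.0, seed=0):
    """Run the sketch over a sequence of shape (T, input_size).

    Returns the hidden states y_1..y_T stacked into shape (T, hidden_size).
    """
    rng = np.random.default_rng(seed)
    input_size = inputs.shape[1]
    scale = 1.0 / np.sqrt(hidden_size)  # assumed initialization range
    W = rng.uniform(-scale, scale, (hidden_size, hidden_size))
    W_z = rng.uniform(-scale, scale, (hidden_size, hidden_size))
    V = rng.uniform(-scale, scale, (hidden_size, input_size))
    b = np.zeros(hidden_size)

    y = np.zeros(hidden_size)
    z = np.zeros(hidden_size)
    states = []
    for u in inputs:
        y, z = cornn_step(y, z, u, W, W_z, V, b, dt, gamma, eps)
        states.append(y)
    return np.stack(states)


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(200, 3))
    hidden = run_cornn(x, hidden_size=16)
    print(hidden.shape, float(np.abs(hidden).max()))
```

Because the driving term passes through `tanh`, each step changes the velocity by at most `dt` per unit from the nonlinearity, while the `gamma` and `eps` terms pull the state back toward the origin; this is the intuition behind bounded hidden states, though the sketch itself proves nothing.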

Paper Structure

This paper contains 31 sections, 10 theorems, 81 equations, 6 figures, 8 tables.

Key Result

Proposition 3.1

Let ${\bf y}_n,{\bf z}_n$ be the hidden states of the RNN \ref{eq:brnn1} for $1\leq n \leq N$, then the hidden states satisfy the following (energy) bounds:

Figures (6)

  • Figure 1: Results of the adding problem for coRNN, expRNN, FastRNN, antisymmetric RNN and tanh RNN for three different sequence lengths $T$, i.e. $T=500$, $T=2000$ and $T=5000$.
  • Figure 2: Performance on psMNIST for different models, all with 128 hidden units and the same fixed random permutation.
  • Figure 3: Weight assumptions \ref{eq:assm}, with $r=\frac{1}{2}$, evaluated during training for all LTD experiments (mean and standard deviation of 10 different runs for each task).
  • Figure 4: Ablation study on the hyperparameters $\epsilon,\gamma$ in \ref{eq:brnn} using the noise-padded CIFAR-10 experiment.
  • Figure 5: Exemplary $(x_1,x_2)$-trajectories of the Lorenz 96 system \ref{eq:lorenz} for different forces $F$.
  • ...and 1 more figure
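
For readers unfamiliar with the adding problem referenced in Figure 1, the sketch below generates data in the commonly used form of the task: each input is a length-$T$ sequence with two channels, one of random values and one of markers that flag exactly two positions, and the target is the sum of the two flagged values. Placing one marker in each half of the sequence and drawing values uniformly from $[0,1]$ are assumptions based on the standard benchmark setup; the function name `make_adding_batch` is hypothetical and the exact protocol used in the paper may differ.

```python
import numpy as np


def make_adding_batch(batch_size, T, rng):
    """Generate one batch for the adding problem (illustrative sketch).

    Returns inputs of shape (batch_size, T, 2) and targets of shape (batch_size,).
    """
    # Channel 0: random values, assumed uniform on [0, 1].
    values = rng.uniform(0.0, 1.0, size=(batch_size, T))
    # Channel 1: markers, exactly two ones per sequence.
    markers = np.zeros((batch_size, T))
    half = T // 2
    first = rng.integers(0, half, size=batch_size)   # position in first half
    second = rng.integers(half, T, size=batch_size)  # position in second half
    rows = np.arange(batch_size)
    markers[rows, first] = 1.0
    markers[rows, second] = 1.0

    inputs = np.stack([values, markers], axis=-1)
    # Target: sum of the two marked values.
    targets = (values * markers).sum(axis=1)
    return inputs, targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, t = make_adding_batch(batch_size=4, T=500, rng=rng)
    print(x.shape, t.shape)
```

The task becomes harder as $T$ grows because the two marked positions can be thousands of steps apart, so the network must carry the first value across a long gap before adding the second.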

Theorems & Definitions (10)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition D.1
  • Proposition D.2
  • Proposition D.3
  • Proposition E.1
  • Proposition F.1
  • Proposition F.2
  • Proposition F.3