Gated Recurrent Neural Networks with Weighted Time-Delay Feedback

N. Benjamin Erichson; Soon Hoe Lim; Michael W. Mahoney

TL;DR

τ-GRU addresses the challenge of modeling long-term dependencies in sequential data by deriving a gated recurrent unit from a continuous-time delay differential equation with weighted time-delay feedback. The resulting architecture discretizes to a GRU-like update that includes a delayed term weighted by a gate and a per-component weight, providing gradient-buffering effects to mitigate vanishing gradients. The authors prove the continuous-time model has a unique solution and demonstrate, through extensive experiments on diverse tasks (Adding, HAR-2, IMDB, sequential image classification, climate dynamics, and frequency classification), that τ-GRU converges faster and often generalizes better than state-of-the-art RNNs and some SSMs in the small-data regime. Limitations include the single-delay assumption; future work could explore multiple/distributed delays and noise-injected variants.
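The TL;DR describes the discrete update only in words. The following is a rough illustration: a minimal NumPy sketch of a GRU-like step with an extra delayed path. The hidden state from $m$ steps back is passed through a nonlinearity, multiplied by a gate and by a per-component weight vector `beta`, and added inside the update. All parameter names (`W_g`, `U_g`, ..., `beta`), the nonlinearities, the absence of a reset gate, and the zero history before the first step are hypothetical choices made for this sketch. They are not the authors' τ-GRU equations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def delayed_gru_step(h_now, h_delayed, x, p):
    """One step of a HYPOTHETICAL GRU-like cell with weighted time-delay feedback.

    h_now:     current hidden state h_n, shape (d,)
    h_delayed: delayed hidden state h_{n-m}, shape (d,)
    x:         input x_n, shape (k,)
    p:         dict of illustrative parameters (names are assumptions)
    """
    g = sigmoid(p["W_g"] @ h_now + p["U_g"] @ x)      # update gate in (0, 1)
    u = np.tanh(p["W_u"] @ h_now + p["U_u"] @ x)      # candidate from the current state
    a = sigmoid(p["W_a"] @ h_now + p["U_a"] @ x)      # gate on the delayed path
    z = np.tanh(p["W_z"] @ h_delayed + p["U_z"] @ x)  # contribution built from h_{n-m}
    # GRU-style convex mix; the delayed term is gated by a and scaled per component by beta.
    return (1.0 - g) * h_now + g * (u + p["beta"] * a * z)


def run(xs, p, d, m):
    """Unroll over inputs xs of shape (T, k) with integer lag m >= 1.

    States before the first step are taken to be zero (an assumption).
    Returns h_1, ..., h_T stacked into shape (T, d).
    """
    hs = [np.zeros(d)]  # hs[n] holds h_n
    for n, x in enumerate(xs):
        h_del = hs[n - m] if n - m >= 0 else np.zeros(d)
        hs.append(delayed_gru_step(hs[n], h_del, x, p))
    return np.stack(hs[1:])


rng = np.random.default_rng(0)
d, k, T, m = 4, 3, 20, 5
p = {name: 0.5 * rng.standard_normal((d, d if name.startswith("W") else k))
     for name in ["W_g", "U_g", "W_u", "U_u", "W_a", "U_a", "W_z", "U_z"]}
p["beta"] = np.full(d, 0.5)  # per-component weight on the delayed term
H = run(rng.standard_normal((T, k)), p, d, m)
print(H.shape)  # (20, 4)
```

With `beta` set to zero the delayed path vanishes and the cell reduces to a plain gated update. Comparing the two settings is one way to probe what the delayed feedback contributes.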

Abstract

In this paper, we present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $\tau$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). We prove the existence and uniqueness of solutions for the continuous-time model and show that the proposed feedback mechanism can significantly improve the modeling of long-term dependencies. Our empirical results indicate that $\tau$-GRU outperforms state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, achieving faster convergence and better generalization.
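The following is a generic illustration of how a DDE becomes a recurrence; it is not necessarily the paper's exact scheme. Take a constant delay $\tau$, a placeholder vector field $f$, and an initial history $\phi$ on $[t_0-\tau, t_0]$. An explicit Euler step of size $\Delta t$, with $\tau = m\,\Delta t$ for an integer $m \geq 1$, turns the delay into a fixed lookback of $m$ steps:

$$
\frac{dh(t)}{dt} = f\big(h(t),\, h(t-\tau),\, x(t)\big), \quad t \geq t_0, \qquad h(t) = \phi(t) \ \text{for } t \in [t_0-\tau,\, t_0],
$$

$$
h_{n+1} = h_n + \Delta t\, f\big(h_n,\, h_{n-m},\, x_n\big), \qquad h_n \approx h(t_0 + n\,\Delta t), \qquad m = \tau / \Delta t .
$$

For $n < m$, the delayed argument refers to a time in $[t_0-\tau, t_0)$ and is read from the initial history, $h_{n-m} = \phi\big(t_0 + (n-m)\,\Delta t\big)$, with $h_0 = \phi(t_0)$.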

Paper Structure (31 sections, 8 theorems, 72 equations, 10 figures, 12 tables)

INTRODUCTION
RELATED WORK
METHOD
Delay Differential Equations
Continuous-Time τ-GRUs
Discrete-Time τ-GRUs
Discrete-Time τ-GRUs with a Weighted Time-Delay Feedback Architecture
THEORY
Existence and Uniqueness of Solution for Continuous-Time τ-GRUs
The Delay Mechanism in τ-GRUs Can Help Improve Long-Term Dependencies
EXPERIMENTAL RESULTS
The Adding Task.
Human Activity Recognition: HAR-2.
Sentiment Analysis: IMDB.
Sequential Image Classification.
...and 16 more sections

Key Result

Theorem 1

Let $t_0 \in \mathbb{R}$ and an initial history $\phi \in C$ be given. There exists a unique solution $h(t) = h(t, \phi)$ of Eq. eq_gendde, defined on $[t_0 - \tau, t_0 + A]$ for any $A > 0$. In particular, the solution exists for all $t \geq t_0$, and for all $t \geq t_0$ it satisfies a bound governed by the constant $K = 1 + \|W_1\| + \|W_2\| + \|W_4\|/4$.
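The constant $K$ in Theorem 1 is easy to evaluate for a given set of weights. The statement does not say which matrix norm $\|\cdot\|$ denotes, so the small NumPy sketch below defaults to the spectral norm (`np.linalg.norm(W, 2)`) as an assumption and lets the caller pick another. The matrices are placeholders for trained weights.

```python
import numpy as np


def growth_constant(W1, W2, W4, norm_ord=2):
    """Return K = 1 + ||W1|| + ||W2|| + ||W4|| / 4, the constant in Theorem 1.

    norm_ord=2 (spectral norm) is an assumption; "fro" or np.inf also work.
    """
    return (1.0
            + np.linalg.norm(W1, norm_ord)
            + np.linalg.norm(W2, norm_ord)
            + np.linalg.norm(W4, norm_ord) / 4.0)


# Identity matrices have spectral norm 1, so K is 1 + 1 + 1 + 0.25 = 3.25 (up to rounding).
I = np.eye(3)
print(growth_constant(I, I, I))

# Placeholder "weights": larger weight norms give a larger K.
rng = np.random.default_rng(0)
W1, W2, W4 = (rng.standard_normal((4, 4)) for _ in range(3))
print(growth_constant(W1, W2, W4))
```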

Figures (10)

Figure 1: Test accuracy for nCIFAR chang2018antisymmetricrnn versus Google-12 warden2018speech. nCIFAR requires a recurrent unit with long-term dependency capabilities, while Google-12 requires a highly expressive unit. Our $\tau$-GRU improves performance on both tasks, relative to existing state-of-the-art alternatives, including LEM rusch2022long.
Figure 2: Results for the adding task. We show one-standard-deviation bands for LEM and our $\tau$-GRU. On average, $\tau$-GRU converges faster and obtains a lower MSE on the adding task.
Figure 3: Test accuracy for psMNIST.
Figure 4: Sensitivity analysis of $\tau$-GRU on psMNIST. The green envelope represents $\pm 1$ s.d. around the mean.
Figure 5: Hidden state dynamics of the DDE-based RNNs with $\tau=0.5$ and $\tau=1$, and the ODE-based RNN ($\tau=0$). All RNNs are driven by the same cosine input signal.
...and 5 more figures
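Figure 2 reports results on the adding task. The following NumPy sketch generates batches in a common formulation of that benchmark: each time step carries a value drawn uniformly from $[0, 1)$ and a marker bit, exactly two positions are marked, and the target is the sum of the two marked values. The summary does not state the paper's sequence length or marker placement, so putting one marker in each half of the sequence is an assumption.

```python
import numpy as np


def adding_task_batch(batch_size, seq_len, rng):
    """Generate (inputs, targets) for the adding task (requires seq_len >= 2).

    inputs:  (batch_size, seq_len, 2); channel 0 holds values in [0, 1),
             channel 1 marks exactly two positions with 1.0.
    targets: (batch_size,), the sum of the two marked values.
    """
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    half = seq_len // 2
    # Assumption: one marker in the first half, one in the second half.
    first = rng.integers(0, half, size=batch_size)
    second = rng.integers(half, seq_len, size=batch_size)
    rows = np.arange(batch_size)
    markers[rows, first] = 1.0
    markers[rows, second] = 1.0
    inputs = np.stack([values, markers], axis=-1)
    targets = values[rows, first] + values[rows, second]
    return inputs, targets


rng = np.random.default_rng(0)
X, y = adding_task_batch(8, 100, rng)
print(X.shape, y.shape)  # (8, 100, 2) (8,)
```

A useful reference point when reading MSE curves: a model that always predicts 1 has an expected MSE of $1/6 \approx 0.167$, the variance of the sum of two independent $U(0,1)$ values.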
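Figure 5 compares DDE-based RNNs with $\tau=0.5$ and $\tau=1$ against an ODE-based RNN ($\tau=0$) under the same cosine input. The RNN equations are not reproduced in this summary. The sketch below therefore uses a hypothetical scalar DDE, $dh/dt = -h(t) + \tanh(w\,h(t-\tau) + u\cos t)$, integrated with explicit Euler and a stored trajectory for the delayed lookup. The values of `w`, `u`, `dt`, the horizon `T`, and the zero initial history are illustrative assumptions.

```python
import numpy as np


def simulate(tau, T=20.0, dt=0.01, w=1.5, u=1.0):
    """Explicit-Euler simulation of a HYPOTHETICAL scalar DDE driven by cos(t):

        dh/dt = -h(t) + tanh(w * h(t - tau) + u * cos(t)),

    with zero initial history on [-tau, 0]. tau = 0 gives the corresponding ODE.
    Returns h[n] ~ h(n * dt) for n = 0, ..., T / dt.
    """
    m = int(round(tau / dt))      # delay expressed as a whole number of steps
    n_steps = int(round(T / dt))
    h = np.zeros(n_steps + 1)
    for n in range(n_steps):
        h_del = h[n - m] if n - m >= 0 else 0.0  # zero history before t = 0
        h[n + 1] = h[n] + dt * (-h[n] + np.tanh(w * h_del + u * np.cos(n * dt)))
    return h


traces = {tau: simulate(tau) for tau in (0.0, 0.5, 1.0)}
for tau, h in traces.items():
    print(f"tau={tau}: final h = {h[-1]:.3f}")
```

The delay enters only through the index `n - m`, so `tau` should be a multiple of `dt`. Here 0.5 and 1.0 correspond to 50 and 100 steps.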

Theorems & Definitions (13)

Theorem 1: Existence and uniqueness of solution for continuous-time $\tau$-GRU
Proposition 1
Theorem 2: Adapted from Theorem 3.7 in smith2011introduction
Theorem 3: Existence and uniqueness of solution for continuous-time $\tau$-GRU
proof
Proposition 2
proof
Lemma 1
proof
Proposition 3
...and 3 more