Gated Recurrent Neural Networks with Weighted Time-Delay Feedback
N. Benjamin Erichson, Soon Hoe Lim, Michael W. Mahoney
TL;DR
τ-GRU targets long-term dependencies in sequential data by deriving a gated recurrent unit from a continuous-time delay differential equation (DDE) with weighted time-delay feedback. Discretizing this model yields a GRU-like update with an added delayed term, weighted by a gate and a per-component weight, which provides a gradient-buffering effect that mitigates vanishing gradients. The authors prove that the continuous-time model has a unique solution and show, in extensive experiments on diverse tasks (Adding, HAR-2, IMDB, sequential image classification, climate dynamics, and frequency classification), that τ-GRU converges faster and often generalizes better than state-of-the-art RNNs and some state-space models (SSMs) in the small-data regime. Limitations include the single-delay assumption; future work could explore multiple or distributed delays and noise-injected variants.
Abstract
In this paper, we present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $τ$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). We prove the existence and uniqueness of solutions for the continuous-time model and show that the proposed feedback mechanism can significantly improve the modeling of long-term dependencies. Our empirical results indicate that $τ$-GRU outperforms state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, achieving faster convergence and better generalization.
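
To make the idea of a GRU-like update with a weighted delayed term concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The exact parameterization is not given in this summary, so the form of the update, the parameter names (`W_*`, `U_*`, `b_*`), the helper functions, and the choice to hold the state constant at its initial value for steps before the delay is reached are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical parameter layout: one (W, U, b) triple per component."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(hidden_dim)
    params = {}
    for name in ("u", "z", "g", "a"):
        params["W_" + name] = rng.uniform(-scale, scale, (hidden_dim, hidden_dim))
        params["U_" + name] = rng.uniform(-scale, scale, (hidden_dim, input_dim))
        params["b_" + name] = np.zeros(hidden_dim)
    return params


def tau_gru_step(params, h, h_delayed, x):
    """One assumed update: h_next = (1 - g) * h + g * (u + a * z)."""
    p = params
    # Instantaneous candidate, computed from the current state.
    u = np.tanh(p["W_u"] @ h + p["U_u"] @ x + p["b_u"])
    # Delayed candidate, computed from the state tau steps in the past.
    z = np.tanh(p["W_z"] @ h_delayed + p["U_z"] @ x + p["b_z"])
    # Update gate (assumed to play the role of the gate in the summary).
    g = sigmoid(p["W_g"] @ h + p["U_g"] @ x + p["b_g"])
    # Per-component weight on the delayed feedback term (assumed form).
    a = sigmoid(p["W_a"] @ h + p["U_a"] @ x + p["b_a"])
    return (1.0 - g) * h + g * (u + a * z)


def run_tau_gru(params, xs, tau):
    """Run the cell over xs of shape (T, input_dim) with a single delay tau."""
    hidden_dim = params["b_u"].shape[0]
    history = [np.zeros(hidden_dim)]  # history[n] holds h_n, with h_0 = 0
    for n, x in enumerate(xs):
        # Assumption: before step tau, the delayed state is the initial state.
        h_delayed = history[max(n - tau, 0)]
        history.append(tau_gru_step(params, history[n], h_delayed, x))
    return np.stack(history[1:])  # shape (T, hidden_dim)
```

A call such as `run_tau_gru(init_params(3, 5), np.ones((10, 3)), tau=4)` returns the ten hidden states. The point of the sketch is that the state at step n + 1 depends directly on the state at step n - τ, which gives gradients a shorter path back through time than a purely step-by-step recurrence.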
