A Technical Note on the Architectural Effects on Maximum Dependency Lengths of Recurrent Neural Networks
Jonathan S. Kent, Michael M. Murray
TL;DR
The paper studies how architecture influences the maximum dependency length of recurrent models (RNN, GRU, LSTM). It introduces a domain-agnostic synthetic task: delayed reconstruction of a square wave with a fixed window $d=5$ and a variable delay $l$. A grow-terminate-search approach, using models trained with cross-entropy loss and the Adam optimizer, maps the learnable dependency length across 384 architectures (1–8 layers, 8–128 neurons per layer). A GRU with 6 layers and 120 neurons achieves a maximum learned delay of $l=50$, though the results exhibit notable noise and backend artifacts that underscore the limits of the measurement. Overall, the work offers a practical framework for estimating the architecture needed to capture long-range dependencies, and it cautions about the variability inherent in real training pipelines. The contribution is framed as a technical note on methodology and limitations.
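To make the delayed-reconstruction task concrete, the following is a minimal sketch of one possible reading of it, not the authors' data pipeline. It assumes a single pulse of width $d$ at the start of the input and a target that reproduces that pulse $l$ steps later. The function name `delayed_square_wave`, the sequence length $d + l$, and the 0/1 class encoding (chosen so the targets suit a cross-entropy loss) are all illustrative assumptions.

```python
import numpy as np


def delayed_square_wave(d=5, l=10):
    """Build one (input, target) pair for a delayed-reconstruction task.

    Assumption: the input is a single square pulse of width d, and the
    target is the same pulse shifted right by l steps. Values are class
    labels (0 = low, 1 = high).
    """
    total = d + l  # assumed length: just long enough to hold the delayed pulse
    x = np.zeros(total, dtype=np.int64)
    x[:d] = 1  # pulse occupies the first d steps of the input
    y = np.zeros(total, dtype=np.int64)
    y[l:l + d] = 1  # the same pulse, delayed by l steps
    return x, y


x, y = delayed_square_wave(d=5, l=3)
print(x.tolist())  # [1, 1, 1, 1, 1, 0, 0, 0]
print(y.tolist())  # [0, 0, 0, 1, 1, 1, 1, 1]
```

Under this reading, a model can only produce the target pulse if it retains information about the input for at least $l$ steps, which is what makes the largest learnable $l$ a proxy for dependency length.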
Abstract
This work proposes a methodology for determining the maximum dependency length of a recurrent neural network (RNN). It then studies how architectural changes, namely the number of layers and the number of neurons per layer, affect the maximum dependency lengths of traditional RNN, gated recurrent unit (GRU), and long short-term memory (LSTM) models.
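As a rough illustration of the architectural search space, the sketch below enumerates one grid that is consistent with the 384 architectures mentioned in the TL;DR. The step of 8 neurons between layer widths is an assumption: it is not stated here, but 3 model types, 8 depths, and 16 widths give exactly 384 combinations. The names `CELL_TYPES`, `LAYER_COUNTS`, `WIDTHS`, and `grid` are illustrative.

```python
from itertools import product

CELL_TYPES = ("RNN", "GRU", "LSTM")  # the three model families studied
LAYER_COUNTS = range(1, 9)           # 1 to 8 layers
WIDTHS = range(8, 129, 8)            # 8 to 128 neurons; step of 8 is assumed

# Every (cell type, depth, width) combination in the assumed grid.
grid = list(product(CELL_TYPES, LAYER_COUNTS, WIDTHS))

print(len(grid))                  # 384
print(("GRU", 6, 120) in grid)    # True
```

The 6-layer, 120-neuron GRU highlighted in the TL;DR falls on this grid, which lends some support to the assumed step size.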
