A Technical Note on the Architectural Effects on Maximum Dependency Lengths of Recurrent Neural Networks

Jonathan S. Kent, Michael M. Murray

TL;DR

The paper examines how architecture affects the maximum dependency length that recurrent models (RNN, GRU, LSTM) can learn. It introduces a domain-agnostic synthetic task: delayed reconstruction of a square wave with a fixed window $d=5$ and a variable delay $l$. Using a grow-terminate-search procedure, with models trained using cross-entropy loss and the Adam optimizer, it maps the learnable dependency length across 384 architectures (1–8 layers, 8–128 neurons per layer). A GRU with 6 layers and 120 neurons achieves a maximum learned delay of $l=50$, although the results show notable noise and backend artifacts that limit how precisely dependency length can be measured. The work offers a practical framework for estimating the architecture needed to capture long-range dependencies, cautions about the variability inherent in real training pipelines, and is framed as a technical note on methodology and limitations.
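To make the synthetic task concrete, the following is a minimal sketch of how one delayed-reconstruction example might be generated. It is an illustration under stated assumptions, not the authors' code: it assumes the input is a binary sequence containing a single pulse of width $d$, and that the target is the same sequence shifted later by $l$ steps. The function name `delayed_square_wave` and the parameters `seq_len` and `seed` are hypothetical.

```python
import random


def delayed_square_wave(seq_len, d=5, l=10, seed=None):
    """Build one (input, target) pair for a delayed-reconstruction task.

    Assumption (not confirmed by the summary): the input holds a single
    pulse of width d, and the target is the input delayed by l steps.
    """
    if seq_len < d + l:
        raise ValueError("seq_len must be at least d + l")
    rng = random.Random(seed)
    # Choose a start so the delayed pulse still fits inside the sequence.
    start = rng.randint(0, seq_len - d - l)  # randint bounds are inclusive
    x = [1 if start <= t < start + d else 0 for t in range(seq_len)]
    # Delay by l: the target at step t equals the input at step t - l.
    y = [0] * l + x[:seq_len - l]
    return x, y


x, y = delayed_square_wave(seq_len=40, d=5, l=10, seed=0)
print("input :", "".join(map(str, x)))
print("target:", "".join(map(str, y)))
```

Under this reading, a model can only emit the target pulse correctly if it retains information from $l$ steps earlier, so the largest $l$ at which training still succeeds serves as an estimate of the dependency length.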

Abstract

This work proposes a methodology for determining the maximum dependency length of a recurrent neural network (RNN), and then studies the effects of architectural changes, including the number of layers and the neuron count per layer, on the maximum dependency lengths of traditional RNN, gated recurrent unit (GRU), and long short-term memory (LSTM) models.
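As a quick check on the scale of the architecture sweep, the sketch below enumerates one grid that matches the figure of 384 architectures given in the TL;DR. The step of 8 between layer widths is an assumption: the summary states only the range 8–128, but a step of 8 is consistent with both the total of 384 and the reported 120-neuron GRU. The names `CELL_TYPES`, `LAYER_COUNTS`, and `WIDTHS` are hypothetical.

```python
from itertools import product

CELL_TYPES = ("RNN", "GRU", "LSTM")  # the three model families studied
LAYER_COUNTS = range(1, 9)           # 1 to 8 layers
WIDTHS = range(8, 129, 8)            # assumption: 8, 16, ..., 128 neurons

# Every (cell type, depth, width) combination in the assumed sweep.
grid = list(product(CELL_TYPES, LAYER_COUNTS, WIDTHS))

print(len(grid))                  # 3 * 8 * 16 = 384
print(("GRU", 6, 120) in grid)    # True
```

Since the figure captions describe five runs per model, a sweep of this size implies on the order of 384 × 5 training searches, which helps explain why run-to-run noise features prominently in the reported results.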

Paper Structure (5 sections, 4 figures, 1 table)

This paper contains 5 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Graphs of the minimum value over five runs for each model.
  • Figure 2: Graphs of the median value over five runs for each model.
  • Figure 3: Graphs of the mean value over five runs for each model.
  • Figure 4: Graphs of the maximum value over five runs for each model.