Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

Pengfei Sun; Jibin Wu; Malu Zhang; Paul Devos; Dick Botteldooren

TL;DR

The proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, including speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification, while using considerably fewer parameters than other state-of-the-art gated RNN models.

Abstract

Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, which makes them prevalent in sequential data processing applications. Nevertheless, vanilla RNNs suffer from the well-known vanishing and exploding gradient problems, which make long-range dependencies difficult to learn. Additionally, gated RNNs tend to be over-parameterized, which reduces computational efficiency and impairs generalization. To address these challenges, this paper proposes a novel Delayed Memory Unit (DMU). The DMU incorporates a delay line structure with delay gates into a vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU is designed to distribute the input information directly to the optimal future time instant, rather than aggregating and redistributing it over time through intricate network dynamics. The proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, including speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification, while using considerably fewer parameters than other state-of-the-art gated RNN models.
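
The abstract describes the DMU only at a high level: a vanilla RNN whose hidden state is written, through delay gates, into a delay line so that it reaches a chosen future time step directly. The NumPy sketch below is one hypothetical reading of that idea, not the authors' formulation. Its assumptions are: the delay line is modelled as a sliding memory window of `n_delays` slots, where slot k arrives k + 1 steps later; one sigmoid gate per delay is computed from the input and the new hidden state; and the recurrent input at step t is the slot arriving at t (playing the role of $m_t^1$ in Figure 1). The names `advance_window`, `init_params`, and `dmu_forward` are illustrative only.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def advance_window(window: np.ndarray, gates: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Slide the memory window one step and write gated copies of h into it.

    window has shape (n_delays, H); window[k] is the contribution that arrives
    k steps from now, so window[0] is the slot arriving at the current step.
    After the call, gates[k] * h sits in slot k and arrives k + 1 steps later.
    """
    shifted = np.zeros_like(window)
    shifted[:-1] = window[1:]                      # every slot moves one step closer
    return shifted + gates[:, None] * h[None, :]   # distribute h to future steps


def init_params(n_in: int, n_hidden: int, n_delays: int, seed: int = 0) -> dict:
    """Random weights for this hypothetical sketch (small Gaussian init)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_x": scale * rng.standard_normal((n_hidden, n_in)),
        "W_h": scale * rng.standard_normal((n_hidden, n_hidden)),
        "b": np.zeros(n_hidden),
        "W_g": scale * rng.standard_normal((n_delays, n_in)),      # assumed gate weights
        "U_g": scale * rng.standard_normal((n_delays, n_hidden)),  # assumed gate weights
        "b_g": np.zeros(n_delays),
    }


def dmu_forward(xs: np.ndarray, p: dict) -> np.ndarray:
    """Run the sketch over xs of shape (T, n_in); return hidden states (T, H)."""
    n_delays, n_hidden = p["U_g"].shape
    window = np.zeros((n_delays, n_hidden))
    states = []
    for x in xs:
        m1 = window[0]                                       # slot arriving now (cf. m_t^1)
        h = np.tanh(p["W_x"] @ x + p["W_h"] @ m1 + p["b"])   # vanilla-RNN update on delayed memory
        g = sigmoid(p["W_g"] @ x + p["U_g"] @ h + p["b_g"])  # one gate per delay (assumed form)
        window = advance_window(window, g, h)
        states.append(h)
    return np.stack(states)


xs = np.random.default_rng(1).standard_normal((10, 3))
out = dmu_forward(xs, init_params(n_in=3, n_hidden=5, n_delays=4))
print(out.shape)  # (10, 5)
```

In this sketch a state can reach a step up to `n_delays` steps ahead in a single write instead of being carried through that many recurrent updates, which is the intuition the abstract gives for easier temporal credit assignment. Reading $m_t^1$ as the recurrent input is only one plausible wiring.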

Paper Structure

This paper contains 19 sections, 8 equations, 7 figures, and 3 tables.

Introduction
Method
Delayed Memory Unit
Temporal Credit Assignment within DMU
Model Complexity Analysis
Thresholding DMU and Dilated Delay
Experimental Results
Experimental Setups and Training Configuration
Speech Processing
Radar Gesture Recognition
ECG Waveform Segmentation
Permuted Sequential Image Classification
Discussions
Understand The Interplay Between Recurrent Memory and Delay Line
Effect of the Number of Delays and Delay Dilation Factors
...and 4 more sections

Figures (7)

Figure 1: Illustration of the proposed DMU. The hidden state passes through a delay line, along which successive delay units each apply a fixed delay of $\tau$ to the signal. At each point along the line, the corresponding delay cell gates the information from the state. The light blue area on the right illustrates the time-unrolled internal operation of the DMU and corresponds to the light blue section at the bottom of the illustration. Similarly, the segment highlighted in light yellow represents the first state of the sliding memory window, $m_t^1$.
Figure 2: Comparison of the false reject rate and false alarm rate among the proposed DMU, RNN, and LSTM models on the wake-word detection task.
Figure 3: Comparison of the learning curves of different models on the PS-MNIST dataset, whose sequences span 784 time steps. The dataset is designed to test a model's ability to retain long-term memory between pixels that may be widely separated in time. All models use 200 hidden neurons, and $\text{DMU}_n$ denotes the DMU model with $n$ delays.
Figure 4: Histograms of the learned recurrent weights on the SHD dataset. (a) IndRNN. (b) IndRNN+DMU (20 delays). (c) IndRNN+DMU (60 delays). (d) IndRNN+DMU (100 delays). The x-axis shows the recurrent weight value and the y-axis the frequency.
Figure 5: (a) Comparison of two decoding methods in DMU: last-time-step loss (Last) and readout integrator (All). The x-axis shows the number of delays $n$ in a delay line and the y-axis the classification accuracy. "LSTM_Last" denotes the LSTM with the "Last" decoding method, and "LSTM_All" the LSTM with the "All" decoding method. (b) Effect of the dilation factor $\tau$ on classification accuracy, with the total delay fixed at 40.
...and 2 more figures
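
Figures 1 and 5 refer to a dilation factor $\tau$ (with the total delay held at 40 in Figure 5(b)) and to two decoding methods, "Last" and a readout integrator ("All"). The NumPy sketch below illustrates one reading of these. It assumes successive delay cells are $\tau$ steps apart and that the total delay is a multiple of $\tau$; the helper names `dilated_taps`, `expand_gates`, `decode_last`, and `decode_all` are hypothetical, and taking the readout integrator to be a time-average of per-step logits is an assumption rather than the authors' stated definition.

```python
import numpy as np


def dilated_taps(total_delay: int, tau: int) -> np.ndarray:
    """Delay, in time steps, of each tap on a delay line with dilation tau.

    Assumes total_delay is a multiple of tau, giving n = total_delay // tau taps
    at tau, 2 * tau, ..., total_delay.
    """
    if tau <= 0 or total_delay % tau != 0:
        raise ValueError("tau must be a positive divisor of total_delay")
    return tau * np.arange(1, total_delay // tau + 1)


def expand_gates(gates: np.ndarray, tau: int) -> np.ndarray:
    """Place n tap gates into a dense window of length n * tau.

    Slot j of the dense window arrives j + 1 steps later, so tap k (delay
    (k + 1) * tau) lands in slot (k + 1) * tau - 1; all other slots stay zero.
    """
    dense = np.zeros(len(gates) * tau)
    dense[tau - 1 :: tau] = gates
    return dense


def decode_last(logits: np.ndarray) -> np.ndarray:
    """'Last': use only the readout at the final time step. logits: (T, C)."""
    return logits[-1]


def decode_all(logits: np.ndarray) -> np.ndarray:
    """'All' (assumed form): integrate the readout over all steps as a mean."""
    return logits.mean(axis=0)


# With the total delay fixed at 40, a larger tau gives fewer taps: 40, 20, 10, 5.
for tau in (1, 2, 4, 8):
    print(tau, len(dilated_taps(40, tau)))
```

Under these assumptions, raising $\tau$ at a fixed total delay cuts the number of gates while keeping the same temporal reach, at the cost of coarser placement of information in time. The two decoders can disagree: a sequence whose evidence appears early favors "All", while "Last" relies entirely on the final step.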