Recurrent Dropout without Memory Loss
Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth
TL;DR
This paper regularizes gated RNNs by applying dropout to the recurrent update vector g_t rather than to the hidden states, which avoids the memory loss that earlier recurrent dropout schemes can cause. The authors propose per-step mask sampling and show that dropping g_t preserves memory and improves generalization when combined with forward dropout. They validate the approach on synthetic tasks and on NLP benchmarks, including language modeling, named entity recognition (NER), and sentiment classification, and report consistent gains over prior recurrent dropout schemes. The work also gives practical guidance on where to apply dropout in LSTMs and GRUs and on how to sample the dropout masks.
Abstract
This paper presents a novel approach to recurrent neural network (RNN) regularization. Unlike the widely adopted dropout method, which is applied to forward connections of feed-forward architectures or RNNs, we propose to drop neurons directly in recurrent connections in a way that does not cause loss of long-term memory. Our approach is as easy to implement and apply as regular feed-forward dropout, and we demonstrate its effectiveness for Long Short-Term Memory networks, the most popular type of RNN cell. Our experiments on NLP benchmarks show consistent improvements even when combined with conventional feed-forward dropout.
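To make the idea concrete, here is a minimal NumPy sketch of a single LSTM step in which the dropout mask multiplies only the update vector g_t before it is added to the cell state. It assumes the standard LSTM equations and inverted-dropout scaling; the function names (`lstm_step`, `sample_mask`), the gate ordering inside the weight matrices, and the choice of scaling are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_mask(rng, shape, drop_prob):
    """Sample an inverted-dropout mask (hypothetical helper).

    Entries are 0 with probability drop_prob and 1 / (1 - drop_prob) otherwise.
    Calling this once per time step corresponds to per-step sampling.
    """
    keep = rng.random(shape) >= drop_prob
    return keep.astype(float) / (1.0 - drop_prob)


def lstm_step(x, h_prev, c_prev, W, U, b, mask=None):
    """One LSTM step with dropout applied only to the update vector g_t.

    Assumed shapes: W is (input_dim, 4n), U is (n, 4n), b is (4n,),
    with gates laid out as [input, forget, output, update].
    """
    n = h_prev.shape[-1]
    z = x @ W + h_prev @ U + b
    i = sigmoid(z[..., 0:n])            # input gate
    f = sigmoid(z[..., n:2 * n])        # forget gate
    o = sigmoid(z[..., 2 * n:3 * n])    # output gate
    g = np.tanh(z[..., 3 * n:4 * n])    # update vector g_t
    if mask is not None:
        g = g * mask                    # drop updates, never the stored memory
    c = f * c_prev + i * g              # f * c_prev is untouched by the mask
    h = o * np.tanh(c)
    return h, c


# Usage: even if every update is dropped, the previous cell state still
# passes through the forget gate unchanged by dropout.
n = 3
x = np.ones(2)
W = np.zeros((2, 4 * n))
U = np.zeros((n, 4 * n))
b = np.zeros(4 * n)
b[3 * n:] = 1.0                         # make g_t non-zero
h_prev = np.zeros(n)
c_prev = np.full(n, 2.0)

_, c_plain = lstm_step(x, h_prev, c_prev, W, U, b)
_, c_all_dropped = lstm_step(x, h_prev, c_prev, W, U, b, mask=np.zeros(n))
step_mask = sample_mask(np.random.default_rng(0), (n,), drop_prob=0.5)
```

Because the mask never touches the `f * c_prev` term, a dropped unit only misses the current update; it does not forget what it already stored. Applying the same mask to the cell or hidden state directly would zero out the carried-over memory for that unit.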
