Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks
Ralf C. Staudemeyer, Eric Rothstein Morris
TL;DR
This paper offers a thorough, tutorial-style synthesis of Long Short-Term Memory RNNs, tracing their development from early perceptrons to modern LSTM variants. It emphasizes the vanishing gradient problem in standard RNNs and presents LSTM's core mechanism, the constant error carousel combined with gated memory blocks, which enables learning over long time horizons. The authors unify the notation and detail the hybrid learning approach (BPTT and RTRL), while surveying extensions (BLSTM, Grid LSTM, GRU) and diverse applications (speech, handwriting, translation, image tasks). The work highlights strengths in long-range memory and controlled information flow, while acknowledging limitations related to fixed topology and modular design, which can guide future architectural innovations.
Abstract
Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are among the most powerful dynamic classifiers publicly known. The network itself and the related learning algorithms are documented well enough to give an idea of how they work. This paper sheds more light on how LSTM-RNNs evolved and why they work impressively well, focusing on the early, ground-breaking publications. We significantly improved the documentation and fixed a number of errors and inconsistencies that had accumulated in previous publications. To support understanding, we also revised and unified the notation used.
