On the Practical Computational Power of Finite Precision RNNs for Language Recognition
Gail Weiss, Yoav Goldberg, Eran Yahav
TL;DR
This work analyzes the practical computational power of finite-precision RNNs whose computation time is linear in the input length, across several architectures (SRNN, IRNN, GRU, LSTM). By relating these networks to simplified k-counter machines (SKCMs), it shows that the LSTM and the ReLU-based RNN (IRNN) can implement unbounded counting, while the SRNN and the GRU cannot. Empirically, LSTMs learn counting behavior and generalize beyond the counts seen in training, outperforming GRUs on counting tasks such as $a^nb^n$ and $a^nb^nc^n$. The findings bear on architecture choice in sequence modeling: under finite precision, the counting-capable variants (LSTM, IRNN) can represent more powerful real-time computations than the squashing-activation SRNN or the GRU.
Abstract
While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
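The counting argument can be illustrated with a small hand-written sketch. It is not the construction used in the paper: the function names, the update rule `h = tanh(h + 1)`, and the explicit `seen_b` flag are assumptions made for illustration. The sketch contrasts an additive state (as in an LSTM cell with saturated gates, or a ReLU unit) with a squashed state, and uses the additive counter to recognize $a^nb^n$.

```python
import math


def relu_count(n: int) -> float:
    """Additive state: h <- relu(h + 1). Grows by exactly 1 per step."""
    h = 0.0
    for _ in range(n):
        h = max(0.0, h + 1.0)
    return h


def squashed_count(n: int) -> float:
    """Squashed state (illustrative rule): h <- tanh(h + 1).

    The state is confined to (-1, 1) and quickly approaches a fixed point,
    so large counts become indistinguishable at any finite precision.
    """
    h = 0.0
    for _ in range(n):
        h = math.tanh(h + 1.0)
    return h


def additive_counter_accepts(s: str) -> bool:
    """Recognize a^n b^n (n >= 1) in one left-to-right pass with one counter.

    The counter is incremented on 'a' and decremented on 'b', mimicking an
    additive cell update c_t = c_{t-1} +/- 1. The seen_b flag is a
    finite-state component that enforces the a-then-b ordering.
    """
    c = 0.0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:  # an 'a' after a 'b' breaks the a^n b^n shape
                return False
            c += 1.0
        elif ch == "b":
            seen_b = True
            c -= 1.0
            if c < 0:  # more b's than a's so far
                return False
        else:
            return False
    return seen_b and c == 0.0


if __name__ == "__main__":
    # The additive state still separates neighbouring large counts...
    print(relu_count(101) - relu_count(100))
    # ...while the squashed state has effectively stopped changing.
    print(abs(squashed_count(101) - squashed_count(100)))
    print(additive_counter_accepts("a" * 1000 + "b" * 1000))
    print(additive_counter_accepts("aabbb"))
```

In this sketch the additive counter handles inputs far longer than any fixed bound, whereas the squashed state carries essentially no information about the count after a few steps. This mirrors, in simplified form, the separation the abstract describes.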
