
On the Practical Computational Power of Finite Precision RNNs for Language Recognition

Gail Weiss, Yoav Goldberg, Eran Yahav

TL;DR

This work analyzes the practical computational power of finite-precision RNNs whose computation time is linear in the input length, across several architectures (SRNN, IRNN, GRU, LSTM). By modeling these networks as simplified k-counter machines (SKCMs), it shows that the LSTM and the ReLU-based RNN (IRNN) can implement unbounded counting, while the SRNN and the GRU cannot. Empirically, LSTMs trained on counting tasks such as $a^nb^n$ and $a^nb^nc^n$ learn an explicit counting mechanism and generalize beyond the counts seen in training, outperforming GRUs. For architecture choice in sequence modeling, this means the counting-capable variants (LSTM, IRNN) can represent strictly more powerful real-time computations under finite precision than the squashing-activation SRNN or the GRU.
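The contrast between a ReLU state and a squashed state can be illustrated with a toy scalar recurrence. The sketch below is not taken from the paper: the weights (a recurrent weight of 1 and a constant input of 1) and the helper name `run_counter` are assumptions chosen only to show the qualitative difference. The ReLU state grows by one per step, while the tanh state stays below 1 and settles at a fixed point, so it cannot keep distinct values for arbitrarily many steps.

```python
import math


def relu(x):
    """Rectified linear unit on a scalar."""
    return max(0.0, x)


def run_counter(update, steps):
    """Apply a scalar state update `steps` times, starting from 0.0 (hypothetical helper)."""
    h = 0.0
    for _ in range(steps):
        h = update(h)
    return h


# Assumed toy weights: recurrent weight 1, constant input 1.
relu_state = run_counter(lambda h: relu(h + 1.0), 100)

# Same recurrence with a squashing activation: the state is bounded by 1
# and converges, so later steps no longer change it measurably.
tanh_state_50 = run_counter(lambda h: math.tanh(h + 1.0), 50)
tanh_state_100 = run_counter(lambda h: math.tanh(h + 1.0), 100)

print(relu_state)
print(tanh_state_50, tanh_state_100)
```

Here the ReLU state after 100 steps records the step count exactly, whereas the tanh state after 50 and after 100 steps is numerically indistinguishable.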

Abstract

While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
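The counting behavior mentioned above can be sketched with a hand-set, LSTM-style cell update $c_t = f_t c_{t-1} + i_t g_t$. This is only an illustration under stated assumptions, not a trained network and not the authors' construction: the gates are fixed to their saturated values ($f_t = i_t = 1$), the candidate $g_t$ is $+1$ on `a` and $-1$ on `b`, and the function names and the explicit `seen_b` flag (standing in for the finite-state part of a counter machine) are hypothetical.

```python
def lstm_style_step(c, symbol):
    """One cell-state update c_t = f * c_{t-1} + i * g with saturated gates (assumed values)."""
    f, i = 1.0, 1.0                       # forget and input gates fully open
    g = {"a": 1.0, "b": -1.0}[symbol]     # candidate: increment on 'a', decrement on 'b'
    return f * c + i * g


def accepts_anbn(word):
    """Accept strings of the form a^n b^n (n >= 0) using a single counter."""
    c = 0.0
    seen_b = False                        # finite-state flag: no 'a' may follow a 'b'
    for s in word:
        if s not in ("a", "b"):
            return False
        if s == "a" and seen_b:
            return False
        if s == "b":
            seen_b = True
        c = lstm_style_step(c, s)
        if c < 0:                         # more b's than a's so far
            return False
    return c == 0.0                       # counts must match at the end


print(accepts_anbn("aabb"), accepts_anbn("aab"), accepts_anbn("abab"))
```

Because the cell state is updated additively rather than squashed, the same update rule handles counts far larger than any short example, such as `"a" * 1000 + "b" * 1000`.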

Paper Structure

This paper contains 26 sections, 17 equations, 1 figure.

Figures (1)

  • Figure 1: Activations ($c$ for LSTM and $h$ for GRU) for networks trained on $a^nb^n$ and $a^nb^nc^n$. The LSTM has clearly learned to use an explicit counting mechanism, in contrast with the GRU.