On the Computational Power of RNNs
Samuel A. Korsky, Robert C. Berwick
TL;DR
This work analyzes the computational power of recurrent neural networks, focusing on simple RNNs and GRUs under finite, arbitrary, and infinite precision. It proves constructively that finite-precision simple RNNs with one hidden layer are equivalent in computational power to deterministic finite automata. Allowing arbitrary precision enables stack-like behavior that can recognize the Dyck languages $D_n$ and, via $L = D_n \cap R$, arbitrary context-free languages with manageable hidden sizes. The DFA equivalence is extended to finite-precision GRUs (Theorem $2.1$). GRUs are also shown to realize $D_n$ as the $(0, (2n+1)^{-1})$-language, and a broader result shows that context-free languages $L = D_n \cap R$ are achievable by GRUs of size $8 + 2nr$ (Theorems $2.12$ and $2.13$). These constructive proofs illuminate how modern RNN variants process hierarchical structure and offer a theoretical bridge between neural architectures and classical automata theory.
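To make the "stack-like behavior" under arbitrary precision concrete, the following is a minimal sketch of a classical trick: storing a whole stack in one rational number and pushing or popping with affine updates. It is an illustration under stated assumptions, not the paper's construction. The function name `dyck_accepts`, the signed-integer token format, and the choice of digit `i` in base `2n + 1` are all hypothetical choices made for this example.

```python
from fractions import Fraction
import math


def dyck_accepts(word, n):
    """Decide membership in D_n using a single exact rational as the stack.

    Assumed token format (hypothetical): +i opens bracket type i,
    -i closes bracket type i, for 1 <= i <= n.
    """
    base = 2 * n + 1
    h = Fraction(0)  # the empty stack is encoded as 0
    for tok in word:
        i = abs(tok)
        if i < 1 or i > n:
            raise ValueError(f"token {tok} is not a bracket of type 1..{n}")
        if tok > 0:
            # Push: bracket type i becomes the leading base-(2n+1) digit.
            h = (h + i) / base
        else:
            if h == 0:
                return False  # closing bracket with an empty stack
            shifted = h * base  # leading digit plus a remainder below 1
            if math.floor(shifted) != i:
                return False  # bracket types do not match
            # Pop: remove the leading digit, exposing the rest of the stack.
            h = shifted - i
    # Accept only if every opened bracket has been closed.
    return h == 0
```

In this encoding the stored value is exactly 0 for an empty stack and at least $(2n+1)^{-1}$ otherwise, so the two cases are separated by a fixed gap. Exact rationals (`Fraction`) stand in for arbitrary precision here; with fixed-width floats the encoding would break down for deep nesting.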
Abstract
Recent neural network architectures such as the basic recurrent neural network (RNN) and Gated Recurrent Unit (GRU) have gained prominence as end-to-end learning architectures for natural language processing tasks. But what is the computational power of such systems? We prove that finite precision RNNs with one hidden layer and ReLU activation and finite precision GRUs are exactly as computationally powerful as deterministic finite automata. Allowing arbitrary precision, we prove that RNNs with one hidden layer and ReLU activation are at least as computationally powerful as pushdown automata. If we also allow infinite precision, infinite edge weights, and nonlinear output activation functions, we prove that GRUs are at least as computationally powerful as pushdown automata. All results are shown constructively.
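One direction of the finite-precision claim, that a one-hidden-layer ReLU RNN can simulate a DFA, can be sketched as follows. This is a standard textbook-style construction written for illustration and is not necessarily the one used in the paper. The helper names `rnn_from_dfa` and `rnn_accepts`, the one-hot input encoding, and the example automaton `EVEN_A` are assumptions of this sketch. Each hidden unit stands for a pair (state, last symbol), and the update $h_t = \mathrm{ReLU}(W h_{t-1} + U x_t + b)$ uses only weights in $\{0, 1\}$ and bias $-1$.

```python
def relu(x):
    return max(0, x)


def rnn_from_dfa(states, alphabet, delta, start, accepting):
    """Build integer RNN parameters that simulate the given DFA.

    Hidden unit (q, a) is 1 exactly when the DFA is in state q
    after reading a as its most recent symbol.
    """
    units = [(q, a) for q in states for a in alphabet]
    W = [[0] * len(units) for _ in units]     # recurrent weights
    U = [[0] * len(alphabet) for _ in units]  # input weights
    for k, (q_next, a) in enumerate(units):
        U[k][alphabet.index(a)] = 1
        for j, (q, _prev_symbol) in enumerate(units):
            if delta[(q, a)] == q_next:
                W[k][j] = 1
    bias = [-1] * len(units)  # ReLU(u + v - 1) computes AND on 0/1 inputs
    h0 = [0] * len(units)
    h0[units.index((start, alphabet[0]))] = 1  # the symbol part is arbitrary
    readout = [1 if q in accepting else 0 for (q, _a) in units]
    return W, U, bias, h0, readout


def rnn_accepts(params, alphabet, word):
    """Run the RNN over word and threshold the linear readout."""
    W, U, bias, h, readout = params
    for symbol in word:
        x = [1 if a == symbol else 0 for a in alphabet]  # one-hot input
        h = [
            relu(
                sum(w * v for w, v in zip(W[k], h))
                + sum(u * v for u, v in zip(U[k], x))
                + bias[k]
            )
            for k in range(len(h))
        ]
    return sum(r * v for r, v in zip(readout, h)) >= 1


# Hypothetical example DFA: strings over {a, b} with an even number of a's.
ALPHABET = ["a", "b"]
EVEN_A = rnn_from_dfa(
    states=["even", "odd"],
    alphabet=ALPHABET,
    delta={
        ("even", "a"): "odd", ("even", "b"): "even",
        ("odd", "a"): "even", ("odd", "b"): "odd",
    },
    start="even",
    accepting={"even"},
)
```

Because every hidden value stays in $\{0, 1\}$ and every parameter is a small integer, the simulation needs no more than finite precision. The converse direction (a finite-precision RNN has only finitely many reachable hidden states, hence is a finite automaton) is not shown in this sketch.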
