Sequential Neural Networks as Automata
William Merrill
TL;DR
The paper introduces the notions of asymptotic language acceptance and state complexity to analyze which formal languages different neural sequence architectures can recognize. By formalizing real-time, bounded-precision constraints, it derives automata-theoretic characterizations of SRNs, GRUs, LSTMs, attention, and CNNs, and places them in a hierarchy between the regular languages and more expressive language classes. Experiments on counting, counting with noise, and string reversal test these predictions, showing where trained networks agree with the theory and where they diverge, and highlighting the stabilizing effect of noise regularization. The work provides a principled framework linking neural computation to formal language theory and grammar, with implications for interpretability and architectural design.
Abstract
This work attempts to explain the types of computation that neural networks can perform by relating them to automata. We first define what it means for a real-time network with bounded precision to accept a language. A measure of network memory follows from this definition. We then characterize the classes of languages acceptable by various recurrent networks, attention, and convolutional networks. We find that LSTMs function like counter machines and relate convolutional networks to the subregular hierarchy. Overall, this work attempts to increase our understanding and ability to interpret neural networks through the lens of theory. These theoretical insights help explain neural computation, as well as the relationship between neural networks and natural language grammar.
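As a rough illustration of the counter-machine comparison made in the abstract, the Python sketch below implements a real-time one-counter recognizer for the language a^n b^n: it reads one symbol per step and keeps only an integer counter plus a single finite-state flag. The function name `accepts_anbn`, the choice of language, and the flag are assumptions made for illustration. They are not taken from the paper, and the code is a hand-written automaton, not an LSTM.

```python
def accepts_anbn(string: str) -> bool:
    """Real-time one-counter recognizer for { a^n b^n : n >= 0 }.

    Illustrative sketch only: a hand-written counter automaton,
    not the paper's construction and not a neural network.
    """
    counter = 0      # unbounded integer memory (the "counter")
    seen_b = False   # finite-state flag: has any 'b' been read yet?

    for symbol in string:          # one update per input symbol (real-time)
        if symbol == "a":
            if seen_b:             # an 'a' after a 'b' breaks the a...ab...b shape
                return False
            counter += 1           # count each 'a' up
        elif symbol == "b":
            seen_b = True
            counter -= 1           # count each 'b' down
            if counter < 0:        # more 'b's than 'a's so far
                return False
        else:
            return False           # symbol outside the alphabet {a, b}

    return counter == 0            # accept iff the counts match exactly


if __name__ == "__main__":
    for s in ["", "ab", "aabb", "aab", "abab", "ba"]:
        print(repr(s), accepts_anbn(s))
```

A finite-state machine cannot recognize this language because n is unbounded, whereas one counter is enough. A string-reversal task, by contrast, requires remembering the order of symbols rather than just a count, which a single counter does not provide.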
