Sequential Neural Networks as Automata

William Merrill

TL;DR

The paper introduces the notion of asymptotic language acceptance and state complexity to analyze what types of formal languages different neural sequence architectures can recognize. By formalizing real-time, bounded-precision constraints, it derives automata-theoretic characterizations for SRNs, GRUs, LSTMs, attention, and CNNs, and places them in a hierarchy between regular and more expressive language classes. Empirical experiments on counting, counting with noise, and string reversal test the predictions, revealing both alignment and gaps and highlighting the stabilizing effect of noise regularization. The work provides a principled framework linking neural computation to formal language theory and grammar, with implications for interpretability and architectural design.

Abstract

This work attempts to explain the types of computation that neural networks can perform by relating them to automata. We first define what it means for a real-time network with bounded precision to accept a language. A measure of network memory follows from this definition. We then characterize the classes of languages acceptable by various recurrent networks, attention, and convolutional networks. We find that LSTMs function like counter machines and relate convolutional networks to the subregular hierarchy. Overall, this work attempts to increase our understanding and ability to interpret neural networks through the lens of theory. These theoretical insights help explain neural computation, as well as the relationship between neural networks and natural language grammar.

Paper Structure

This paper contains 20 sections, 25 theorems, 68 equations, 1 figure, and 2 tables.

Key Result

Theorem 3.1

For any length $n$, the SRN cell state $\mathbf{h}_n \in \mathbb{R}^k$ has state complexity

Figures (1)

  • Figure 1: With sigmoid activations, the network on the left accepts a sequence of bits if and only if $x_t = 1$ for some $t$. On the right is the discrete computation graph that the network approaches asymptotically.
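The behavior described in the Figure 1 caption can be illustrated with a small sketch. The recurrence, the weights, and the names `soft_or_acceptor`, `discrete_or_acceptor`, and `accepts` below are assumptions chosen for illustration; they are not taken from the paper's figure. The sketch reads "approaches asymptotically" as scaling the pre-activation by a factor `N`: as `N` grows, the sigmoid saturates and the hidden state behaves like a discrete OR over the bits seen so far.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_or_acceptor(bits, N):
    # Hypothetical one-unit recurrent network (weights are an assumption):
    #   h_t = sigmoid(N * (2*h_{t-1} + 2*x_t - 1)),  h_0 = 0.
    # For large N the state saturates near 0 until a 1 is read,
    # then saturates near 1 and stays there.
    h = 0.0
    for x in bits:
        h = sigmoid(N * (2.0 * h + 2.0 * x - 1.0))
    return h


def discrete_or_acceptor(bits):
    # The discrete computation the network approaches: accept iff some x_t = 1.
    return any(b == 1 for b in bits)


def accepts(bits, N, threshold=0.5):
    # Accept when the final hidden state exceeds the threshold.
    return soft_or_acceptor(bits, N) > threshold


if __name__ == "__main__":
    for bits in ([], [0, 0, 0], [0, 1, 0], [1], [0, 0, 0, 1]):
        print(bits, soft_or_acceptor(bits, N=50), discrete_or_acceptor(bits))
```

With a small `N` the hidden state drifts toward 0.5 on all-zero inputs, so the decision margin shrinks; with a large `N` the soft state is nearly indistinguishable from the discrete one.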

Theorems & Definitions (55)

  • Definition 2.1: Neural sequence acceptor
  • Definition 2.2: Asymptotic acceptance
  • Definition 2.3: Hidden state
  • Definition 2.4: Configuration set
  • Definition 2.5: Fixed state complexity
  • Definition 2.6: General state complexity
  • Definition 3.1: SRN layer
  • Theorem 3.1: SRN state complexity
  • proof
  • Theorem 3.2: SRN characterization
  • ...and 45 more