
On Efficiently Representing Regular Languages as RNNs

Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

Abstract

Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This poses a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed -- specifically, those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the efficiency of representing this diverse class of LMs with RNN LMs suggests novel interpretations of their inductive bias.

Paper Structure

This paper contains 29 sections, 8 theorems, 38 equations, 5 figures.

Key Result

Theorem 3.1

The family of LMs induced by PFSAs is weakly equivalent to the family of LMs induced by BPDAs.

Figures (5)

  • Figure 1: An illustration of how an RNN can store information about a fixed number of symbols (in this case, three) that have appeared in the string $\boldsymbol{y}_{<t}$. Using some mechanism, the symbols $y_2, y_{t-4}, y_{t-1}$ have been selected for determining the continuation of the string and are stored in $\mathbf{h}$. These symbols are used to compute the conditional probability of the next symbol $\overline{y}_t$.
  • Figure 2: A simplified $3$-gram LM over $\Sigma = \{a, b\}$. Even though the number of states is exponential in $n$, the hidden state of the RNN only has to keep the $n - 1 = 2$ symbols of interest, each of which is represented by $\lceil \log_2 |\Sigma| \rceil$ bits. This is illustrated by the state $ab$ being represented as $\mathbf{h} = ab$.
  • Figure 3: An illustration of how a BPDA can compute the probability of a string under an n-gram LM.
  • Figure 4: $\texttt{PUSH}$ moves the stack down, discards the bottom-most elements, and inserts a new top.
  • Figure 5: An illustration of how one can think of BPDA LMs as being represented by three different mechanisms: a BPDA, a black-box PFSA, and an RNN.
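
The captions of Figures 2 and 4 can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's construction: the stack is modeled as a Python tuple with the top at index 0, and the names `push`, `bits_per_symbol`, and `bound` are hypothetical. It shows a bounded PUSH that drops whatever falls off the bottom, and compares the number of $n$-gram states with the number of bits needed to store the $n - 1$ symbols of interest.

```python
from itertools import product


def push(stack, symbol, bound):
    """Bounded PUSH (illustrative): insert `symbol` as the new top,
    shift older symbols down, and drop anything beyond `bound`."""
    return ((symbol,) + tuple(stack))[:bound]


def bits_per_symbol(alphabet):
    """ceil(log2 |alphabet|), computed with integers to avoid float rounding."""
    return (len(alphabet) - 1).bit_length()


# Assumed toy setting matching the Figure 2 caption: a 3-gram LM over {a, b}.
alphabet = ("a", "b")
n = 3
bound = n - 1  # only the last n - 1 symbols matter for an n-gram LM

# Read a string symbol by symbol; the stack keeps the most recent symbols.
stack = ()
for y in "abba":
    stack = push(stack, y, bound)

# Number of n-gram histories grows exponentially in n ...
states = list(product(alphabet, repeat=n - 1))
# ... while the bits needed to store one history grow linearly in n.
hidden_bits = (n - 1) * bits_per_symbol(alphabet)

print(stack, len(states), hidden_bits)
```

With the top stored first, the stack after reading `abba` holds the two most recent symbols in reverse reading order. The tuple slice is one simple way to express "discard the bottom"; any fixed-size buffer would serve the same purpose.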

Theorems & Definitions (26)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 3.1
  • Definition 3.2
  • Theorem 3.1
  • proof
  • Definition 3.3
  • Definition 3.4
  • Definition 3.5
  • ...and 16 more