Bearing Syntactic Fruit with Stack-Augmented Neural Networks

Brian DuSell, Ryan Cotterell

TL;DR

The paper investigates whether stack-augmented neural networks can exhibit human-like hierarchical generalization without syntactic supervision. It implements two differentiable stacks (superposition and nondeterministic VPDA) on top of transformer, simple RNN, and LSTM bases and evaluates them on question formation and tense reinflection. A transformer with nondeterministic stack attention shows the strongest hierarchical bias on question formation, including substantial generalization beyond the training distribution. A simple short-circuit connection that feeds the stack reading directly to the output further improves hierarchical generalization for RNNs and LSTMs. Hierarchical generalization remains elusive for tense reinflection, however, which suggests task-dependent limits and motivates further exploration of stack-based architectures as models of language acquisition and as objects of psycholinguistic study.

Abstract

Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
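
For readers unfamiliar with the superposition stack mentioned in the abstract, the following is a minimal sketch of one update step, not the authors' implementation. It assumes the usual formulation in which the new stack is a weighted blend of the results of pushing, doing nothing, and popping; the function names, the three-action set, and the plain-list representation are illustrative assumptions, and the nondeterministic variant is not shown.

```python
from typing import List

Vector = List[float]


def superposition_stack_step(
    stack: List[Vector],
    push_vec: Vector,
    p_push: float,
    p_noop: float,
    p_pop: float,
) -> List[Vector]:
    """Blend push, no-op, and pop into one soft stack update.

    stack[0] is the top of the stack. The three probabilities are
    assumed to be non-negative and to sum to 1 (e.g. a softmax output
    of the controller); that normalization is not checked here.
    """
    zero = [0.0] * len(push_vec)
    new_stack = []
    # The new stack can be one element deeper than the old one.
    for i in range(len(stack) + 1):
        # Element that lands at depth i if we push.
        above = push_vec if i == 0 else stack[i - 1]
        # Element that stays at depth i if we do nothing.
        same = stack[i] if i < len(stack) else zero
        # Element that rises to depth i if we pop.
        below = stack[i + 1] if i + 1 < len(stack) else zero
        new_stack.append(
            [p_push * a + p_noop * s + p_pop * b
             for a, s, b in zip(above, same, below)]
        )
    return new_stack


def stack_reading(stack: List[Vector]) -> Vector:
    """The reading passed back to the controller is the top element."""
    return stack[0]


# Hypothetical usage: a hard push followed by a half-push, half-pop.
s = superposition_stack_step([], [1.0, 0.0], 1.0, 0.0, 0.0)
s2 = superposition_stack_step(s, [0.0, 1.0], 0.5, 0.0, 0.5)
```

Because every operation is a weighted sum, the update is differentiable with respect to the action probabilities and the pushed vector, which is what lets a controller learn stack actions by gradient descent.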

Paper Structure

This paper contains 16 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Two algorithms consistent with the training set for question formation.
  • Figure 2: Examples that illustrate the difference between the training (shaded white) and generalization (shaded gray) sets of each task.
  • Figure 3: Two algorithms consistent with the training set for tense reinflection.
  • Figure 4: Conceptual diagram of the controller-stack interface for RNNs and LSTMs, unrolled across a portion of time and with $L = 2$ layers. The dashed lines indicate our proposed short-circuit connection from the stack reading to the output.
  • Figure 5: Conceptual diagram of a stack attention sublayer, unrolled across a portion of time. Dotted arrows indicate linear transformations, and dashed arrows indicate residual connections.