Bearing Syntactic Fruit with Stack-Augmented Neural Networks

Brian DuSell; Ryan Cotterell

TL;DR

The paper investigates whether stack-augmented neural networks can exhibit human-like hierarchical generalization without syntactic supervision. It attaches two kinds of differentiable stack (superposition and nondeterministic VPDA) to transformer, simple RNN, and LSTM base architectures and evaluates them on question formation and tense reinflection. On question formation, a transformer with nondeterministic stack attention shows the strongest hierarchical bias, including substantial generalization beyond the training distribution. A simple short-circuit that feeds the stack reading directly to the output further improves hierarchical generalization for RNNs and LSTMs. Hierarchical generalization remains elusive for tense reinflection, however, which suggests task-dependent limits and motivates further exploration of stack-based architectures as models of language acquisition and as objects of psycholinguistic study.

Abstract

Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
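
To make the "superposition stack" concrete, the following is a minimal sketch of one update step in the style of Joulin & Mikolov (2015): the new stack is a weighted average of the stacks that would result from pushing, popping, or doing nothing. It is an illustration under stated assumptions, not the authors' implementation. The function names (`superposition_stack_step`, `read_top`) are hypothetical, and the action weights and pushed vector are passed in directly here, whereas in the actual models they would be computed from the controller's hidden state.

```python
from typing import List

Vector = List[float]


def superposition_stack_step(
    stack: List[Vector],
    push_value: Vector,
    p_push: float,
    p_pop: float,
    p_noop: float,
) -> List[Vector]:
    """Return the expected stack after one soft push/pop/no-op step.

    stack[0] is the top element. Cells beyond the end of the stack are
    treated as zero vectors. The three weights are assumed to sum to 1
    (e.g. the output of a softmax).
    """
    assert abs(p_push + p_pop + p_noop - 1.0) < 1e-9, "weights must sum to 1"
    zero = [0.0] * len(push_value)

    def cell(i: int) -> Vector:
        return stack[i] if 0 <= i < len(stack) else zero

    new_stack = []
    for i in range(len(stack) + 1):  # a push can grow the stack by one cell
        # A push moves the cell above (or the new value) into slot i.
        above = push_value if i == 0 else cell(i - 1)
        # A pop moves the cell below into slot i.
        below = cell(i + 1)
        # A no-op leaves slot i unchanged.
        same = cell(i)
        new_stack.append(
            [p_push * a + p_pop * b + p_noop * s
             for a, b, s in zip(above, below, same)]
        )
    return new_stack


def read_top(stack: List[Vector]) -> Vector:
    """The stack reading passed back to the network: the top cell."""
    return stack[0]
```

With weights of exactly 0 or 1 this behaves like an ordinary stack; fractional weights blend the three outcomes, which is what makes the structure differentiable and trainable by gradient descent.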
