Compositional generalization in a deep seq2seq model by separating syntax and semantics

Jake Russin, Jason Jo, Randall C. O'Reilly, Yoshua Bengio

TL;DR

The paper addresses compositional generalization in neural NLP by introducing Syntactic Attention, a two-stream architecture that separates syntax (sequential, alignment-focused processing) from semantics (word-level mappings) and combines the two through attention. Motivated by neuroscience work suggesting distinct brain systems for syntactic and semantic processing, the model substantially improves on prior models on the SCAN add-jump task without extra supervision, in effect recasting one out-of-domain (o.o.d.) generalization problem as two independent and identically distributed (i.i.d.) problems. The paper also includes detailed analyses of variability and supplementary experiments on the model's robustness to the parametrization of the semantic stream and on the conditions under which the approach succeeds or fails. Overall, the work suggests that building cognitive principles into neural architectures can improve systematic generalization in language tasks.

Abstract

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.

Paper Structure

This paper contains 23 sections, 7 equations, 15 figures, and 5 tables.

Figures (15)

  • Figure 1: Simplified illustration of the out-of-domain (o.o.d.) extrapolation required by the SCAN compositional generalization task. Shapes represent the distribution of all possible command sequences. In a simple split, train and test data are independent and identically distributed (i.i.d.), but in the add-primitive splits, models are required to extrapolate out-of-domain from a single example.
  • Figure 2: (left) Syntactic Attention architecture. Syntactic and semantic information are maintained in separate streams. The semantic stream processes words with a simple linear transformation, so that sequential information is not maintained. This information is used to directly produce actions. The syntactic stream processes inputs with a recurrent neural network, allowing it to capture temporal dependencies between words. This stream determines the attention over semantic representations at each time step during decoding. (right) Diagram of an influential computational model of prefrontal cortex (PFC) [MillerCohen01]. Prefrontal cortex dynamically modulates processes in other parts of the brain through top-down selective attention signals. A part of the prefrontal cortex, Broca's area, is thought to be important for syntactic processing [Thompson-Schill04]. Figure reproduced from [Miller13].
  • Figure 3: Examples from the SCAN dataset. Figure reproduced from [LakeBaroni17b].
  • Figure 4: Illustration of the transformation of an out-of-domain (o.o.d.) generalization problem into two independent, identically distributed (i.i.d.) generalization problems. This transformation is accomplished by the Syntactic Attention model without hand-coding grammatical rules or supervising with additional information such as parts-of-speech tags.
  • Figure 5: Phrase-structure grammar used to generate the SCAN dataset. Figure reproduced from [LakeBaroni17b].
  • ...and 10 more figures
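
The two-stream design described in the Figure 2 caption can be sketched as a single decoding step. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption only specifies "a recurrent neural network" for the syntactic stream, so a plain tanh RNN is assumed here; the decoder query vector is taken as a given input rather than produced by a decoder; and all weights are random and untrained. The class name `SyntacticAttentionSketch` and every parameter name are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.5):
    """Random matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SyntacticAttentionSketch:
    """Hypothetical two-stream model with random, untrained weights."""

    def __init__(self, vocab, n_actions, dim=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.index = {word: i for i, word in enumerate(vocab)}
        # Semantic stream: one vector per word, i.e. a linear map of a one-hot input.
        self.sem = rand_matrix(rng, len(vocab), dim)
        # Syntactic stream: its own embeddings plus recurrent weights (assumed tanh RNN).
        self.syn_emb = rand_matrix(rng, len(vocab), dim)
        self.w_in = rand_matrix(rng, dim, dim)
        self.w_rec = rand_matrix(rng, dim, dim)
        # Output layer maps the attended semantic vector to action logits.
        self.w_out = rand_matrix(rng, n_actions, dim)

    def semantic_stream(self, words):
        """Per-word lookup; carries no information about word order."""
        return [self.sem[self.index[w]] for w in words]

    def syntactic_stream(self, words):
        """Recurrent pass; each state depends on the words seen so far."""
        hidden, states = [0.0] * self.dim, []
        for w in words:
            x = matvec(self.w_in, self.syn_emb[self.index[w]])
            r = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(x, r)]
            states.append(hidden)
        return states

    def decode_step(self, words, query):
        """Attend with syntactic states, then read out from semantic vectors."""
        syn = self.syntactic_stream(words)
        sem = self.semantic_stream(words)
        scores = [sum(q * h for q, h in zip(query, state)) for state in syn]
        attention = softmax(scores)
        context = [sum(a * vec[k] for a, vec in zip(attention, sem))
                   for k in range(self.dim)]
        return attention, softmax(matvec(self.w_out, context))


# Usage: one decoding step over a two-word command with an arbitrary query vector.
model = SyntacticAttentionSketch(["jump", "walk", "twice", "left"], n_actions=3)
attention, action_probs = model.decode_step(["jump", "twice"], [0.1, -0.2, 0.3, 0.4])
print(attention, action_probs)
```

In this sketch the attention weights are computed only from the syntactic states, and the action distribution is read out only from the attended semantic vectors. Replacing a word's semantic vector therefore changes the context vector the actions are read from but not where the model attends, which is the separation the caption describes.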