Compositional generalization through meta sequence-to-sequence learning

Brenden M. Lake

TL;DR

Standard seq2seq models struggle with compositional generalization. The paper introduces meta seq2seq learning, a memory-augmented, episodic meta-training framework that enables learning of rule-like generalization over sequences. Across several SCAN-based experiments, the approach achieves strong performance on many compositional tasks, but fails to systematically generalize to much longer output sequences, highlighting both progress and remaining gaps. These findings suggest a promising direction for combining memory and meta-learning, with potential extensions toward neuro-symbolic hybrids to handle longer and more abstract generalizations.

Abstract

People can learn a new concept and use it compositionally, understanding how to "blicket twice" after learning how to "blicket." In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. In this approach, models train on a series of seq2seq problems to acquire the compositional skills needed to solve new seq2seq problems. Meta seq2seq learning solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

Paper Structure

This paper contains 12 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The meta sequence-to-sequence learner. The backbone is a sequence-to-sequence (seq2seq) network augmented with a context $C$ produced by an external memory. The seq2seq model uses an RNN encoder ($f_{ie}$; bottom right) to read a query and then passes stepwise messages $Q$ to an attention-based RNN decoder ($f_{od}$; top right). Distinctive to meta seq2seq learning, the messages $Q$ are transformed into $C$ based on context from the support set (left). The transformation operates through a key-value memory. Support item inputs are encoded and used as keys $K$, while outputs are encoded and used as values $V$. The query is compared stepwise to the keys, retrieving weighted sums $M$ of the most similar values. These are mapped to $C$, which is decoded as the final output sequence. Color coding indicates shared RNN modules.
  • Figure 2: The mutual exclusivity (ME) task, showing two meta-training episodes (left) and one test episode (right). Each episode requires executing instructions in a novel language of four input pseudowords ("dax", "wif", etc.) and four output actions ("red", "yellow", etc.). Each episode has a random mapping from pseudowords to meanings, providing three isolated words and their outputs as support. Answering queries requires concatenation as well as reasoning by mutual exclusivity to infer the fourth mapping ("dax" means "blue" in the test episode).
  • Figure A.1: During a test episode of the ME task, the support set (top left) and two queries are shown. The ME inference is that "dax" maps to "blue." The key-value memory attention $A$ for each query is shown in the left matrix, with rows as encoder steps and columns as support items. The decoder attention for each query is shown in the right matrix, with rows as the decoder steps and columns as encoder steps. <EOS> marks the end of the sequence.
  • Figure A.2: Attention in meta seq2seq learning on the SCAN task. At test time, the network is evaluated on the query "walk left after run right thrice." <EOS> marks the end of the sequence.
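
The key-value memory lookup described in the Figure 1 caption can be illustrated with a small sketch. This is not the paper's implementation. It assumes scaled dot-product similarity followed by a softmax over support items, and it uses hand-picked toy vectors in place of learned RNN encodings. The function name `memory_lookup` and all of the toy data are hypothetical. The sketch covers only the retrieval of $M$ from $Q$, $K$, and $V$, and omits the subsequent mapping to $C$.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_lookup(queries, keys, values):
    """Key-value memory retrieval (illustrative sketch, not the paper's code).

    queries: one vector per encoder step of the query (Q)
    keys:    one vector per support item input (K)
    values:  one vector per support item output (V)
    Returns (A, M): attention weights over support items for each query
    step, and the attention-weighted sums of the values.
    """
    # Assumption: similarity is a dot product scaled by sqrt(key dimension).
    scale = math.sqrt(len(keys[0]))
    value_dim = len(values[0])
    attention, retrieved = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attention.append(weights)
        retrieved.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(value_dim)]
        )
    return attention, retrieved


# Toy data (hypothetical): three support items and a two-step query.
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]

A, M = memory_lookup(queries, keys, values)
for step, (weights, summary) in enumerate(zip(A, M)):
    best = max(range(len(weights)), key=lambda i: weights[i])
    print(f"query step {step}: most similar support item = {best}")
```

In this layout each row of `A` corresponds to one encoder step and each column to one support item, matching the arrangement described for the left matrix in Figure A.1.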