Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

TL;DR

This work tackles compositional generalization in semantic parsing by training pre-trained seq2seq models to map an input utterance $x$ to a structured intermediate representation (IR) $z$ before producing the executable program $y$, all without altering the model architecture. It defines reversible IRs $z_r$ and lossy IRs $z_l$ and uses a two-stage decoding pipeline: a reversible $z_r$ is inverted directly to $y$, while for a lossy $z_l$ a second seq2seq model conditioned on $x$ and $z$ produces $y$. Experiments cover CFQ, text-to-SQL template splits, and SCAN. The best combinations set a new state of the art, improving accuracy by $14.8$ points on CFQ and by $15.0$ to $19.4$ points on the template splits of three text-to-SQL datasets, while preserving i.i.d. performance. The results indicate that IRs are an architecture-independent lever for improving compositional generalization in pre-trained seq2seq models.
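The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `parse_with_ir`, the `" | "` separator used to join $x$ and $z$, and all `toy_*` stand-ins (including the made-up `JUMP * 2` notation) are hypothetical and serve only to show the control flow for reversible versus lossy IRs.

```python
from typing import Callable, Optional

# Stand-in type for a trained seq2seq model: text in, text out.
Seq2Seq = Callable[[str], str]


def parse_with_ir(
    x: str,
    predict_ir: Seq2Seq,
    invert_ir: Optional[Callable[[str], str]] = None,
    predict_program: Optional[Seq2Seq] = None,
    sep: str = " | ",  # assumption: x and z are joined with a separator
) -> str:
    """Parse utterance x into program y via an intermediate representation z."""
    z = predict_ir(x)  # stage 1: x -> z
    if invert_ir is not None:
        # Reversible IR: a deterministic inverse recovers y from z alone.
        return invert_ir(z)
    if predict_program is None:
        raise ValueError("a lossy IR needs a second model to produce y")
    # Lossy IR: a second model sees both x and z.
    return predict_program(f"{x}{sep}{z}")


# --- Hypothetical toy stand-ins (not the IRs used in the paper) ---
def toy_predict_reversible(x: str) -> str:
    return {"jump twice": "JUMP * 2"}[x]


def toy_invert(z: str) -> str:
    # Expand the made-up "TOKEN * n" notation into n repeated tokens.
    token, _, count = z.partition(" * ")
    return " ".join([token] * int(count))


def toy_predict_lossy(x: str) -> str:
    # The repetition count is dropped, so z alone cannot recover y.
    return {"jump twice": "JUMP"}[x]


def toy_second_stage(x_and_z: str) -> str:
    # Recover the dropped information from x, using z as a scaffold.
    x, z = x_and_z.split(" | ")
    repeat = 2 if x.endswith("twice") else 1
    return " ".join([z] * repeat)


y_reversible = parse_with_ir("jump twice", toy_predict_reversible, invert_ir=toy_invert)
y_lossy = parse_with_ir("jump twice", toy_predict_lossy, predict_program=toy_second_stage)
print(y_reversible)
print(y_lossy)
```

In this toy setup both paths yield the same program. The difference is where the missing information comes from: the reversible path needs only $z_r$, while the lossy path must consult $x$ again.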

Abstract

Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identify key aspects for designing effective representations. Instead of training to directly map natural language to an executable form, we map to a reversible or lossy intermediate representation that has stronger structural correspondence with natural language. The combination of our proposed intermediate representations and pre-trained models is surprisingly effective, where the best combinations obtain a new state-of-the-art on CFQ (+14.8 accuracy points) and on the template-splits of three text-to-SQL datasets (+15.0 to +19.4 accuracy points). This work highlights that intermediate representations provide an important and potentially overlooked degree of freedom for improving the compositional generalization abilities of pre-trained seq2seq models.

Paper Structure

This paper contains 29 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Our framework for parsing utterances ($x$) into programs ($y$) through reversible ($z_r$) and lossy ($z_l$) intermediate representations using seq2seq models.
  • Figure 2: Examples for the different formalisms, of an utterance ($x$), program ($y$), reversible intermediate representation ($z_r$) and lossy intermediate representation ($z_l$). For each formalism, tokens with the same color share their semantic role. Tokens in $z_r$ and $z_l$ that are modified w.r.t. $y$ are in bold. We abbreviate original SPARQL relations, and also abbreviate the SQL table names airline, airport and flight to AL, AP and FL, respectively.
  • Figure 3: Compared to Baseline (T5-base), the best IR of each split maintains the baseline accuracy for i.i.d. splits while giving large gains for compositional splits.
  • Figure 4: Example cases where LIRd+RIR produces correct programs whereas the baseline T5 does not. SQL table names were shortened here for brevity.