Learning to Substitute Components for Compositional Generalization

Zhaoyi Li; Gangwei Jiang; Chenwang Wu; Ying Wei; Defu Lian; Enhong Chen

TL;DR

This work tackles the limited compositional generalization of neural language models by introducing CompSub, a span-based compositional data augmentation strategy that enables multi-grained substitution of substructures across the training set. Building on CompSub, the authors present Learning Component Substitution (LCS), a differentiable augmenter that learns substitution probabilities by maximizing the loss of the downstream language model, thereby prioritizing challenging and novel compositions. They further extend these ideas to in-context learning with LCS-ICL, which targets the few-shot compositional generalization of state-of-the-art LLMs. Theoretical analyses show that CompSub acts as an implicit regularizer that promotes semantic invariance and reduces Rademacher complexity. Empirical results on SCAN, COGS, GeoQuery, and COGS-QL show improvements of up to 66.5%, 10.3%, 1.4%, and 8.8%, respectively. Overall, the approach provides a principled, end-to-end, and model-agnostic framework for injecting multi-grained compositional bias and improving few-shot and in-context generalization in language tasks.

Abstract

Despite the rising prevalence of neural language models, recent empirical evidence suggests that they are deficient in compositional generalization. One of the current de facto solutions to this problem is compositional data augmentation, which aims to introduce additional compositional inductive bias. However, existing handcrafted augmentation strategies offer limited improvement when systematic generalization of neural language models requires multi-grained compositional bias (i.e., not limited to either lexical or structural biases alone) or when training sentences have an imbalanced difficulty distribution. To address these challenges, we first propose a novel compositional augmentation strategy called Component Substitution (CompSub), which enables multi-grained composition of substantial substructures across the entire training set. Furthermore, we introduce the Learning Component Substitution (LCS) framework, which learns the component substitution probabilities in CompSub end to end by maximizing the loss of neural language models, thereby prioritizing challenging compositions with elusive concepts and novel contexts. We extend the key ideas of CompSub and LCS to the recently emerging in-context learning scenarios of pre-trained large language models (LLMs), proposing the LCS-ICL algorithm to enhance the few-shot compositional generalization of state-of-the-art (SOTA) LLMs. Theoretically, we provide insights into why applying our algorithms to language models can improve compositional generalization performance. Empirically, our results on four standard compositional generalization benchmarks (SCAN, COGS, GeoQuery, and COGS-QL) demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8%, respectively.
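To make the idea of span-level component substitution concrete, the following is a minimal toy sketch on a SCAN-style example. It is not the authors' implementation: the `Component` class, the `replace_span` and `substitute` helpers, and the assumption that aligned input/output spans are already given are all hypothetical choices made for illustration. The learned substitution probabilities of LCS are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical container: an input-side span and its aligned output-side span."""
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]


def replace_span(tokens: Sequence[str], old: Sequence[str], new: Sequence[str]) -> List[str]:
    """Replace the first occurrence of the token span `old` with `new`."""
    n = len(old)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == tuple(old):
            return list(tokens[:i]) + list(new) + list(tokens[i + n:])
    raise ValueError("span not found in token sequence")


def substitute(example: Tuple[str, str], old: Component, new: Component) -> Tuple[str, str]:
    """Swap component `old` for `new` on both sides of an (input, output) pair."""
    src, tgt = example
    return (
        " ".join(replace_span(src.split(), old.src, new.src)),
        " ".join(replace_span(tgt.split(), old.tgt, new.tgt)),
    )


# Assumed alignments between command spans and action spans (given, not learned).
walk_left = Component(("walk", "left"), ("I_TURN_LEFT", "I_WALK"))
jump_twice = Component(("jump", "twice"), ("I_JUMP", "I_JUMP"))

example = ("run and walk left", "I_RUN I_TURN_LEFT I_WALK")
augmented = substitute(example, walk_left, jump_twice)
print(augmented)
```

Because the substituted unit is a multi-token span rather than a single word, the same helper covers both lexical swaps (one-token components) and larger structural swaps, which is the sense of "multi-grained" used above. In a learned variant, the choice of which component to swap in would be sampled from trainable probabilities instead of being fixed by hand.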
