Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

Tianqi Zhong; Zhaoyi Li; Quan Wang; Linqi Song; Ying Wei; Defu Lian; Zhendong Mao

TL;DR

The paper tackles the challenge of compositional generalization in multi-aspect controllable text generation by introducing CompMCTG, a holistic benchmark built from Fyelp, Amazon, YELP, and Mixture with a three-dimensional Hold-Out/ACD/Few-Shot evaluation protocol. It shows that existing MCTG methods, including decoding-time, separate-training, and joint-training approaches, suffer noticeable generalization gaps when tested on novel attribute combinations, with LLMs offering fluent but less controllable outputs. To mitigate this, the authors propose Meta-MCTG, a meta-learning framework inspired by MAML that jointly trains with pseudo-compositional batches to encourage better generalization to unseen attribute recombinations. Across experiments on eight baselines and two LLMs, Meta-MCTG improves compositional testing performance in the majority of cases (up to 3.64% gain) while preserving fluency and often enhancing in-distribution performance, underscoring the benchmark’s value and the potential of meta-learning to enhance compositional robustness in MCTG.
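To make the idea of a "pseudo-compositional batch" concrete, the sketch below picks, for a given training batch, pool examples whose attribute combination is absent from that batch while each of their single attributes is present in it. This selection rule, the `attrs` field, and the function names are illustrative assumptions rather than the authors' implementation, and the MAML-style parameter update that would consume the two batches is omitted.

```python
import random


def pseudo_comp_candidates(train_batch, pool):
    """Return pool examples that look compositional relative to train_batch.

    Assumed rule (hypothetical): the full attribute combination is unseen in
    train_batch, but every single attribute value appears somewhere in it.
    """
    seen_combos = {ex["attrs"] for ex in train_batch}
    n_aspects = len(train_batch[0]["attrs"])
    # Attribute values observed in the batch, tracked per aspect position.
    seen_values = [{ex["attrs"][i] for ex in train_batch} for i in range(n_aspects)]
    return [
        ex
        for ex in pool
        if ex["attrs"] not in seen_combos
        and all(ex["attrs"][i] in seen_values[i] for i in range(n_aspects))
    ]


def sample_pseudo_comp_batch(train_batch, pool, size, seed=0):
    """Sample up to `size` pseudo-compositional examples from the pool."""
    candidates = pseudo_comp_candidates(train_batch, pool)
    rng = random.Random(seed)
    return rng.sample(candidates, min(size, len(candidates)))


# Toy data: two aspects (sentiment, tense); the texts are placeholders.
train_batch = [
    {"text": "a", "attrs": ("positive", "past")},
    {"text": "b", "attrs": ("negative", "present")},
]
pool = train_batch + [
    {"text": "c", "attrs": ("positive", "present")},
    {"text": "d", "attrs": ("negative", "past")},
    {"text": "e", "attrs": ("neutral", "past")},  # "neutral" is unseen, so excluded
]
pseudo_batch = sample_pseudo_comp_batch(train_batch, pool, size=5)
```

In a meta-learning loop of this kind, the model would typically take a provisional step on `train_batch` and then be scored on `pseudo_batch`, so that the update is rewarded for transferring to recombined attributes.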

Abstract

Compositional generalization, the ability of a model to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods. Nonetheless, a comprehensive benchmark for evaluating the compositional generalization of MCTG is still lacking. We propose CompMCTG, a benchmark encompassing diverse multi-aspect labeled datasets and a crafted three-dimensional evaluation protocol, to holistically evaluate the compositional generalization of MCTG approaches. We observe that existing MCTG methods generally suffer a noticeable performance drop in compositional testing. To mitigate this issue, we introduce Meta-MCTG, a training framework incorporating meta-learning, in which models learn how to generalize by simulating compositional generalization scenarios during training. We demonstrate the effectiveness of Meta-MCTG by achieving a clear improvement (by at most 3.64%) in compositional testing performance in 94.4% of cases.
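As a rough illustration of the definition above, the following sketch splits all attribute combinations into an in-distribution set and a held-out compositional set, while guaranteeing that every single attribute value still occurs in the in-distribution set. The aspect names, the greedy selection, and the function `hold_out_split` are assumptions made for illustration; this is not the benchmark's actual splitting procedure.

```python
from itertools import product
import random


def hold_out_split(aspects, n_held_out, seed=0):
    """Split attribute combinations into (in_dist, held_out).

    A combination is only held out if each of its single attribute values
    still appears in some remaining in-distribution combination.
    """
    rng = random.Random(seed)
    combos = list(product(*aspects.values()))
    rng.shuffle(combos)
    in_dist = list(combos)
    held_out = []
    for combo in combos:
        if len(held_out) == n_held_out:
            break
        remaining = [c for c in in_dist if c != combo]
        # Removing `combo` can only affect coverage of its own values.
        covered = all(
            any(c[i] == value for c in remaining)
            for i, value in enumerate(combo)
        )
        if covered:
            held_out.append(combo)
            in_dist = remaining
    return in_dist, held_out


# Hypothetical aspects with two values each (8 combinations in total).
aspects = {
    "sentiment": ["positive", "negative"],
    "number": ["singular", "plural"],
    "tense": ["past", "present"],
}
in_dist, held_out = hold_out_split(aspects, n_held_out=2)
```

A model trained only on texts labeled with `in_dist` combinations and tested on `held_out` combinations is then being asked to recombine attributes it has seen individually but never together.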

Paper Structure (60 sections, 10 equations, 11 figures, 54 tables, 2 algorithms)

Introduction
Related Work
Multi-aspect Controllable Text Generation
Compositional Generalization
Benchmark: CompMCTG
On the Construction of CompMCTG
Data Source
Three-Dimensional Evaluation Protocol
Baseline and Evaluation Metric
Evaluation Result
Insight
Compositional gaps with different evaluation protocols.
Does the ACD better unveil the compositional generalization risk in comparison with Random Sampling?
Methodology: Meta-MCTG
Design
...and 45 more sections

Figures (11)

Figure 1: Three evaluation protocols in CompMCTG benchmark, where each set of three colored balls represents texts with these three attribute labels (e.g., positive, plural, and present). "I.D." denotes the In-Distribution set and "Comp." denotes the Compositional set.
Figure 2: Compositional generalization gap with different evaluation protocols.
Figure 3: Comparison of compositional gaps between ACD (green bars) and two other splitting methods: Random Sampling (red bars) and minimizing the divergence (blue bars) on five baselines.
Figure 4: Meta-MCTG: $\theta$ refers to the learnable parameters for encoding control conditions, which could be inner (CTRL) or added (DCG and ContraPrefix). $\phi$, the parameters of LMs, are usually frozen during training (PEFT).
Figure 5: Difference in the distances ($d = 1 - \cos\langle h_1, h_2 \rangle$) between attribute combinations in the representation space ($h_1, h_2$) with Meta-CTRL and the original version of CTRL.
...and 6 more figures
