A General Theory for Compositional Generalization

Jingwen Fu; Zhizheng Zhang; Yan Lu; Nanning Zheng

TL;DR

This work tackles compositional generalization (CG) from a task-agnostic perspective, formalizing CG via well-defined concepts and a compositional rule. It proves a No Free Lunch (NFL) theorem for CG, showing that no universal solver exists across all CG tasks, and derives a generalization bound that ties CG performance to the mutual information with the composition rule. It introduces the generative effect, classifies CG problems into independent-rule and generative-interactive types, and provides a sufficient condition for IRM-solvable cases while outlining challenges for generative interactions. The theory offers a unifying baseline that, when combined with task-specific analyses, guides method design and advances understanding of CG, while acknowledging limitations in delivering turnkey solutions for particular CG tasks.

Abstract

Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, deep neural networks (DNNs) struggle with the compositional generalization problem, which has prompted considerable research interest. However, existing theories often rely on task-specific assumptions, which limits a comprehensive understanding of CG. This study aims to explore compositional generalization from a task-agnostic perspective, offering a complementary viewpoint to task-specific analyses. The primary challenge is to define CG without overly restricting its scope, a feat achieved by identifying its fundamental characteristics and basing the definition on them. Using this definition, we seek to answer the question "what does the ultimate solution to CG look like?" through the following theoretical findings: 1) the first No Free Lunch theorem in CG, indicating the absence of general solutions; 2) a novel generalization bound applicable to any CG problem, specifying the conditions for an effective CG solution; and 3) the introduction of the generative effect to enhance understanding of CG problems and their solutions. This paper's significance lies in providing a general theory for CG problems, which, when combined with prior theorems under task-specific scenarios, can lead to a comprehensive understanding of CG.
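To make the setting concrete, here is a minimal sketch of a toy CG split together with the classic No Free Lunch averaging argument. It is an illustration only, not the paper's formal construction: the concept names, the helper functions, and the choice to average uniformly over all binary labelings of the unseen combinations are assumptions introduced here, whereas the paper's theorem is stated in terms of learning algorithms and function spaces.

```python
from itertools import product

# Hypothetical toy concepts; none of these names come from the paper.
SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "green", "blue"]


def split_combinations(held_out):
    """Split all (shape, color) pairs into seen (S) and unseen (U) sets."""
    all_pairs = list(product(SHAPES, COLORS))
    seen = [p for p in all_pairs if p not in held_out]
    unseen = [p for p in all_pairs if p in held_out]
    return seen, unseen


def covers_all_concepts(seen):
    """Check that every individual concept appears in at least one seen pair."""
    return ({s for s, _ in seen} == set(SHAPES)
            and {c for _, c in seen} == set(COLORS))


def average_accuracy_over_labelings(predictor, unseen):
    """Average accuracy on U over every possible binary labeling of U.

    Assumption: each labeling is treated as an equally likely "rule".
    """
    labelings = list(product([0, 1], repeat=len(unseen)))
    correct = 0
    for labels in labelings:
        correct += sum(predictor(x) == y for x, y in zip(unseen, labels))
    return correct / (len(labelings) * len(unseen))


# Hold out one novel combination per concept so S still covers every concept.
HELD_OUT = {("circle", "red"), ("square", "green"), ("triangle", "blue")}
seen, unseen = split_combinations(HELD_OUT)


def always_one(pair):
    """A predictor that ignores its input."""
    return 1


def red_or_square(pair):
    """A predictor with an arbitrary hand-written bias."""
    return int(pair[0] == "square" or pair[1] == "red")


acc_always_one = average_accuracy_over_labelings(always_one, unseen)
acc_red_or_square = average_accuracy_over_labelings(red_or_square, unseen)
```

Under this uniform averaging, both predictors score the same on the unseen combinations, because each unseen item is labeled 1 in exactly half of the labelings. This is the intuition behind a "no universal solver" result: a method can only do better than chance on novel combinations if it carries information about the actual compositional rule.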

Paper Structure (28 sections, 11 theorems, 32 equations, 1 figure, 1 table)

Introduction
Motivation and Related Works
Problem Definition
Preliminary
Compositional Generalization
No free lunch theorem
Generalization Bounds
Tightness
Proof Sketch
Generative Effect
Independent rule mechanism
Generative effect
Conclusion
Limitation
Other related work
...and 13 more sections

Key Result

Theorem 4.5

Under Assumption ass:simplify_func_learning_algorithm, for every valid division of $S$ and $U$ and any pairs $(\mathcal{A}_1,\mathcal{F}_1)$, $(\mathcal{A}_2,\mathcal{F}_2)$ satisfying $|\mathcal{F}_1|=|\mathcal{F}_2|$, we have [equation missing], where $\Tilde{\boldsymbol{T}}=\Tilde{T}_{\boldsymbol{f}_S}$ and $\mathbb{P}_{S}^{(T)}$ is the support distribution generated using the compositional rule $T$.

Figures (1)

Figure 1: Generalization bounds on the toy problem. Example 1 considers the case where the function space has some bias while the learning algorithm has none. Example 2 considers the case where the learning algorithm has a certain bias while the function space is powerful enough to fit the data. We find that 1) our bounds capture the decrease in generalization error in Example 1, and 2) our bounds align with the generalization error in Example 2.

Theorems & Definitions (52)

Definition 3.1
Remark 3.2
Remark 3.3
Example 3.4
Example 3.5
Remark 3.6
Definition 3.7
Remark 3.8
Definition 4.1
Remark 4.3
...and 42 more
