A General Theory for Compositional Generalization
Jingwen Fu, Zhizheng Zhang, Yan Lu, Nanning Zheng
TL;DR
This work tackles compositional generalization (CG) from a task-agnostic perspective, formalizing CG via well-defined concepts and a compositional rule. It proves a No Free Lunch (NFL) theorem for CG, showing that no universal solver exists across all CG tasks, and derives a generalization bound that ties CG performance to the mutual information with the composition rule. It introduces the generative effect, classifies CG problems into independent-rule and generative-interactive types, gives a sufficient condition for cases solvable by invariant risk minimization (IRM), and outlines the challenges posed by generative interactions. The theory offers a unifying baseline that, combined with task-specific analyses, guides method design and advances the understanding of CG, although it does not deliver turnkey solutions for particular CG tasks.
Abstract
Compositional Generalization (CG) is the ability to comprehend novel combinations of familiar concepts, and it represents a significant cognitive leap in human intellectual development. Despite its importance, deep neural networks (DNNs) struggle with compositional generalization, which has prompted considerable research interest. However, existing theories often rely on task-specific assumptions, which limits a comprehensive understanding of CG. This study explores compositional generalization from a task-agnostic perspective, offering a viewpoint complementary to task-specific analyses. The primary challenge is to define CG without overly restricting its scope; we address it by identifying the fundamental characteristics of CG and grounding the definition in them. Using this definition, we seek to answer the question "what does the ultimate solution to CG look like?" through the following theoretical findings: 1) the first No Free Lunch theorem in CG, indicating that no general solution exists; 2) a novel generalization bound applicable to any CG problem, specifying the conditions for an effective CG solution; and 3) the introduction of the generative effect to deepen the understanding of CG problems and their solutions. The significance of this paper lies in providing a general theory for CG problems which, combined with prior theorems for task-specific scenarios, can lead to a comprehensive understanding of CG.
