Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities
Ting-Rui Chiang, Dani Yogatama
TL;DR
The paper investigates why pretraining enables large language models to follow prompts and perform in-context learning (ICL), proposing the Pelican Soup framework as a minimal theoretical model grounded in consistency and an expression-meaning association. It formalizes tasks via a knowledge base (KB) and a finite set of atom concepts, derives a bound on the average ICL loss with an $\mathcal{O}(1/T)$ convergence rate, and links this bound to description length under additional assumptions. The authors support the framework with synthetic experiments on Calcutec and with real-world pronoun-based prompting, demonstrating ICL emergence, generalization under distribution shifts, and instruction-following capabilities, including multi-step reasoning. The work provides a conceptual bridge between linguistic and psychological theories and empirical ICL phenomena, offering guidance for pretraining design and for future research into robust instruction following and generalization in LLMs.
Abstract
In this work, we propose a simple theoretical framework, Pelican Soup, aiming to better understand how pretraining allows LLMs to (1) generalize to unseen instructions and (2) perform in-context learning, even when the verbalizers are irrelevant to the task. To this end, our framework introduces the notions of "knowledge base" and "reference-sense association" and a simple formalism for natural language processing tasks. Our framework demonstrates how studies in linguistics, psychology, and philosophy can inform our understanding of language models, and it is connected to several existing theoretical results. As an illustration of how the framework can be used, we derive a bound on in-context learning loss. Finally, we support our framework with empirical experiments and suggest possible future research directions.
