Generation through the lens of learning theory

Jiaxun Li, Vinod Raman, Ambuj Tewari

TL;DR

This work reframes generation as a learning-theoretic problem over countable domains, introducing the Closure dimension $C(\mathcal{H})$ to characterize uniform generatability and a union-based criterion to characterize non-uniform generatability. It proves that uniform generatability holds exactly when $C(\mathcal{H})<\infty$, while non-uniform generatability corresponds to writing the class as a nondecreasing union of sub-classes, each with finite Closure dimension. It also explores generatability in the limit, along with EUC-based sufficiency results. The paper further extends the framework to prompted generation via the Prompted Closure dimension $\operatorname{PC}(\mathcal{H})$, yielding analogous characterizations and revealing richer behavior when prompts are allowed. Across these results, generation and prediction are shown to be incompatible in general, and the analysis highlights that generation is not closed under finite unions, motivating new directions for theory and applications beyond language modeling.

Abstract

We study generation through the lens of statistical learning theory. First, we abstract and formalize the results of Gold [1967], Angluin [1979], Angluin [1980] and Kleinberg and Mullainathan [2024] in terms of a binary hypothesis class defined over an abstract example space. Then, we extend the notion of "generation" from Kleinberg and Mullainathan [2024] to two new settings, which we call "uniform" and "non-uniform" generation, and provide a characterization of which hypothesis classes are uniformly and non-uniformly generatable. As is standard in learning theory, our characterizations are in terms of the finiteness of a new combinatorial dimension termed the Closure dimension. By doing so, we are able to compare generatability with predictability (captured via PAC and online learnability) and show that these two properties of hypothesis classes are incompatible -- there are classes that are generatable but not predictable and vice versa. Finally, we extend our results to capture prompted generation and give a complete characterization of which classes are prompt generatable, generalizing some of the work by Kleinberg and Mullainathan [2024].
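
To make the notion of a "closure" behind the Closure dimension more concrete, the following toy sketch may help. It is not the authors' algorithm, and the definition used here is an assumption, since this summary does not reproduce it: the closure of a set of observed positive examples is taken to be the intersection of the supports of all hypotheses consistent with them. The paper works over countable (possibly infinite) domains; this example uses a small finite domain purely for illustration, and the names `closure`, `generate`, and the example class `H` are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence

# Assumption: a binary hypothesis h is represented by its support,
# i.e. the set of examples x with h(x) = 1.
Hypothesis = FrozenSet[int]


def closure(hypotheses: Sequence[Hypothesis],
            seen: Sequence[int]) -> Optional[FrozenSet[int]]:
    """Intersect the supports of all hypotheses consistent with `seen`.

    Returns None when no hypothesis contains every observed example.
    """
    seen_set = set(seen)
    consistent = [h for h in hypotheses if seen_set <= h]
    if not consistent:
        return None
    return frozenset.intersection(*consistent)


def generate(hypotheses: Sequence[Hypothesis],
             seen: Sequence[int]) -> Optional[int]:
    """Return an unseen element of the closure, or None if there is none.

    Any element of the closure lies in the support of every consistent
    hypothesis, so it is a valid output whichever one is the true hypothesis.
    """
    c = closure(hypotheses, seen)
    if c is None:
        return None
    fresh = sorted(c - set(seen))
    return fresh[0] if fresh else None


# Hypothetical toy class over the domain {0, ..., 19}.
evens = frozenset(range(0, 20, 2))
mult4 = frozenset(range(0, 20, 4))
mult3 = frozenset(range(0, 20, 3))
H = [evens, mult4, mult3]

if __name__ == "__main__":
    print(sorted(closure(H, [6])))   # consistent: evens and mult3
    print(generate(H, [4, 8]))       # an unseen multiple of 4
    print(generate(H, [5]))          # no consistent hypothesis
```

In this reading, the generator can only fail once the closure contains no unseen elements. On an infinite domain that can only happen when the closure is finite, which is the kind of situation a finiteness condition such as $C(\mathcal{H})<\infty$ would be expected to control.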

Paper Structure

This paper contains 26 sections, 28 theorems, 70 equations, 1 figure, 1 algorithm.

Key Result

Proposition 2.1

Let $\mathcal{X}$ be countable. There exist classes $\mathcal{H}_1, \mathcal{H}_2 \subseteq \{0, 1\}^{\mathcal{X}}$ satisfying the UUS property such that:

Figures (1)

  • Figure 1: Landscape of Generatability vs. Predictability for countable classes. Items (i)-(vi) map to the items in Theorem \ref{thm:genvpred}.

Theorems & Definitions (78)

  • Remark 2.1
  • Remark 2.2
  • Definition 2.1: Generator
  • Definition 2.2: Generatability in the Limit
  • Remark 2.3
  • Definition 2.3: Uniform Generatability
  • Definition 2.4: Uniform Generation Sample Complexity
  • Definition 2.5: Non-uniform Generatability
  • Proposition 2.1
  • Definition 2.6: Identifier
  • ...and 68 more