How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

Cynthia Dwork, Lunjia Hu, Han Shao

TL;DR

This work gives a tight theoretical characterization of domain generalization by introducing the domain shattering dimension $\mathrm{Gdim}(\mathcal{H},\mathcal{G},\tau,\alpha)$. Using a min-max ERM approach and a uniform-convergence analysis for partial concept classes, the authors derive an upper bound on the domain sample complexity that scales with $\mathrm{Gdim}$, and complement it with a near-matching lower bound. They also relate $\mathrm{Gdim}$ to the classic VC dimension: writing $d$ for the VC dimension of $\mathcal{H}$, they show $\mathrm{Gdim}=O(d\log(1/\alpha))$, and that this bound is tight in the worst case, where $\mathrm{Gdim}=\Omega(d\log(1/\alpha))$. Consequently, every class that is learnable in the standard PAC setting is also learnable in the domain generalization setting, while classes with small $\mathrm{Gdim}$ may need far fewer sampled domains. The work also connects to domain adaptation via refined divergences and covering numbers, illustrating how domain similarity can reduce the required number of observed domains. The analysis makes minimal assumptions about how domains relate to one another, and the framework extends beyond binary classification to multi-class and regression settings.
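To make the min-max ERM idea concrete, the following is a minimal sketch, not the authors' algorithm as stated in the paper. It assumes a finite hypothesis class of one-dimensional threshold classifiers and a toy family of domains that differ only by a shift of the input range. The names `empirical_error`, `min_max_erm`, `make_threshold`, and `sample_domain` are hypothetical and introduced only for illustration.

```python
import random


def empirical_error(h, sample):
    """Fraction of labeled points (x, y) in `sample` that `h` misclassifies."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def min_max_erm(hypotheses, domain_samples):
    """Pick the hypothesis whose worst empirical error across the
    observed domains is smallest (min over h, max over domains)."""
    return min(
        hypotheses,
        key=lambda h: max(empirical_error(h, s) for s in domain_samples),
    )


def make_threshold(t):
    """Hypothetical hypothesis: predict 1 iff x >= t."""
    return lambda x: int(x >= t)


def sample_domain(rng, shift, n=200):
    """Toy domain (an assumption): x is uniform on [shift, shift + 1]
    and the label is 1 iff x >= 0.5 in every domain."""
    xs = [shift + rng.random() for _ in range(n)]
    return [(x, int(x >= 0.5)) for x in xs]


rng = random.Random(0)
# Three observed domains, standing in for domains sampled from the family.
domain_samples = [sample_domain(rng, shift) for shift in (-0.3, 0.0, 0.2)]
hypotheses = [make_threshold(i / 10) for i in range(11)]

best = min_max_erm(hypotheses, domain_samples)
worst_error = max(empirical_error(best, s) for s in domain_samples)
print("worst-case empirical error of the selected hypothesis:", worst_error)
```

The objective is the maximum, not the average, of the per-domain errors, which matches the goal of performing reasonably well on every domain rather than on a mixture of them. How many observed domains are needed for this guarantee to carry over to unseen domains is the question the domain shattering dimension answers.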

Abstract

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.

Paper Structure

This paper contains 14 sections, 11 theorems, and 78 equations.

Key Result

Theorem 4.1

Let $\mathcal{H}$ be a class of hypotheses $h:X\to \{0,1\}$ and let $\mathcal{G}$ be a family of domains $\mathcal{D}$ each being a distribution over $X\times \{0,1\}$. Define $d:=\mathrm{Gdim}(\mathcal{H}, \mathcal{G}, \tau, \alpha)$ for $\tau,\alpha\in [0,1]$. For every $\varepsilon,\delta\in (0,1

Theorems & Definitions (28)

  • Definition 2.1: VC dimension of partial concept classes (Alon et al., 2022)
  • Definition 3.1: Optimal domain error bound and optimal hypothesis
  • Definition 3.2: $(\tau, \alpha, \gamma,\delta)$-domain learnability
  • Definition 4.1: Domain shattering dimension
  • Theorem 4.1
  • Lemma 4.1: Uniform convergence for partial concepts
  • Theorem 4.2: Quasipolynomial Sauer-Shelah-Perles Lemma for Disambiguations of Partial Concepts (Alon et al., 2022)
  • Proof of Lemma 4.1
  • Remark 4.1: Choice of the threshold $\tau$
  • Remark 4.2: Beyond binary classification
  • ...and 18 more