Theoretical Analysis of Weak-to-Strong Generalization

Hunter Lang, David Sontag, Aravindan Vijayaraghavan

TL;DR

The paper studies when a strong model trained on weak, incomplete, or incorrect pseudolabels exhibits weak-to-strong generalization: correcting the teacher’s errors (pseudolabel correction) and generalizing to examples the teacher does not label (coverage expansion). It introduces expansion-based error bounds that explicitly account for both effects and extends them to an average-case setting via robust expansion. It also gives formal definitions of expanding set families and a statistical procedure for checking expansion from finite data, which enables empirical validation. The authors connect their results to the co-training, self-training, and domain adaptation literature, and report experiments indicating that expansion properties can be observed in practice and align with the behavior of the bounds. Overall, the work provides a theoretical and empirical framework for understanding when weak supervision yields strong generalization and how to verify the required conditions on real data.

Abstract

Strong student models can learn from weaker teachers: when trained on the predictions of a weaker model, a strong pretrained student can learn to correct the weak model's errors and generalize to examples where the teacher is not confident, even when these examples are excluded from training. This enables learning from cheap, incomplete, and possibly incorrect label information, such as coarse logical rules or the generations of a language model. We show that existing weak supervision theory fails to account for both of these effects, which we call pseudolabel correction and coverage expansion, respectively. We give a new bound based on expansion properties of the data distribution and student hypothesis class that directly accounts for pseudolabel correction and coverage expansion. Our bounds capture the intuition that weak-to-strong generalization occurs when the strong model is unable to fit the mistakes of the weak teacher without incurring additional error. We show that these expansion properties can be checked from finite data and give empirical evidence that they hold in practice.
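
The two effects named in the abstract can be measured separately once gold labels are available for evaluation. The sketch below is an illustration only, not the authors' evaluation code: the function name `weak_to_strong_metrics`, the use of `None` to mark points the teacher abstains on, and the toy data are all assumptions made for this example. Pseudolabel correction shows up as student accuracy exceeding teacher accuracy on covered points; coverage expansion shows up as good student accuracy on uncovered points.

```python
from typing import Dict, List, Optional, Sequence


def weak_to_strong_metrics(
    y_true: Sequence[int],
    y_weak: Sequence[Optional[int]],
    y_student: Sequence[int],
) -> Dict[str, float]:
    """Compare teacher and student accuracy on covered and uncovered points.

    Hypothetical helper: y_weak[i] is None when the weak teacher abstains
    (the point is "uncovered"); otherwise it is the teacher's pseudolabel.
    """
    covered: List[int] = [i for i, w in enumerate(y_weak) if w is not None]
    uncovered: List[int] = [i for i, w in enumerate(y_weak) if w is None]

    def accuracy(indices: List[int], preds: Sequence[Optional[int]]) -> float:
        # Accuracy restricted to the given indices; NaN if the subset is empty.
        if not indices:
            return float("nan")
        return sum(preds[i] == y_true[i] for i in indices) / len(indices)

    return {
        # Teacher accuracy where it provides a label.
        "teacher_acc_covered": accuracy(covered, y_weak),
        # Student accuracy on the same points (pseudolabel correction if higher).
        "student_acc_covered": accuracy(covered, y_student),
        # Student accuracy where the teacher abstained (coverage expansion).
        "student_acc_uncovered": accuracy(uncovered, y_student),
    }


# Toy data (made up): the teacher mislabels index 1 and abstains on 3 and 4.
y_true = [1, 1, -1, -1, 1, -1]
y_weak = [1, -1, -1, None, None, -1]
y_student = [1, 1, -1, -1, 1, -1]
print(weak_to_strong_metrics(y_true, y_weak, y_student))
```

In this toy case the student fixes the teacher's one mistake on the covered set and is also correct on both uncovered points, so both effects are present.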

Paper Structure

This paper contains 52 sections, 28 theorems, 129 equations, 3 figures, 4 tables.

Key Result

Proposition 3.1

Suppose the label marginals for the above example satisfy $\mathbb{P}({\textnormal{y}}=y) = \frac{1}{2}$ for $y\in\{-1,1\}$, and assume that the weak label error rates satisfy $\alpha_{-1} = \alpha_1 = \alpha$, and that the weak labels cover each class equally often: $\mathbb{P}({\tilde{y}} = \varnothing |

Figures (3)

  • Figure 1: Relative expansion (Definition \ref{def:relative-expansion}) on the sets $(A,B)$. Expansion requires that certain subsets $U\subset B$ have neighborhoods $\mathcal{N}(U)$ such that $\mathbb{P}(\mathcal{N}(U)|A) \ge c\mathbb{P}(U|B)$. These probabilities are represented graphically on the right-hand side as the fractions $|\mathcal{N}(U)\cap A| / |A|$ and $|U| / |B|$.
  • Figure 2: Examples of good (left) and bad (right) robust expansion. In both cases, there is a core subset $V \subset \mathcal{N}(U)$ that accounts for most of the edge weight incident on $U$ (at least a $1-\eta$ fraction, for some small $\eta$). The robust expansion is good when every such subset has large probability.
  • Figure 3: Example of our neighborhood oracle $n$, constructed using a targeted paraphrase procedure. For a covered point ${\bm{x}} \in S$ (in this case, ${\bm{x}} \in S_0^{bad}$, since it is a true negative point mislabeled by our example weak rules ${\tilde{y}}$), we first generate an uncovered point ${\bm{x}}'\in T_i$ using a constrained paraphrase model and rejection sampling to ensure the ground-truth label remains negative (we use a model trained on the gold labels as a stand-in for the ground truth $y$). Next, we use GPT-4 to rewrite ${\bm{x}}'$ so that it contains whichever word from {horrible, incredible} did not appear in the original ${\bm{x}}$. This generates another point ${\bm{x}}'' \in S_i$. Since we enforce that ${\bm{x}}$ and ${\bm{x}}''$ are covered by different words, we know that if ${\bm{x}} \in S_i^{good}$ (resp. $S_i^{bad}$), then ${\bm{x}}'' \in S_i^{bad}$ (resp. $S_i^{good}$).
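
The inequality in the Figure 1 caption, $\mathbb{P}(\mathcal{N}(U)|A) \ge c\,\mathbb{P}(U|B)$, can be illustrated on a small finite example using the same fractions $|\mathcal{N}(U)\cap A| / |A|$ and $|U| / |B|$. The sketch below is a toy illustration under strong assumptions: points are weighted uniformly, the neighborhood map is given as a plain dictionary, and the subset $U$ is chosen by hand. The paper's definition restricts attention to certain subsets $U$ and gives a statistical procedure for checking expansion from samples, neither of which is reproduced here. The function name `expansion_ratio` and the toy sets are hypothetical.

```python
from typing import Dict, Hashable, Set


def expansion_ratio(
    A: Set[Hashable],
    B: Set[Hashable],
    U: Set[Hashable],
    neighbors: Dict[Hashable, Set[Hashable]],
) -> float:
    """Return (|N(U) ∩ A| / |A|) / (|U| / |B|) for a non-empty subset U of B.

    Assumes a uniform distribution over the points of A and of B, so that
    conditional probabilities reduce to the set-size fractions from Figure 1.
    A ratio of at least c means this particular U satisfies the inequality
    P(N(U) | A) >= c * P(U | B).
    """
    if not U or not U <= B:
        raise ValueError("U must be a non-empty subset of B")
    if not A:
        raise ValueError("A must be non-empty")

    # N(U): union of the neighborhoods of all points in U.
    n_of_u: Set[Hashable] = set()
    for x in U:
        n_of_u |= neighbors.get(x, set())

    p_neighborhood_given_a = len(n_of_u & A) / len(A)
    p_u_given_b = len(U) / len(B)
    return p_neighborhood_given_a / p_u_given_b


# Toy sets and neighborhood map (made up for illustration).
A = {"a1", "a2", "a3", "a4"}
B = {"b1", "b2", "b3", "b4"}
neighbors = {"b1": {"a1", "a2"}, "b2": {"a2"}, "b3": {"a3"}, "b4": set()}

print(expansion_ratio(A, B, {"b1"}, neighbors))        # U with a large neighborhood
print(expansion_ratio(A, B, {"b1", "b2"}, neighbors))  # overlapping neighborhoods
print(expansion_ratio(A, B, {"b4"}, neighbors))        # U with no neighbors in A
```

Taking the minimum of this ratio over the relevant subsets $U$ would give the largest constant $c$ for which the inequality holds on the toy graph; which subsets are relevant depends on the definitions in the paper.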

Theorems & Definitions (57)

  • Proposition 3.1
  • Proposition 3.2: informal
  • Definition 1: Neighborhood
  • Definition 2: $\eta$-robust
  • Definition 3: Expansion
  • Definition 4: Expansion of a set collection
  • Theorem 4.1: Pseudolabel correction
  • Theorem 4.2: Error bound for uncovered points
  • Definition 5: Example graph
  • Definition 6: $\eta$-robust neighborhood size
  • ...and 47 more