Weak-to-Strong Generalization Through the Data-Centric Lens

Changho Shin, John Cooper, Frederic Sala

TL;DR

This work proposes overlap density as a data-centric mechanism to explain weak-to-strong generalization, arguing that points containing both easy and hard patterns enable a strong model to learn hard patterns via supervision from a weaker model. It formalizes the mechanism, provides a theoretical expansion-based bound, and develops practical tools for overlap detection and data-source selection under budget constraints. Empirically, it validates the mechanism across large-language-model setups, weak supervision, and synthetic Gaussian-mixture experiments, showing that higher overlap density correlates with stronger generalization and that UCB-based data sourcing can maximize this effect. The results highlight a data-centric pathway to improve data efficiency in weak-to-strong learning and point to future work on richer pattern structures and more robust detection methods.

Abstract

The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.
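
The notion of overlap density described above can be illustrated with a minimal sketch. It assumes each point already carries two hypothetical boolean flags, `has_easy` and `has_hard`, saying whether it contains an easy pattern and a hard pattern. In practice these flags are unknown and must be estimated, which is the job of the paper's overlap detection algorithm; the sketch does not reproduce that algorithm and only shows the quantity being estimated.

```python
def partition_counts(points):
    """Count points per region.

    points: iterable of (has_easy, has_hard) boolean pairs. These flags are
    an assumption made for illustration; real data does not come labeled.
    """
    counts = {"easy_only": 0, "hard_only": 0, "overlap": 0, "neither": 0}
    for has_easy, has_hard in points:
        if has_easy and has_hard:
            counts["overlap"] += 1
        elif has_easy:
            counts["easy_only"] += 1
        elif has_hard:
            counts["hard_only"] += 1
        else:
            counts["neither"] += 1
    return counts


def overlap_density(points):
    """Fraction of points that contain both an easy and a hard pattern."""
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    return partition_counts(points)["overlap"] / len(points)


# Toy dataset: two overlap points, one easy-only point, one hard-only point.
toy = [(True, True), (True, False), (False, True), (True, True)]
density = overlap_density(toy)
```

On the toy dataset, half of the points are overlap points. Under the mechanism proposed in the paper, these are the points on which a weak model's predictions can supervise a strong model's learning of hard patterns.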

Paper Structure

This paper contains 64 sections, 12 theorems, 67 equations, 20 figures, 3 tables, 3 algorithms.

Key Result

Theorem 4.1

Suppose $\mathbb{P}$ satisfies $(c,q)$ expansion on $(S^{\text{bad}}_i\cap D_{\text{hard only}}, S^{\text{good}}_i\cap D_{\text{overlap}})$ for some $c > 0$. Consider an arbitrary $\eta$-robust classifier $f_{\text{w2s}}$ such that $\mathbb{P}(f_{\text{w2s}}({\mathbf{x}})\neq f_{\text{weak}}({\mathb

Figures (20)

  • Figure 1: Left: overlapping easy and hard patterns in our dataset are the key to weak-to-strong generalization. Learning from overlapping points, where easy features and hard features coexist, enables a weak-to-strong model $f_{\text{w2s}}$ that can generalize, while $f_{\text{weak}}$ is limited to reliably predicting points with easy patterns. Right: adding more such overlapping points has little influence on the performance of the weak model, but dramatically improves the performance of the weak-to-strong model. Adding such points---even a small percentage of the dataset---can push against the limits of the strong model.
  • Figure 2: Overlap density versus performance in weak-to-strong generalization with LLMs. Red lines show strong ceiling model accuracies, blue dashed lines represent weak model test accuracies, and W2S lines represent the accuracies of strong models trained on pseudolabeled data with a controlled proportion of overlap density. In general, the strong model's improvement over the weak model tracks the overlap proportion, suggesting that the overlap density is indeed an important mechanism for generalization. We can observe three different regimes of weak-to-strong generalization in our experiments: a low overlap regime, where the overlap density is insufficient for effective weak-to-strong generalization (here, few points contain overlaps, so choosing to rely on a large overlap proportion translates to a small train set), a medium overlap regime, where the overlap density improves generalization but still yields performance close to that of the weak model, and a high-overlap regime, where the strong model's performance approaches that of the true strong model due to sufficient overlap points.
  • Figure 3: Data selection results with Algorithm \ref{alg:data_selection} for Amazon Polarity and DREAM datasets. We report the average of 20 repeated experiments with different seeds. We observe that the data source selection procedure, based on overlap density estimation, can produce enhancements over random sampling across data sources.
  • Figure 4: Accuracy in each data region in synthetic experiments. As expected, the performance gain mainly comes from hard data points as the overlap density increases.
  • Figure 5: Synthetic data selection experiment. Our algorithm demonstrates better data efficiency than random sampling by consistently identifying the data source with the highest overlap density.
  • ...and 15 more figures
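
The data source selection idea behind Figures 3 and 5 can be sketched as a bandit problem. The code below is a generic UCB1 loop, not the paper's Algorithm: each data source is an arm, and the reward for drawing a point is 1 if that point is flagged as an overlap point. The function name `ucb_select_sources`, the `densities` argument, and the simulated detector are all assumptions made for illustration. In a real setting the flag would come from an overlap detector rather than from known densities.

```python
import math
import random


def ucb_select_sources(densities, rounds, seed=0):
    """Generic UCB1 over data sources (illustrative, not the paper's algorithm).

    densities: hypothetical true overlap density of each source. The learner
    never reads these directly; they only drive the simulated detector.
    Returns the number of points drawn from each source.
    """
    rng = random.Random(seed)
    k = len(densities)
    pulls = [0] * k  # points drawn from each source
    hits = [0] * k   # drawn points flagged as overlap points

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # draw once from every source first
        else:
            # Pick the source with the highest upper confidence bound on
            # its estimated overlap density.
            arm = max(
                range(k),
                key=lambda i: hits[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        # Simulated overlap detector (assumption): Bernoulli(density).
        is_overlap = 1 if rng.random() < densities[arm] else 0
        pulls[arm] += 1
        hits[arm] += is_overlap
    return pulls


pulls = ucb_select_sources([0.1, 0.3, 0.7], rounds=2000)
best_source = pulls.index(max(pulls))
```

With well-separated densities, the loop spends most of its budget on the source with the highest overlap density, which matches the qualitative behavior reported in the figure captions. The exploration bonus `sqrt(2 ln t / n_i)` is the standard UCB1 choice; the paper's algorithm and its regret bound may use a different bonus.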

Theorems & Definitions (27)

  • Definition 1: Expansion
  • Definition 2: $\eta$-robust
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Definition 3: Neighborhood
  • Definition 4: Example graph
  • Definition 5: $\eta$-robust neighborhood size
  • Definition 6: Expansion
  • Definition 7: Expansion of a set collection
  • ...and 17 more