From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning

Junsoo Oh, Jerry Song, Chulhee Yun

TL;DR

The paper studies provable weak-to-strong generalization when a two-layer ReLU CNN (the strong model) is trained on labels produced by a pretrained linear CNN (the weak model), using patch-based data composed of signals and noise. It identifies two regimes, data-scarce and data-abundant, with distinct mechanisms: in the data-scarce regime the strong model either generalizes through benign overfitting or fails through harmful overfitting, while in the data-abundant regime generalization emerges early through label correction and can degrade with overtraining. The authors state theorems on convergence, generalization bounds, and phase transitions, and use a signal-noise decomposition to explain the training dynamics. Experiments on synthetic data matching the theoretical setting and on a modified MNIST dataset corroborate the theory and highlight the practical role of early stopping and data selection in achieving robust weak-to-strong gains.

Abstract

Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this effect, most theoretical insights are limited to abstract frameworks or linear/random feature models. In this paper, we provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). We consider structured data composed of label-dependent signals of varying difficulty and label-independent noise, and analyze gradient descent dynamics when the strong model is trained on data labeled by the pretrained weak model. Our analysis identifies two regimes -- data-scarce and data-abundant -- based on the signal-to-noise characteristics of the dataset, and reveals distinct mechanisms of weak-to-strong generalization. In the data-scarce regime, generalization occurs via benign overfitting or fails via harmful overfitting, depending on the amount of data, and we characterize the transition boundary. In the data-abundant regime, generalization emerges in the early phase through label correction, but we observe that overtraining can subsequently degrade performance.

Paper Structure

This paper contains 45 sections, 23 theorems, 235 equations, 2 figures, 1 table.

Key Result

Proposition 2.1

Let $({\bm{X}}, y) \sim {\mathcal{D}}$ be a test example. Then any weak model $f_\mathrm{wk}({\bm{w}}, \cdot)$ satisfies $\mathbb{P}[ y f_\mathrm{wk}({\bm{w}}, {\bm{X}}) <0 \mid ({\bm{X}},y) \in {\mathcal{S}}_\mathrm{h} ] = \frac{1}{2}.$
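
A chance-level error of this kind can be illustrated numerically. The sketch below is not the paper's construction: this excerpt does not define ${\mathcal{D}}$ or ${\mathcal{S}}_\mathrm{h}$, so the code assumes a hypothetical "hard" distribution in which the signal patch is $s \cdot \nu_y$ with a uniformly random sign $s$, and it assumes the linear CNN sums $\langle {\bm{w}}, \cdot \rangle$ over patches. The names `linear_cnn` and `sample_hard`, the vectors `nu`, and all sizes are illustrative choices. Under these assumptions the output of any linear model is symmetric around zero given the label, so its error on such examples should be close to $\frac{1}{2}$ regardless of ${\bm{w}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_patches, n = 20, 3, 50_000  # hypothetical sizes, not from the paper


def linear_cnn(w, X):
    """Assumed linear CNN: sum over patches p of <w, X[:, p, :]>."""
    return (X @ w).sum(axis=1)


def sample_hard(n):
    """Sample n examples from an ASSUMED 'hard' distribution.

    The signal patch is s * nu_y, where s is a uniformly random sign that is
    independent of the label y. The remaining patches are Gaussian noise.
    """
    y = rng.choice([-1, 1], size=n)
    s = rng.choice([-1, 1], size=n)
    nu = np.zeros((2, d))
    nu[0, 0] = 1.0  # hypothetical signal direction for y = -1
    nu[1, 1] = 1.0  # hypothetical signal direction for y = +1
    signal = s[:, None] * nu[(y + 1) // 2]
    noise = 0.5 * rng.standard_normal((n, num_patches - 1, d))
    X = np.concatenate([signal[:, None, :], noise], axis=1)
    return X, y


X, y = sample_hard(n)

# Evaluate several arbitrary weight vectors: none should beat chance.
errors = []
for _ in range(5):
    w = rng.standard_normal(d)
    errors.append(float(np.mean(y * linear_cnn(w, X) < 0)))

print("empirical error rates on assumed hard examples:", errors)
```

Because the random sign flips the signal term of a linear model, no choice of `w` can exploit it; a nonlinear model with ReLU neurons is not bound by this symmetry, which is consistent with the weak/strong contrast described in the abstract.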

Figures (2)

  • Figure 1: Weak-to-strong training with varying training dataset sizes ($n_\mathrm{st}$). The results align with our theoretical findings: (a) harmful overfitting for $n_\mathrm{st}=75$; (b) benign overfitting for $n_\mathrm{st}=2000$; and (c) early emergence of generalization followed by degradation with overtraining for $n_\mathrm{st}=20000$.
  • Figure 2: Examples of the modified MNIST.

Theorems & Definitions (42)

  • Definition 1
  • Definition 2: Weak Model
  • Proposition 2.1
  • proof
  • Definition 3: Strong Model
  • Proposition 2.2
  • proof
  • Theorem 3.3: Weak Model Training
  • Theorem 3.4: Weak-to-Strong Training, Data-Scarce Regime
  • Theorem 3.6: Weak-to-Strong Training, Data-Abundant Regime
  • ...and 32 more