The Magic Correlations: Understanding Knowledge Transfer from Pretraining to Supervised Fine-Tuning

Simin Fan, Dimitris Paparas, Natasha Noy, Binbin Xiong, Noveen Sachdeva, Berivan Isik

TL;DR

This work probes how capabilities learned during pretraining transfer to supervised fine-tuning in large language models by introducing correlation-based protocols across 9 pretraining data mixtures, 20 benchmarks, and two model scales (240M and 1B). By examining cross-stage accuracy and confidence, intra-category coherence, and accuracy-calibration alignment, the study reveals highly category-dependent transfer, with confidence patterns often persisting beyond SFT for reasoning tasks while accuracy transfer can diverge, especially as models scale. Scaling induces inverse dynamics between accuracy and confidence transfer and shifts intra-category relationships from competition to synergy in many categories, while calibration fingerprints endure from pretraining to SFT in several domains but reorganize in others (notably NLI). The findings yield practical guidance on selecting high-transfer benchmarks, treating confidence as a complementary signal, and validating data mixtures across scales to avoid scale-dependent miscalibration. Overall, pretraining decisions leave lasting, sometimes counterintuitive, imprints on downstream behavior and calibration, underscoring the need for scale-aware data curation and evaluation strategies.

Abstract

Understanding how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) is fundamental to efficient model development and data curation. In this work, we investigate four core questions: RQ1. To what extent do accuracy and confidence rankings established during pretraining persist after SFT? RQ2. Which benchmarks serve as robust cross-stage predictors and which are unreliable? RQ3. How do transfer dynamics shift with model scale? RQ4. How well does model confidence align with accuracy, as a measure of calibration quality? Does this alignment pattern transfer across training stages? We address these questions through a suite of correlation protocols applied to accuracy and confidence metrics across diverse data mixtures and model scales. Our experiments reveal that transfer reliability varies dramatically across capability categories, benchmarks, and scales -- with accuracy and confidence exhibiting distinct, sometimes opposing, scaling dynamics. These findings shed light on the complex interplay between pretraining decisions and downstream outcomes, providing actionable guidance for benchmark selection, data curation, and efficient model development.

Paper Structure (86 sections, 8 equations, 15 figures, 13 tables)

Figures (15)

  • Figure 1: Cross-stage correlation by capability category. (a) Accuracy correlation: the 1B model generally shows higher transferability; (b) Confidence correlation: 240M maintains substantially higher correlation, especially in the Commonsense (0.87 vs. 0.40) and Science (0.82 vs. 0.49) domains. This transfer pattern indicates that larger models undergo more confidence reorganization during SFT despite better accuracy preservation.
  • Figure 2: Cross-stage correlation across benchmarks. Each bar shows the Pearson correlation between PT and SFT performance on a given benchmark across data mixtures. (a) Accuracy correlation: the 1B model achieves higher transferability than 240M (on average, $\bar{r}=0.59$ vs. $0.49$). (b) Confidence correlation: the pattern reverses, with 240M achieving substantially stronger transfer than the 1B model ($\bar{r}=0.66$ for 240M vs. $0.41$ for 1B). Background colors indicate capability categories (Commonsense, Science, NLI, Semantic).
  • Figure 3: Cross-stage confidence correlation (PT$\to$SFT). Each cell shows the Pearson correlation between benchmark $i$'s PT confidence and benchmark $j$'s SFT confidence across data mixtures; the diagonal represents the benchmark-wise transfer pattern. Left: At 240M, the Commonsense--Science block shows high positive correlations; Right: At 1B, greater heterogeneity emerges with negative correlations.
  • Figure 4: Within-stage confidence correlation comparison. (a) PT-PT and (b) SFT-SFT cross-benchmark confidence correlations at 240M and 1B scale. The Commonsense--Science block structure is nearly identical across stages, demonstrating that confidence correlation patterns established during pretraining persist through SFT.
  • Figure 5: Intra-category coherence across three correlation protocols. Top: the coherence scores on accuracy, where Science shows PT$\to$SFT degradation; NLI preserves cross-stage coherence despite a drop in SFT coherence. Bottom: the coherence scores on confidence, where the 240M model maintains high coherence while 1B shows substantial degradation.
  • ...and 10 more figures
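
The correlations described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It assumes the scores are arranged as arrays of shape (mixtures, benchmarks), one for the PT stage and one for the SFT stage. The function name `cross_stage_matrix`, the synthetic scores, and the choice of 4 benchmarks are hypothetical and for illustration only; this is not the authors' code or data.

```python
import numpy as np


def cross_stage_matrix(pt, sft):
    """Pearson correlation between PT benchmark i and SFT benchmark j.

    pt, sft: arrays of shape (n_mixtures, n_benchmarks).
    Returns C with C[i, j] = corr(pt[:, i], sft[:, j]) across mixtures.
    A benchmark whose score is constant across mixtures would yield a
    division by zero; this sketch assumes that does not occur.
    """
    pt = np.asarray(pt, dtype=float)
    sft = np.asarray(sft, dtype=float)
    # Center each benchmark column over the mixture axis.
    pt_c = pt - pt.mean(axis=0)
    sft_c = sft - sft.mean(axis=0)
    # Cross-products divided by the product of column norms.
    cov = pt_c.T @ sft_c
    norms = np.outer(np.linalg.norm(pt_c, axis=0),
                     np.linalg.norm(sft_c, axis=0))
    return cov / norms


# Synthetic stand-in data (assumption): 9 mixtures as in the paper,
# 4 hypothetical benchmarks, SFT scores = PT scores plus noise.
rng = np.random.default_rng(0)
pt_scores = rng.random((9, 4))
sft_scores = pt_scores + 0.1 * rng.standard_normal((9, 4))

corr = cross_stage_matrix(pt_scores, sft_scores)
# The diagonal is the per-benchmark PT-to-SFT transfer (bars in Figure 2);
# the full matrix corresponds to the heatmap layout of Figure 3.
per_benchmark = np.diag(corr)
print(np.round(per_benchmark, 2))
```

Passing the same array for both arguments gives a within-stage (PT-PT or SFT-SFT) matrix of the kind compared in Figure 4.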