Is there "Secret Sauce" in Large Language Model Development?

Matthias Mertens; Natalia Fischl-Lanzoni; Neil Thompson

TL;DR

The paper investigates whether frontier LLM progress is driven by scale or by proprietary techniques, using a dataset of MMLU-Pro scores and training compute for 809 models released between 2022 and 2025. It decomposes observed performance differences into four components (scaling effects from compute, shared algorithmic progress, developer-specific efficiency, i.e. the "secret sauce", and model-specific factors) via a regression on logit-transformed scores. The key findings show that frontier improvements are predominantly explained by scaling (roughly 80-90% of performance differences), but algorithmic progress and developer-specific efficiency also contribute meaningfully, with substantial dispersion in compute efficiency both across and within firms. Algorithmic progress yields large efficiency gains that let much smaller models reach fixed scores (compute reductions of up to about 8,000x when smaller developers are included), suggesting that efficiency improvements can democratize capabilities while reducing costs, though diffusion may create rents for firms with proprietary techniques. Overall, the results imply that sustained AI leadership depends on access to expanding compute, while efficiency gains diffuse differently across models and firms, influencing future frontier progress and price dynamics.

Abstract

Do leading LLM developers possess a proprietary "secret sauce", or is LLM performance driven by scaling up compute? Using training and benchmark data for 809 models released between 2022 and 2025, we estimate scaling-law regressions with release-date and developer fixed effects. We find clear evidence of developer-specific efficiency advantages, but their importance depends on where models lie in the performance distribution. At the frontier, 80-90% of performance differences are explained by higher training compute, implying that scale, not proprietary technology, drives frontier advances. Away from the frontier, however, proprietary techniques and shared algorithmic progress substantially reduce the compute required to reach fixed capability thresholds. Some companies can systematically produce smaller models more efficiently. Strikingly, we also find substantial variation in model efficiency within companies; a firm can train two models with more than a 40x difference in compute efficiency. We also discuss the implications for AI leadership and capability diffusion.
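As a rough illustration of what a scaling-law regression with release-date and developer fixed effects can look like, the sketch below fits ordinary least squares of logit-transformed scores on log10 training compute plus year and developer dummies. It is a minimal sketch under stated assumptions, not the authors' specification: the function names, the use of calendar year as the release-date effect, the base-10 log, and the conversion of an efficiency gap into a compute multiplier are all hypothetical choices, and the data are synthetic placeholders rather than values from the paper.

```python
import numpy as np


def logit(p):
    """Map a benchmark score in (0, 1) onto the real line."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def fit_scaling_regression(score, log10_flops, year, developer):
    """OLS of logit(score) on log10 compute with year and developer dummies.

    Returns (beta, year_effects, developer_effects). The first category of
    each factor (in sorted order) is the omitted baseline with effect 0.
    """
    y = logit(score)
    year = np.asarray(year)
    developer = np.asarray(developer)
    years = sorted(set(year))
    devs = sorted(set(developer))

    # Design matrix: intercept, compute slope, then one dummy per
    # non-baseline year and per non-baseline developer.
    cols = [np.ones(len(y)), np.asarray(log10_flops, dtype=float)]
    cols += [(year == t).astype(float) for t in years[1:]]
    cols += [(developer == d).astype(float) for d in devs[1:]]
    X = np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta = coef[1]
    n_year = len(years) - 1
    year_fx = {years[0]: 0.0, **dict(zip(years[1:], coef[2:2 + n_year]))}
    dev_fx = {devs[0]: 0.0, **dict(zip(devs[1:], coef[2 + n_year:]))}
    return beta, year_fx, dev_fx


def implied_compute_factor(effect_gap, beta):
    """Compute multiplier that would close a gap measured in logit units.

    Assumes compute enters the regression as log10(FLOPs) with slope beta.
    """
    return 10.0 ** (effect_gap / beta)


# Synthetic data for illustration only (NOT from the paper).
rng = np.random.default_rng(0)
n = 200
log10_flops = rng.uniform(21, 26, n)
year = rng.choice([2022, 2023, 2024, 2025], n)
developer = rng.choice(["dev_a", "dev_b", "dev_c"], n)
true_beta = 0.8
true_year = {2022: 0.0, 2023: 0.3, 2024: 0.6, 2025: 0.9}
true_dev = {"dev_a": 0.0, "dev_b": 0.4, "dev_c": -0.2}
z = (
    -20.0
    + true_beta * log10_flops
    + np.array([true_year[int(t)] for t in year])
    + np.array([true_dev[str(d)] for d in developer])
    + rng.normal(0.0, 0.1, n)
)
score = 1.0 / (1.0 + np.exp(-z))

beta, year_fx, dev_fx = fit_scaling_regression(score, log10_flops, year, developer)
factor_b_vs_a = implied_compute_factor(dev_fx["dev_b"] - dev_fx["dev_a"], beta)
```

In this setup, the developer effects play the role of the "secret sauce" term, the year effects stand in for shared algorithmic progress, and the residual captures model-specific factors. Dividing an effect gap by the compute slope expresses it as an equivalent compute multiplier, which is one way to read statements such as "more than a 40x difference in compute efficiency".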

Paper Structure (32 sections, 4 equations, 20 figures, 3 tables)

Related work
Results
Regression Analysis
Results: Variance Decomposition
Results: scaling, shared progress, secret sauce, and model-specific factors
LLM-development
Decomposing improvements at the frontier
How technical progress gives rise to small efficient models
Specialized Capabilities: MATH Level 5
Discussion
Methods
Data
Regression framework
Implied compute factors
Robustness
...and 17 more sections

Figures (20)

Figure 1: Shapley Variance Decomposition, Different Samples
Figure 2: Main Results
Figure 3: Contributions to Top Model Over Time
Figure 4: Sources of Performance Growth: Frontier Models and Smaller, Efficient Models
Figure C.1: MMLU-Pro Score vs Log(FLOPs) Data Visualization with a Logistic Curve Fit
...and 15 more figures
