Algorithmic progress in language models
Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla
TL;DR
To separate how much language-model pre-training progress comes from algorithmic innovations versus scaling up compute and data, the paper assembles a dataset of over 200 evaluations on Wikitext and Penn Treebank and fits an augmented scaling-law model expressed in terms of effective compute. It estimates that, as a result of algorithmic progress, the compute needed to reach a fixed level of performance halves about every 8 months. Even so, most of the overall performance improvement over the period comes from scaling up compute, with algorithmic progress contributing a smaller share. The transformer architecture yields a substantial compute-equivalent gain, estimated at around 7.2×. The work highlights the value and limits of current scaling laws for forecasting future progress and can inform how researchers allocate effort between compute and algorithmic research.
Abstract
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
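To make the halving-time figures concrete, the following minimal Python sketch converts a halving time into an equivalent yearly reduction in required compute. The function name and the 24-month hardware doubling time used for the Moore's Law comparison are illustrative assumptions, not values or code taken from the paper; only the 8-month central estimate and the 5 to 14 month interval come from the abstract.

```python
def compute_reduction_factor(elapsed_months: float, halving_months: float) -> float:
    """Factor by which the compute needed for a fixed performance level shrinks
    after `elapsed_months`, if it halves every `halving_months` (hypothetical helper)."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    return 2.0 ** (elapsed_months / halving_months)


# Halving times quoted in the abstract (months): central estimate and 95% CI bounds.
ALGORITHMIC_HALVING_MONTHS = {"central": 8.0, "ci_fast": 5.0, "ci_slow": 14.0}

# ASSUMPTION: a conventional 24-month doubling time for Moore's Law;
# the abstract does not state a specific figure.
MOORES_LAW_DOUBLING_MONTHS = 24.0

if __name__ == "__main__":
    for label, months in ALGORITHMIC_HALVING_MONTHS.items():
        yearly = compute_reduction_factor(12.0, months)
        print(f"{label}: halving every {months:g} months = {yearly:.2f}x less compute per year")

    hardware_yearly = compute_reduction_factor(12.0, MOORES_LAW_DOUBLING_MONTHS)
    print(f"assumed Moore's Law rate: {hardware_yearly:.2f}x per year")
```

Under these assumptions, an 8-month halving time corresponds to needing roughly 2.8 times less compute each year for the same performance, compared with roughly 1.4 times per year for a 24-month hardware doubling time.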
