Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation

Ali Merali

TL;DR

The study asks how scaling laws for large language models translate into real-world economic productivity. It runs a preregistered randomized controlled trial in which 300 professional translators completed 1,800 tasks using one of 13 LLMs of varying training compute (or a control), and it measures task time, quality, and earnings under bonus incentives. A tenfold increase in training compute reduces task time by 12.3%, raises quality by 0.18 standard deviations, and raises earnings per minute by 16.1%, with larger gains for lower-skilled workers. Embedding these scaling results in the Acemoglu (2024) framework and applying Hulten's theorem, the author projects about 6.95% aggregate U.S. productivity growth over the next decade, while noting that the evidence comes from a single domain and that broader generalization and longer-horizon analysis are still needed.

Abstract

This paper derives "scaling laws"--empirical relationships between the training compute of Large Language Models (LLMs) and their performance--for economic outcomes. In a preregistered online experiment, 300 professional translators completed 1,800 tasks using one of 13 LLMs (or a control). A tenfold increase in model compute improved task completion speed by 12.3%, grades by 0.18 standard deviations, and earnings per minute by 16.1%. Gains were four times larger for lower-skilled workers. These findings suggest continued model scaling could boost U.S. productivity by at least 6.9% over the next decade.
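
To make the per-tenfold figures concrete, the following is a minimal sketch of how they might be extrapolated to other compute ratios. It assumes, beyond what the abstract states, that the percentage effects compound per tenfold step and that the quality effect is linear in log10 of compute. The function name `implied_effects` and the constant names are hypothetical, and the paper's actual regression specification may differ.

```python
import math

# Per-tenfold effects quoted in the abstract.
TIME_REDUCTION_PER_10X = 0.123   # task time falls by 12.3%
QUALITY_GAIN_SD_PER_10X = 0.18   # grades rise by 0.18 standard deviations
EARNINGS_GAIN_PER_10X = 0.161    # earnings per minute rise by 16.1%


def implied_effects(compute_ratio: float) -> dict:
    """Extrapolate the per-10x effects to an arbitrary compute ratio.

    Assumption (not taken from the paper): percentage effects compound
    per tenfold step, and the quality effect is linear in log10(compute).
    """
    if compute_ratio <= 0:
        raise ValueError("compute_ratio must be positive")
    decades = math.log10(compute_ratio)  # number of tenfold steps
    return {
        "time_multiplier": (1 - TIME_REDUCTION_PER_10X) ** decades,
        "quality_gain_sd": QUALITY_GAIN_SD_PER_10X * decades,
        "earnings_multiplier": (1 + EARNINGS_GAIN_PER_10X) ** decades,
    }


if __name__ == "__main__":
    for ratio in (10, 100):
        print(ratio, implied_effects(ratio))
```

Under these assumptions, a 100-fold compute increase would imply a time multiplier of about 0.77 (0.877 squared), a quality gain of 0.36 standard deviations, and an earnings multiplier of about 1.35. These are illustrative extrapolations, not results reported in the paper.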

Paper Structure (7 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Impact on Time Taken and Quality of any AI Model Usage
  • Figure 2: Time Taken as a Function of Model Training Compute (Log Scale)
  • Figure 3: Mean Grade as a Function of Model Training Compute (Log Scale)
  • Figure 4: Productivity as a Function of Model Training Compute (Log Scale)