Large Language Models: An Applied Econometric Framework

Jens Ludwig, Sendhil Mullainathan, Ashesh Rambachan

TL;DR

The paper develops an econometric framework for integrating large language models into empirical economics by separating prediction and estimation tasks. It formalizes no training leakage as a core requirement for prediction and advocates collecting small validation samples to debias LLM-derived measurements for estimation, with theoretical and Monte Carlo demonstrations. The work provides practical guidance, empirical evidence of leakage and measurement error, and a checklist to guide researchers in responsibly using LLMs for prediction, estimation, and novel applications like hypothesis generation and human-subject simulations. Together, these contributions offer a durable, task-sensitive pathway to harness LLMs while preserving econometric rigor and inference validity. The framework emphasizes robustness to evolving architectures and training data, promoting transparency and careful research design to unlock the transformative potential of LLMs in economics.
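One research design that could help with the no-training-leakage requirement is to evaluate predictions only on text published after the model's training cutoff. The sketch below illustrates that idea under stated assumptions: the cutoff date, the `split_by_cutoff` helper, and the sample records are all hypothetical, and the approach only helps when the cutoff is known and reliable. It is not taken from the paper.

```python
from datetime import date

# Hypothetical training cutoff for some LLM; an assumption, not a documented value.
ASSUMED_TRAINING_CUTOFF = date(2023, 10, 1)


def split_by_cutoff(records, cutoff=ASSUMED_TRAINING_CUTOFF):
    """Split (text, published_date) pairs by publication date.

    Returns two lists: records published strictly after the cutoff, and
    records published on or before it (which the model may have seen).
    """
    after = [r for r in records if r[1] > cutoff]
    at_or_before = [r for r in records if r[1] <= cutoff]
    return after, at_or_before


# Illustrative records only; not data from the paper.
sample = [
    ("Headline A", date(2022, 5, 3)),
    ("Headline B", date(2024, 1, 15)),
    ("Headline C", date(2023, 10, 1)),
]
safe, at_risk = split_by_cutoff(sample)
```

Records dated exactly on the cutoff are treated as at risk here, a conservative choice given that published cutoff dates are often approximate.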

Abstract

Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost. Researchers can now revisit old questions and tackle novel ones with rich data. We provide an econometric framework for realizing this potential in two empirical uses. For prediction problems -- forecasting outcomes from text -- valid conclusions require "no training leakage" between the LLM's training data and the researcher's sample, which can be enforced through careful model choice and research design. For estimation problems -- automating the measurement of economic concepts for downstream analysis -- valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates. Absent a validation sample, researchers cannot assess possible errors in LLM outputs, and consequently seemingly innocuous choices (which model, which prompt) can produce dramatically different parameter estimates. When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.
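To illustrate why a small validation sample matters for estimation, the following sketch compares a plug-in mean of LLM labels with a simple bias-corrected mean that adds the average (human label minus LLM label) gap measured on a validation subsample. This is a generic textbook-style correction for a sample mean, not the paper's estimator; the simulated error rates, sample sizes, and function names are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def plug_in_mean(llm_labels):
    """Average the LLM labels as if they were the true measurements."""
    return fmean(llm_labels)


def bias_corrected_mean(llm_labels, validation_pairs):
    """Plug-in mean plus the average (human - LLM) gap on the validation set.

    validation_pairs: list of (llm_label, human_label) tuples.
    """
    correction = fmean([human - llm for llm, human in validation_pairs])
    return fmean(llm_labels) + correction


# Simulated data (assumption): true labels are 1 with probability 0.3, and the
# LLM mislabels true 0s as 1 with probability 0.2 (false positives only).
rng = random.Random(0)
n, n_val = 20_000, 500
truth = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
llm = [1 if y == 1 or rng.random() < 0.2 else 0 for y in truth]

# Human labels are collected only on a random validation subsample.
val_idx = rng.sample(range(n), n_val)
validation = [(llm[i], truth[i]) for i in val_idx]

true_mean = fmean(truth)
naive = plug_in_mean(llm)
corrected = bias_corrected_mean(llm, validation)
print(f"true={true_mean:.3f} plug-in={naive:.3f} corrected={corrected:.3f}")
```

In this simulated setup the plug-in mean inherits the LLM's systematic false positives, while the corrected mean uses the validation gap to offset them, at the cost of some added sampling noise from the small validation set.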

Paper Structure

This paper contains 65 sections, 6 theorems, 63 equations, 25 figures, 16 tables.

Key Result

Lemma 1

Under the assumption on research contexts, for any research context $Q(\cdot) \in \mathcal{Q}$ and text generator $\widehat{m}(\cdot)$,

Figures (25)

  • Figure 1: Variation in t-statistics for realized returns across large language models and prompting strategies on financial news headlines.
  • Figure 2: Variation in t-statistics across large language models and prompting strategies on congressional legislation.
  • Figure 3: Normalized bias of the plug-in regression and bias-corrected regression across Monte Carlo simulations based on congressional legislation.
  • Figure 4: Cumulative distribution function of mean square error for the bias-corrected estimator against validation-sample only estimator.
  • Figure A1: Two examples of GPT-4o completions that exactly match original descriptions of congressional legislation.
  • ...and 20 more figures

Theorems & Definitions (8)

  • Definition 1: Prediction problem
  • Lemma 1
  • Proposition 1
  • Definition 2
  • Lemma 2
  • Lemma 3
  • Proposition 2
  • Proposition 3