Agentic Economic Modeling

Bohan Zhang, Jiaxuan Li, Ali Hortaçsu, Xiaoyang Ye, Victor Chernozhukov, Angelo Ni, Edward W Huang

Abstract

We introduce Agentic Economic Modeling (AEM), a framework that aligns synthetic LLM choices with small-sample human evidence for reliable econometric inference. AEM first generates task-conditioned synthetic choices via LLMs, then learns a bias-correction mapping from task features and raw LLM choices to human-aligned choices, and finally applies standard econometric estimators to the corrected choices to recover demand elasticities and treatment effects. We validate AEM in two experiments. In a large-scale conjoint study with millions of observations, using only 10% of the original data to fit the correction model lowers the error of the demand-parameter estimates, whereas uncorrected LLM choices increase the error. In a regional field experiment, a mixture model calibrated on 10% of geographic regions estimates an out-of-domain treatment effect of -65 ± 10 bps, closely matching the full human experiment (-60 ± 8 bps). Under time-wise extrapolation, training with only day-one human data yields -24 bps (95% CI: [-26, -22], p < 1e-5), improving over the human-only day-one baseline (-17 bps, 95% CI: [-43, +9], p = 0.2049). These results demonstrate AEM's potential to improve RCT efficiency and establish a foundational method for LLM-based counterfactual generation.
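
The three-stage pipeline in the abstract (synthetic choices, bias correction, econometric estimation) can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the single `price` feature, the logistic data-generating process, the logistic correction model, and all names (`fit_logit`, `run_toy_example`, the slope values) are assumptions made for illustration, and the "LLM" is simulated as a price-insensitive chooser rather than queried. It assumes a small primary set with both human and LLM choices and a larger auxiliary set with LLM choices only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit(X, y, n_iter=25):
    """Newton-Raphson logistic regression; y may be 0/1 or a probability in [0, 1]."""
    beta = np.zeros(X.shape[1])
    ridge = 1e-8 * np.eye(X.shape[1])  # tiny ridge term for numerical stability
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess + ridge, grad)
    return beta


def run_toy_example(seed=0, n_primary=2000, n_aux=20000):
    rng = np.random.default_rng(seed)
    true_slope = -2.0  # assumed human price sensitivity (hypothetical value)
    llm_slope = -0.5   # assumed simulated-LLM price sensitivity (biased toward zero)

    def simulate(n):
        price = rng.uniform(0.0, 2.0, size=n)
        human = rng.binomial(1, sigmoid(1.0 + true_slope * price))
        llm = rng.binomial(1, sigmoid(1.0 + llm_slope * price))
        return price, human, llm

    # Primary set: task features, human choices, and LLM choices.
    p_price, p_human, p_llm = simulate(n_primary)
    # Auxiliary set: task features and LLM choices only (human choices discarded).
    a_price, _, a_llm = simulate(n_aux)

    # Stage 2: learn a correction from (features, raw LLM choice) to human choice.
    X_primary = np.column_stack([np.ones(n_primary), p_price, p_llm])
    gamma = fit_logit(X_primary, p_human)

    # Apply the correction to the auxiliary set to get human-aligned probabilities.
    X_aux = np.column_stack([np.ones(n_aux), a_price, a_llm])
    corrected = sigmoid(X_aux @ gamma)

    # Stage 3: run the same demand estimator on corrected and on raw LLM choices.
    X_demand = np.column_stack([np.ones(n_aux), a_price])
    corrected_slope = fit_logit(X_demand, corrected)[1]
    raw_llm_slope = fit_logit(X_demand, a_llm.astype(float))[1]

    return {
        "true_slope": true_slope,
        "corrected_slope": float(corrected_slope),
        "raw_llm_slope": float(raw_llm_slope),
    }


if __name__ == "__main__":
    print(run_toy_example())
```

In this toy setup the simulated LLM choice carries no information beyond price, so the correction model effectively relearns the human response from the primary set. A realistic correction model would exploit whatever signal the LLM choices contain.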

Paper Structure

This paper contains 50 sections, 9 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: The general overview of the missing data situation. We assume we have task features, human choices, and LLM choices for the smaller primary set. We have task features and LLM choices for the larger auxiliary set. We don’t have human choices for the auxiliary set.
  • Figure 2: Part-worths for different features in the conjoint task, estimated from different choice sets.
  • Figure 3: The distribution of alignment rates between the LLM’s choices and each customer’s choices. The alignment rates vary substantially across customers.
  • Figure 4: Performance when using customer historical choices and customer behavior summaries, with varying numbers of historical choices. Performance generally improves as more historical choices are used. Personalized methods consistently outperform non-personalized ones.
  • Figure 5: The treatment effect on the share of Same-Day delivery, estimated using different numbers of weeks of human experiment data.