Large Language Models for Behavioral Economics: Internal Validity and Elicitation of Mental Models
Brian Jabarian
TL;DR
The paper addresses internal validity in behavioral and experimental economics, in particular the challenge of enforcing exclusion restrictions while accurately eliciting mental models. It advocates integrating Large Language Models (LLMs) into AI-assisted experimental design to better satisfy the observability, compliance, stable unit treatment value assumption (SUTVA), and independence restrictions, alongside methods to elicit and measure mental models. A central contribution is a case study demonstrating how AI-driven design, storytelling environments, AI-based grading, and JavaScript-based data quality checks can improve engagement, incentive compatibility, and measurement validity in lab-in-the-field setups. The work argues that AI-enabled approaches enhance rigor, transparency, and reproducibility, enabling richer behavioral insights and more scalable, reliable experimentation.
Abstract
In this article, we explore the transformative potential of integrating generative AI, particularly Large Language Models (LLMs), into behavioral and experimental economics to enhance internal validity. By leveraging AI tools, researchers can improve adherence to key exclusion restrictions and, in particular, ensure the internal validity of measures of mental models, which often require human intervention in the incentive mechanism. We present a case study demonstrating how LLMs can enhance experimental design, participant engagement, and the validity of measuring mental models.
