Behavioral Economics of AI: LLM Biases and Corrections

Pietro Bini, Lin William Cong, Xing Huang, Lawrence J. Jin

TL;DR

This study investigates whether large language models exhibit systematic behavioral biases in economic decisions and whether such biases can be mitigated. Combining cognitive-psychology prompts (on preferences and beliefs) with experimental-economics tasks across twelve LLMs from four families, the authors map bias patterns across model generations, scales, and architectures, and test debiasing methods. They find a robust divide: larger models tend to reproduce human-like biases in preference-based questions but respond more rationally in belief-based tasks, with substantial heterogeneity across families. A simple role-priming prompt, which asks the LLM to act as a rational Expected Utility decision-maker, offers modest bias reduction, while more elaborate information-enrichment debiasing can be ineffective or counterproductive. The work provides a public benchmarking dataset and highlights the challenges and opportunities in using LLMs as tools for economic research and decision support, underscoring the need for careful evaluation of AI agents in financial contexts. The authors hypothesize that these patterns stem from training and architecture: RLHF alignment may drive human-like preferences, while larger data and compute may support more rational belief formation. They invite future work on robust debiasing and safe AI deployment in economics.

Abstract

Do generative AI models, particularly large language models (LLMs), exhibit systematic behavioral biases in economic and financial decisions? If so, how can these biases be mitigated? Drawing on the cognitive psychology and experimental economics literatures, we conduct the most comprehensive set of experiments to date (originally designed to document human biases) on prominent LLM families across model versions and scales. We document systematic patterns in LLM behavior. In preference-based tasks, responses become more human-like as models become more advanced or larger, while in belief-based tasks, advanced large-scale models frequently generate rational responses. Prompting LLMs to make rational decisions reduces biases.

Paper Structure

This paper contains 15 sections, 7 equations, 23 figures, and 9 tables.

Figures (23)

  • Figure 1: Example of prompt: Diminishing sensitivity of prospect theory.
  • Figure 2: Proportion of LLM responses: Advanced large-scale models.
  • Figure 3: Heterogeneity in LLM responses across model generations and model scales.
  • Figure 4: LLM forecasts: Experiment 1 in Afrouzi et al. (2023).
  • Figure 5: LLM forecasts: Experiments 2 and 3 in Afrouzi et al. (2023).
  • ...and 18 more figures