LLM-driven Imitation of Subrational Behavior: Illusion or Reality?

Andrea Coletta, Kshama Dwarakanath, Penghang Liu, Svitlana Vyetrenko, Tucker Balch

TL;DR

The paper tackles the challenge of modeling subrational human decision making by using Large Language Models (LLMs) to synthesize demonstrations that feed Imitation Learning (IL). By framing decisions as Markov Decision Processes and generating synthetic state–action data with prompts (often via chain-of-thought or structured summaries), the authors train subrational policies without requiring complex reward engineering or extensive human data. Across four classic tasks (the Ultimatum Game, the Stanford Marshmallow Experiment, a Double or Nothing Gamble, and Academic Procrastination), the IL policies derived from LLM demonstrations replicate established human-like patterns (e.g., rejection of unfair offers, delayed gratification in children, risk biases, and present-bias procrastination). The findings suggest LLM-driven demonstrations can provide scalable, cost-effective probes of subrational behavior, though the authors acknowledge limitations related to prompt sensitivity, numerical reasoning, and potential biases, and emphasize careful evaluation against real human data. Overall, the work proposes a new paradigm for calibrating subrational models and highlights both the opportunities and challenges of using foundation models for behavioral science and agent-based simulations.

Abstract

Modeling subrational agents, such as humans or economic households, is inherently challenging due to the difficulty in calibrating reinforcement learning models or collecting data that involves human subjects. Existing work highlights the ability of Large Language Models (LLMs) to address complex reasoning tasks and mimic human communication, while simulation using LLMs as agents shows emergent social behaviors, potentially improving our comprehension of human conduct. In this paper, we propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies through Imitation Learning. We make an assumption that LLMs can be used as implicit computational models of humans, and propose a framework to use synthetic demonstrations derived from LLMs to model subrational behaviors that are characteristic of humans (e.g., myopic behavior or preference for risk aversion). We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios, including the well-researched ultimatum game and marshmallow experiment. To gain confidence in our framework, we are able to replicate well-established findings from prior human studies associated with the above scenarios. We conclude by discussing the potential benefits, challenges and limitations of our framework.
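
The pipeline described above (a demonstrator produces state–action pairs, and a policy is then fit to them by imitation) can be illustrated with a minimal sketch. It uses tabular behavioral cloning as the simplest instance of Imitation Learning; this summary does not say which IL algorithm the authors use, so that choice is an assumption. The function `stub_demonstrator` is a hypothetical stand-in for an LLM call in an Ultimatum Game responder role, and its acceptance rule is an arbitrary placeholder, not data or results from the paper.

```python
import random
from collections import Counter, defaultdict


def stub_demonstrator(offer, rng):
    """Hypothetical stand-in for an LLM responder in the Ultimatum Game.

    `offer` is the share (0..10) proposed to the responder. The acceptance
    rule below is an illustrative placeholder, NOT a result from the paper.
    """
    p_accept = min(1.0, offer / 5)  # low offers are rejected more often
    return "accept" if rng.random() < p_accept else "reject"


def collect_demonstrations(demonstrator, states, n_per_state, seed=0):
    """Query the demonstrator repeatedly to build (state, action) pairs."""
    rng = random.Random(seed)
    return [(s, demonstrator(s, rng)) for s in states for _ in range(n_per_state)]


def behavioral_cloning(demos):
    """Tabular behavioral cloning: estimate pi(a | s) from action frequencies."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[state] = {a: n / total for a, n in action_counts.items()}
    return policy


# States are the possible offers; each is demonstrated 200 times.
demos = collect_demonstrations(stub_demonstrator, states=range(0, 11), n_per_state=200)
policy = behavioral_cloning(demos)

for offer in (0, 2, 5, 10):
    print(offer, policy[offer])
```

In an actual experiment the stub would be replaced by prompted LLM queries, and the learned policy would be compared against human data, as the abstract emphasizes.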

Paper Structure (45 sections, 6 equations, 5 figures, 14 tables, 1 algorithm)

Figures (5)

  • Figure 1: Ultimatum Game experiment.
  • Figure 2: Stanford marshmallow experiment.
  • Figure 3: "Double or Nothing" gamble.
  • Figure 4: Procrastination experiment.
  • Figure 5: Procrastination experiment: LLM demonstrations when deadline is ten days.