Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation

Jia Gu; Liang Pang; Huawei Shen; Xueqi Cheng

TL;DR

This work models the decision-making of LLM agents as a Markov decision process (MDP) and investigates whether the agents can understand and sample from the probability distributions that govern sequential behavior. It distinguishes explicit distributions, which have defined forms such as Poisson or Normal, from implicit distributions, which are defined only by contextual cues. Understanding is assessed by asking the model to name the distribution, and sampling is assessed with Kolmogorov-Smirnov (KS) tests, alongside a code-assisted sampling approach. The findings show that LLMs can identify explicit distributions but sample from them with limited fidelity. Chain-of-thought prompting and code tools improve performance in explicit cases, while sampling from implicit distributions remains weak. The results highlight important limitations of using LLMs to simulate human-like behavioral sequences and point to the need for enhanced sampling mechanisms or hybrid tool architectures to achieve realistic behavioral simulation in MDP-like settings.

Abstract

With the rapid advancement of large language models (LLMs) in handling complex language tasks, an increasing number of studies employ LLMs as agents to emulate the sequential decision-making of humans, which is often represented as a Markov decision process (MDP). The actions in an MDP follow specific probability distributions and require iterative sampling. This raises the question of whether LLM agents can comprehend probability distributions well enough to guide their behavioral decisions through probabilistic sampling and to generate behavioral sequences. To answer this question, we divide the problem into two aspects: sequence simulation with a known probability distribution and sequence simulation with an unknown probability distribution. Our analysis indicates that LLM agents can understand probabilities but struggle with probability sampling. Their sampling ability can be improved to some extent by integrating coding tools, but even then the sampling precision still makes it difficult for agents to simulate human behavior.
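
As a rough illustration of how sampled action sequences can be checked against a target distribution, the following minimal sketch computes a one-sample Kolmogorov-Smirnov statistic and an asymptotic p-value using only the Python standard library. It is not the paper's evaluation code: the function names, the Normal(0, 1) target, the sample size of 1000, and the stand-in data drawn from Python's `random` module (in place of real LLM outputs) are all assumptions made for this example.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Normal(mu, sigma) distribution (assumed target for this sketch)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(samples, cdf):
    """One-sample KS statistic: largest gap between the empirical and target CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (suitable for large n)."""
    lam = math.sqrt(n) * d
    if lam < 0.3:
        # The series converges slowly here, and the p-value is effectively 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical stand-ins for numbers returned by an LLM agent.
random.seed(0)
faithful = [random.gauss(0.0, 1.0) for _ in range(1000)]     # matches the target
unfaithful = [random.uniform(-1.0, 1.0) for _ in range(1000)]  # wrong shape

for name, seq in [("faithful", faithful), ("unfaithful", unfaithful)]:
    d = ks_statistic(seq, normal_cdf)
    print(name, "D =", round(d, 4), "p =", ks_pvalue(d, len(seq)))
```

A small p-value suggests that the sequence is unlikely to have been drawn from the target distribution. In a real experiment, the stand-in lists would be replaced by the values an agent actually produced, and the significance threshold would be chosen by the experimenter.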

Paper Structure (34 sections, 4 equations, 5 figures, 12 tables)

Introduction
Background
MDPs of Human Behavior
Large Language Models as Agent
Verification Methodology
Explicit Probability Distribution
Experimental Design
Evaluation Metrics
Implicit Probability Distribution
Experimental Design
Evaluation Metrics
Methods for Code Experiments
Experiment
Experimental Setups
Explicit Probability Distribution
...and 19 more sections

Figures (5)

Figure 1: The decision-making process of the LLM agent is an MDP, and the generated action sequence A conforms to a certain probability distribution. We input $Prompt1$ for the explicit probability distribution and $Prompt2$ for the implicit probability distribution, analyze the probability distribution $PD_a$ of A generated by the LLM agent, and finally evaluate the behavioral simulation capability of the LLM agent.
Figure 2: Examples of our experiments under an explicit probability distribution and an implicit probability distribution.
Figure 3: Comparison of the $RP$ of different prompts for each LLM agent under different probability distributions.
Figure 4: Comparison of the $RS$ of $prom_{l}$ and $prom_{code}$ for each LLM agent under implicit probability distributions. "Duration" is the browsing duration, "Sessions" is the number of visits to the social platform, "Likes" is the proportion of likes, "Reposts" is the proportion of forwarding, "Comments" is the proportion of comments, and "Posts" is the number of posts.
Figure 5: Some examples from the explicit probability distribution experiments. The four panels show the frequency histograms of Vicuna under the Binomial distribution, Claude under the Exponential distribution, GPT-4 under the Normal distribution, and Llama under the Uniform distribution, all with $prom$. The red line represents the ideal probability density function curve.
