Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
Jia Gu, Liang Pang, Huawei Shen, Xueqi Cheng
TL;DR
This work frames the sequential decision-making of LLM agents as a Markov decision process (MDP) and investigates whether the agents can understand and sample from the probability distributions that govern sequential behavior. It distinguishes explicit distributions, which have defined forms such as Poisson or Normal, from implicit distributions specified only through contextual cues. Understanding is assessed with two tasks: naming the distribution and sampling from it. Sample fidelity is measured with Kolmogorov-Smirnov (KS) tests, and a code-assisted sampling approach is also evaluated. The findings show that LLMs can identify explicit distributions but sample from them with limited fidelity. Chain-of-thought prompting and code tools improve performance in the explicit case, while sampling from implicit distributions remains weak. These results highlight important limitations for using LLMs to simulate human-like behavioral sequences. They point to the need for enhanced sampling mechanisms or hybrid tool-based architectures to achieve realistic behavioral simulation in MDP-like settings.
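The following minimal Python sketch shows how a KS-based fidelity check can work. It assumes a standard Normal target, a one-sample test, and the asymptotic critical value at significance level 0.05 of roughly 1.36/sqrt(n). It is not the authors' evaluation code. The two sample sets are hypothetical: one comes from a proper sampler, as a code tool might provide, and the other is a collapsed set in which every value equals the mean.

```python
from statistics import NormalDist


def ks_statistic(samples: list[float], dist: NormalDist) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # Compare the target CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_critical_value(n: int) -> float:
    """Asymptotic critical value at alpha = 0.05 (approximately 1.36 / sqrt(n))."""
    return 1.36 / n ** 0.5


target = NormalDist(mu=0.0, sigma=1.0)
n = 1000

# Hypothetical well-calibrated sampler, e.g. code an LLM could call as a tool.
tool_samples = target.samples(n, seed=42)

# Hypothetical failure mode: every "sample" collapses onto the mean.
collapsed_samples = [target.mean] * n

for name, xs in [("tool", tool_samples), ("collapsed", collapsed_samples)]:
    d = ks_statistic(xs, target)
    verdict = "reject" if d > ks_critical_value(n) else "fail to reject"
    print(f"{name}: D = {d:.3f} -> {verdict} H0 at alpha = 0.05")
```

A large D means the sampled values do not follow the target distribution. The collapsed set reaches D = 0.5 because its empirical CDF jumps from 0 to 1 exactly where the target CDF equals 0.5. Discrete targets such as Poisson need more care with the KS test, and this sketch does not cover them.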
Abstract
With the rapid advancement of large language models (LLMs) in handling complex language tasks, an increasing number of studies employ LLMs as agents to emulate human sequential decision-making, which is often represented as a Markov decision process (MDP). The actions in an MDP follow specific probability distributions and require iterative sampling. This raises the question of whether LLM agents can comprehend probability distributions and use probabilistic sampling to guide their behavioral decisions and generate behavioral sequences. To answer this question, we divide the problem into two main aspects: sequence simulation with a known probability distribution and sequence simulation with an unknown probability distribution. Our analysis indicates that LLM agents can understand probabilities but struggle with probability sampling. Integrating coding tools improves their probabilistic sampling to some extent. However, the resulting precision is still insufficient for the agents to simulate human behavior.
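The following minimal Python sketch illustrates iterative action sampling. It assumes the simplest case: a fixed stochastic policy in which the next action depends only on the previous action, so the simulated behavior forms a Markov chain. The action names and probabilities in POLICY are hypothetical. Sampling is delegated to Python's random module, in the spirit of the code-assisted approach. The sketch does not reproduce the paper's experimental setup.

```python
import random
from collections import Counter

# Hypothetical policy: probability of the next action given the previous one.
POLICY = {
    "browse": {"browse": 0.6, "click": 0.3, "purchase": 0.1},
    "click": {"browse": 0.4, "click": 0.2, "purchase": 0.4},
    "purchase": {"browse": 0.9, "click": 0.1, "purchase": 0.0},
}


def simulate(start: str, steps: int, seed: int = 0) -> list[str]:
    """Generate a behavioral sequence by sampling one action per step."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(steps):
        dist = POLICY[seq[-1]]
        # Draw the next action from the distribution conditioned on the last action.
        seq.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return seq


def empirical_transitions(seq: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next | prev) from consecutive action pairs in a sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    return {
        prev: {nxt: pair_counts[(prev, nxt)] / prev_counts[prev] for nxt in POLICY[prev]}
        for prev in POLICY
        if prev_counts[prev]
    }


sequence = simulate("browse", steps=20_000, seed=0)
print(sequence[:8])
for prev, row in empirical_transitions(sequence).items():
    print(prev, {a: round(p, 3) for a, p in row.items()})
```

The sampler is explicit code, so its output can be checked by comparing the estimated transition frequencies against POLICY. This is one way to quantify how closely a generated behavioral sequence follows its intended distribution.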
