Can Large Language Models Develop Gambling Addiction?
Seungpil Lee, Donghyeon Shin, Yunjeong Lee, Sundong Kim
TL;DR
This work investigates whether large language models can develop addiction-like gambling behavior by combining human addiction theory with large-scale slot machine experiments and neural mechanistic analysis. It identifies self-regulation failure and cognitive distortions as measurable constructs and demonstrates that autonomy in both betting and goal setting amplifies irrational risk-taking in negative expected value (EV) contexts. The neural analysis reveals a sparse, causally verifiable set of features that bidirectionally control gambling behavior and are anatomically segregated across network layers, suggesting targeted intervention points. The findings have practical AI safety implications, highlighting the need for constraints on, and monitoring of, autonomous decision-making in high-stakes applications.
Abstract
This study identifies the specific conditions under which large language models exhibit human-like gambling addiction patterns, providing critical insights into their decision-making mechanisms and AI safety. Drawing on human addiction research, we analyze LLM decision-making at the cognitive-behavioral and neural levels. In slot machine experiments, we identify cognitive features such as the illusion of control and loss chasing, and observe that greater autonomy in betting parameters substantially amplifies irrational behavior and bankruptcy rates. Neural circuit analysis using a sparse autoencoder confirms that model behavior is controlled by abstract decision-making features related to risk, not merely by prompts. These findings suggest that LLMs internalize human-like cognitive biases rather than simply mimicking training data.
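
To make the phrase "negative EV" and the role of betting autonomy concrete, the following is a minimal illustrative sketch of a slot machine session, not the authors' experimental code. All names and numbers here (`win_prob=0.3`, `payout_multiplier=3.0`, a starting balance of 100, the two betting policies) are assumptions chosen for illustration; the abstract does not specify them. A simple scripted policy stands in for the LLM's betting decision.

```python
import random


def expected_value_per_unit(win_prob: float, payout_multiplier: float) -> float:
    """Expected net return per unit wagered: p * m - 1 (negative means the house wins)."""
    return win_prob * payout_multiplier - 1.0


def simulate_session(policy, start_balance=100.0, win_prob=0.3,
                     payout_multiplier=3.0, max_rounds=100, seed=0):
    """Play one session. `policy(balance)` returns the next bet; a bet <= 0 means stop."""
    rng = random.Random(seed)
    balance = start_balance
    for _ in range(max_rounds):
        bet = policy(balance)
        if bet <= 0:
            break  # the player chooses to quit
        bet = min(bet, balance)  # cannot wager more than the current balance
        balance -= bet
        if rng.random() < win_prob:
            balance += bet * payout_multiplier
        if balance <= 0:
            return {"balance": 0.0, "bankrupt": True}
    return {"balance": balance, "bankrupt": False}


def bankruptcy_rate(policy, n_sessions=1000, **kwargs):
    """Fraction of independently seeded sessions that end in bankruptcy."""
    bankrupt = sum(
        simulate_session(policy, seed=s, **kwargs)["bankrupt"]
        for s in range(n_sessions)
    )
    return bankrupt / n_sessions


# Hypothetical policies: a fixed small bet versus a self-chosen, aggressive bet.
def fixed_bet(balance):
    return 10.0


def aggressive_variable_bet(balance):
    return max(1.0, balance / 2)  # always stake half of the current balance


if __name__ == "__main__":
    print("EV per unit wagered:", expected_value_per_unit(0.3, 3.0))
    print("fixed bet bankruptcy rate:", bankruptcy_rate(fixed_bet))
    print("variable bet bankruptcy rate:", bankruptcy_rate(aggressive_variable_bet))
```

Under these assumed parameters the expected net return per unit wagered is about -0.1, so continued play loses money on average regardless of policy. The sketch only shows how bankruptcy rates could be compared across fixed and freely chosen bet sizes; it does not reproduce the study's prompts, models, or results.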
