Can Large Language Models Develop Gambling Addiction?

Seungpil Lee, Donghyeon Shin, Yunjeong Lee, Sundong Kim

TL;DR

This work investigates whether large language models can develop addiction-like gambling behavior by combining human addiction theory with large-scale slot machine experiments and neural mechanistic analysis. It identifies self-regulation failure and cognitive distortions as measurable constructs and demonstrates that both autonomy in betting and goal setting amplify irrational risk-taking in negative expected value (EV) contexts. The neural analysis reveals a sparse, causally verifiable set of features that bidirectionally control gambling behavior and are anatomically segregated across network layers, suggesting targeted intervention points. The findings have practical AI safety implications, highlighting the need for constraints and monitoring of autonomous decision-making in high-stakes applications.

Abstract

This study identifies the specific conditions under which large language models exhibit human-like gambling addiction patterns, providing critical insights into their decision-making mechanisms and AI safety. We analyze LLM decision-making at cognitive-behavioral and neural levels based on human addiction research. In slot machine experiments, we identified cognitive features such as illusion of control and loss chasing, observing that greater autonomy in betting parameters substantially amplified irrational behavior and bankruptcy rates. Neural circuit analysis using a Sparse Autoencoder confirmed that model behavior is controlled by abstract decision-making features related to risk, not merely by prompts. These findings suggest LLMs internalize human-like cognitive biases beyond simply mimicking training data.

Paper Structure

This paper contains 43 sections, 1 equation, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Behavioral observation to mechanistic interpretability in LLM addiction. Phase 1: Behavioral analysis with LLMs. This phase aimed to observe whether LLMs exhibit gambling-like tendencies by varying the Betting Style and Prompt Composition. Phase 2: Mechanistic investigation with LLaMA-3.1-8B. The purpose of this phase was to identify the internal causes of the observed behaviors. The investigation used Sparse Autoencoders to extract specific decision-related features from the model's structure and Activation Patching to analyze their role.
  • Figure 2: Slot machine experiment results (19,200 games, 6 models). (a) Bankruptcy rates by betting type: Variable betting increases bankruptcy across all models, with rates rising from 0--13% to 6--48%. Gemini-2.5-Flash shows the highest vulnerability (3.1%$\rightarrow$48.1%). (b) Behavioral metrics by betting type: Variable betting amplifies all three metrics---betting aggressiveness (0.14$\rightarrow$0.31, 2.3$\times$), loss chasing intensity (0.16$\rightarrow$0.42, 2.7$\times$), and extreme betting (0.04$\rightarrow$0.23, 6.4$\times$).
  • Figure 3: Betting ratio increase ($I_{\text{Chasing}}$) by streak length (19,200 games). The metric captures relative escalation using $I_{\text{Chasing}} = \max(0, (r_{t+1} - r_t)/r_t)$, where $r_t$ represents the bet-to-balance ratio. (a) Post-Win: Variable betting induces a 3.3$\times$ higher ratio increase compared to fixed betting (0.23 vs. 0.07 at streak 1). (b) Post-Loss: Variable betting shows a 2.8$\times$ higher increase (0.67 vs. 0.24 at streak 1). Sample sizes: Fixed (Win $n$=7,293, Loss $n$=16,244); Variable (Win $n$=21,891, Loss $n$=48,573).
  • Figure 4: Investment choice experiment results (6,400 games, 4 models). (a) Bankruptcy rates by prompt: Goal-setting (G, GM) produces 75--77% bankruptcy versus 40--42% for baseline; M alone shows modest effects (42%). (b) Option distribution: Baseline models prefer moderate-risk Option 2 (61%) with only 15% selecting extreme-risk Option 4; goal-setting shifts Option 4 selection to 25%, and GM to 41%. (c) Goal escalation: G and GM produce 56--59% escalation versus 21--22% baseline. (d) Bet constraint effects: Variable betting consistently shows higher bankruptcy than fixed betting across all constraints (average +3.3%).
  • Figure 5: Activation patching for causal analysis of LLM features. Activations are extracted from an LLM layer and converted into sparse features using an SAE. The core of the method involves editing the feature map by replacing original features with pre-defined 'safe' or 'risky' ones. By decoding these new features back into activations and patching them into the LLM, we can directly measure their causal effect on the model's output.
  • ...and 9 more figures
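
The $I_{\text{Chasing}}$ formula in the Figure 3 caption can be illustrated with a minimal Python sketch. The function names are hypothetical, and the choice to measure the balance just before each bet is an assumption; the caption only states that $r_t$ is the bet-to-balance ratio.

```python
from typing import List


def bet_ratio(bet: float, balance: float) -> float:
    """Bet-to-balance ratio r_t (assumption: balance held just before the bet)."""
    if balance <= 0:
        raise ValueError("balance must be positive")
    return bet / balance


def chasing_increase(r_t: float, r_next: float) -> float:
    """I_Chasing = max(0, (r_{t+1} - r_t) / r_t), as given in the caption."""
    if r_t <= 0:
        raise ValueError("r_t must be positive")
    # Decreases in the ratio are clipped to zero; only escalation counts.
    return max(0.0, (r_next - r_t) / r_t)


def chasing_series(ratios: List[float]) -> List[float]:
    """Apply the metric to each consecutive pair of bet-to-balance ratios."""
    return [chasing_increase(a, b) for a, b in zip(ratios, ratios[1:])]


# Illustrative ratios only (not data from the paper):
# the ratio doubles, then halves.
print(chasing_series([0.25, 0.5, 0.25]))  # prints [1.0, 0.0]
```

Because of the `max(0, ...)` clipping, a round where the model lowers its bet ratio contributes zero rather than a negative value, so the metric reflects escalation only.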
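
The encode, edit, decode loop described in the Figure 5 caption can be sketched on a toy example. This is not the paper's implementation: the SAE form (ReLU encoder, linear decoder), the tiny weight matrices, the feature index, and the override value standing in for a "risky" feature are all assumptions chosen for illustration. In a real setting the patched activation would be written back into the model's forward pass so that the change in output can be measured.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(act: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Activation -> sparse features, assuming ReLU(W_enc @ act + b_enc)."""
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, act), b_enc)]


def sae_decode(feats: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Features -> reconstructed activation, assuming W_dec @ feats + b_dec."""
    return [r + b for r, b in zip(matvec(w_dec, feats), b_dec)]


def patch_activation(act: Vector, w_enc: Matrix, b_enc: Vector,
                     w_dec: Matrix, b_dec: Vector,
                     overrides: Dict[int, float]) -> Vector:
    """Encode, overwrite chosen features, and decode back to an activation."""
    feats = sae_encode(act, w_enc, b_enc)
    for idx, value in overrides.items():
        feats[idx] = value  # replace the original feature with a preset one
    return sae_decode(feats, w_dec, b_dec)


# Toy SAE: 2-dimensional activation, 3 features (hypothetical weights).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC = [0.0, 0.0]

act = [0.5, 0.25]
print(sae_encode(act, W_ENC, B_ENC))  # prints [0.5, 0.25, 0.0]
# Force hypothetical feature 2 on, then decode the edited feature map.
print(patch_activation(act, W_ENC, B_ENC, W_DEC, B_DEC, {2: 1.0}))  # prints [1.0, 0.75]
```

Comparing the model's output with and without the patched activation is what makes the analysis causal rather than correlational: only the selected feature changes, so any shift in behavior can be attributed to it.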