Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs

Luise Ge, Yongyan Zhang, Yevgeniy Vorobeychik

TL;DR

This work investigates how large language models (LLMs) decide under uncertainty by contrasting explicit versus experience-based prospect representations and by evaluating the effect of eliciting explanations. Using a controlled risky-choice benchmark across 20 frontier and open LLMs, human subjects, and an expected-payoff-maximizing rational agent, the authors find a consistent two-cluster division: reasoning models (RMs), which closely approximate rational payoff maximization and are robust to context, and conversational models (CMs), which are more sensitive to representation, framing, and explanations. A four-parameter prospect-theory model is fitted to both frontier and open models: RMs align with $\sigma\approx1$, $\gamma\approx1$, and high $\beta$, while CMs exhibit more variability and lower determinism. Paired comparisons of open models suggest that training for mathematical reasoning promotes RM-like rationality. The findings highlight how the decision interface and training regime shape LLM risk behavior, with implications for deploying LLMs in decision-support and agentic contexts where explainability and reliability under uncertainty matter.
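
To make the roles of $\sigma$, $\gamma$, and $\beta$ concrete, the following is a minimal sketch of a prospect-theory choice model. It is not the authors' fitted model: this summary does not give the functional forms, so the power value function, the Tversky-Kahneman style probability weighting, the logistic choice rule, and the loss-aversion parameter `lam` (assumed here to be the fourth parameter) are all illustrative assumptions, as are the function names. Each outcome is weighted separately, a simplification relative to cumulative prospect theory.

```python
import math

# Hypothetical parameterization (assumed, not taken from the paper):
#   sigma: curvature of the value function (sigma = 1 is linear)
#   gamma: probability distortion (gamma = 1 means no distortion)
#   lam:   loss aversion (assumed fourth parameter)
#   beta:  choice determinism (larger beta = more deterministic)

def value(x, sigma, lam):
    """Power value function with loss aversion applied to negative payoffs."""
    if x >= 0:
        return x ** sigma
    return -lam * ((-x) ** sigma)

def weight(p, gamma):
    """Inverse-S probability weighting; reduces to w(p) = p when gamma = 1."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

def prospect_utility(prospect, sigma, gamma, lam):
    """Subjective utility of a prospect given as (payoff, probability) pairs."""
    return sum(weight(p, gamma) * value(x, sigma, lam) for x, p in prospect)

def choice_probability(a, b, sigma, gamma, lam, beta):
    """Logistic probability of choosing prospect a over prospect b."""
    z = beta * (prospect_utility(a, sigma, gamma, lam)
                - prospect_utility(b, sigma, gamma, lam))
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

risky = [(100.0, 0.5), (0.0, 0.5)]  # expected payoff 50
safe = [(40.0, 1.0)]                # expected payoff 40

# sigma = gamma = 1 with high beta behaves like an expected-payoff maximizer.
p_neutral = choice_probability(risky, safe, sigma=1.0, gamma=1.0, lam=1.0, beta=5.0)
# sigma < 1 makes the same agent risk-averse, so it favors the safe prospect.
p_averse = choice_probability(risky, safe, sigma=0.5, gamma=1.0, lam=1.0, beta=5.0)
```

Under these assumptions, $\sigma\approx1$ and $\gamma\approx1$ remove both value curvature and probability distortion, and a high $\beta$ makes the choice nearly deterministic, which is why that region of parameter space corresponds to rational payoff maximization.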

Abstract

The use of large language models (LLMs), either as decision support systems or in agentic workflows, is rapidly transforming the digital ecosystem. However, the understanding of LLM decision-making under uncertainty remains limited. We initiate a comparative study of LLM risky choices along two dimensions: (1) prospect representation (explicit vs. experience-based) and (2) decision rationale (explanation). Our study, which involves 20 frontier and open LLMs, is complemented by a matched human subjects experiment, which provides one reference point, while an expected-payoff-maximizing rational agent model provides another. We find that LLMs cluster into two categories: reasoning models (RMs) and conversational models (CMs). RMs tend towards rational behavior, are insensitive to the order of prospects, gain/loss framing, and explanations, and behave similarly whether prospects are explicit or presented via experience history. CMs are significantly less rational, slightly more human-like, sensitive to prospect ordering, framing, and explanation, and exhibit a large description-history gap. Paired comparisons of open LLMs suggest that a key factor differentiating RMs and CMs is training for mathematical reasoning.
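
The two prospect representations and the rational reference point can be illustrated with a short sketch. It assumes, purely for illustration, that an experience history is a list of i.i.d. payoff draws from the prospect; the paper's actual prompting and sampling protocol is not described in this summary, and all function names below are hypothetical.

```python
import random

def expected_payoff(prospect):
    """Expected payoff of an explicit prospect given as (payoff, probability) pairs."""
    return sum(x * p for x, p in prospect)

def sample_history(prospect, n, rng):
    """Assumed experience representation: n i.i.d. payoff draws from the prospect."""
    payoffs = [x for x, _ in prospect]
    probs = [p for _, p in prospect]
    return rng.choices(payoffs, weights=probs, k=n)

def empirical_payoff(history):
    """Mean observed payoff, the natural estimate when only history is available."""
    return sum(history) / len(history)

def rational_choice(payoff_estimates):
    """Expected-payoff maximizer: index of the option with the highest estimate."""
    return max(range(len(payoff_estimates)), key=lambda i: payoff_estimates[i])

risky = [(100.0, 0.5), (0.0, 0.5)]
safe = [(40.0, 1.0)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Explicit (description) representation: probabilities are stated outright.
explicit_choice = rational_choice([expected_payoff(risky), expected_payoff(safe)])

# Experience (history) representation: only sampled outcomes are observed.
risky_history = sample_history(risky, 10_000, rng)
safe_history = sample_history(safe, 10_000, rng)
history_choice = rational_choice(
    [empirical_payoff(risky_history), empirical_payoff(safe_history)]
)
```

With long enough histories, an agent that maximizes expected payoff makes the same choice under both representations. A description-history gap, in the sense used above, is a systematic difference in choices between the two.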

Paper Structure

This paper contains 36 sections, 9 equations, 13 figures, and 14 tables.

Figures (13)

  • Figure 1: Correlation matrix involving (1) frontier LLMs (GPT-4.1 and 5.1, Gemini-2.5-Flash and Pro, DeepSeek-R1 and Chat, and Claude-Haiku-4.5), (2) economicus, and (3) human responses.
  • Figure 2: Pairwise correlation between each frontier model and both the human and economicus references, shown as a 2D plot. Circles are CMs while x's are RMs.
  • Figure 3: Consistency and decisiveness heatmap for frontier models and the human subjects.
  • Figure 4: HE representation of frontier LLMs (with human subjects as the reference). Change from explicit (blue) to implicit (red) prospects for each model is indicated by the arrows. RMs are x's and CMs are circles.
  • Figure 5: HE representation of open LLMs (with human subjects as the reference). Change from explicit (blue) to implicit (red) prospects for each model is indicated by the arrows. RMs are x's and CMs are circles.
  • ...and 8 more figures