Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs

Gabriel Freedman, Francesca Toni

TL;DR

This work interrogates whether state-of-the-art LLMs can maintain rational probabilistic beliefs by testing them against core probability axioms. It introduces the Rational Probabilistic Belief (RPB) dataset to probe complementarity and monotonicity via base, specialised, generalised, and negated claim variants, and applies multiple uncertainty quantification methods (Direct Prompting, Chain-of-Thought, ArgLLMs, Top-K Logit Sampling) across diverse models. Findings show that while larger models perform better, they still frequently violate fundamental properties such as $P(A)+P(A^c)=1$ and monotonicity, indicating non-coherent probabilistic reasoning. The results motivate exploring neurosymbolic or symbolic approaches to enforce robust probabilistic inference in LLM-driven systems, rather than relying on scaling alone. Overall, the paper highlights critical limitations in current probabilistic reasoning capabilities of LLMs and outlines directions for future research into reliable, explainable uncertainty representations.
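Stated explicitly, the two properties named above are standard consequences of the probability axioms. The notation here is an assumption made for illustration and is not taken from the paper: $A$ is a base claim, $A_s$ a specialised claim that entails $A$, and $A_g$ a generalised claim that $A$ entails.

$$
P(A) + P(A^c) = 1, \qquad A_s \subseteq A \subseteq A_g \;\Rightarrow\; P(A_s) \le P(A) \le P(A_g)
$$

Under this reading, a model that assigns a specialised claim a higher probability than its base claim, or a generalised claim a lower one, violates monotonicity.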

Abstract

Advances in the general capabilities of large language models (LLMs) have led to their use for information retrieval and as components in automated decision systems. A faithful representation of probabilistic reasoning in these models may be essential to ensure trustworthy, explainable and effective performance in these tasks. Despite previous work suggesting that LLMs can perform complex reasoning and well-calibrated uncertainty quantification, we find that current versions of this class of model lack the ability to provide rational and coherent representations of probabilistic beliefs. To demonstrate this, we introduce a novel dataset of claims with indeterminate truth values and apply a number of well-established techniques for uncertainty quantification to measure the ability of LLMs to adhere to fundamental properties of probabilistic reasoning.

Paper Structure

This paper contains 16 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Examples of LLMs violating three principles of probabilistic reasoning. The image above the line shows the original claim with the corresponding probability assignment by an LLM, and below the line the images demonstrate the violation of the principles of: complementarity (top), specialisation (middle) and generalisation (bottom).
  • Figure 2: Adherence to monotonicity by model and uncertainty quantification methodology. The left panel shows the Specialisation task and the right panel the Generalisation task. Both tasks are described in detail in the paper's section on monotonicity. The y-axis represents the magnitude of deviation from correctly monotonic probability estimates, and the x-axis is the model type.
  • Figure 3: Adherence to complementarity by model and uncertainty quantification methodology. A detailed task description is provided in the paper's section on complementarity.
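
The deviations plotted in these figures can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ClaimBeliefs` container, the function names, the choice of absolute and one-sided gaps as deviation measures, and the example probabilities are all hypothetical, and the paper's actual metrics may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class ClaimBeliefs:
    """Hypothetical container for the probabilities a model assigns to
    one base claim and its three variants."""
    base: float         # P(A)
    negated: float      # P(not A)
    specialised: float  # P(A_s), where A_s entails A
    generalised: float  # P(A_g), where A entails A_g


def complementarity_deviation(b: ClaimBeliefs) -> float:
    """Absolute gap between P(A) + P(not A) and 1 (0 when coherent)."""
    return abs(b.base + b.negated - 1.0)


def specialisation_deviation(b: ClaimBeliefs) -> float:
    """Amount by which the specialised claim exceeds the base claim
    (0 when P(A_s) <= P(A))."""
    return max(0.0, b.specialised - b.base)


def generalisation_deviation(b: ClaimBeliefs) -> float:
    """Amount by which the base claim exceeds the generalised claim
    (0 when P(A) <= P(A_g))."""
    return max(0.0, b.base - b.generalised)


# Made-up probabilities (not results from the paper) that violate all three checks.
incoherent = ClaimBeliefs(base=0.5, negated=0.75, specialised=0.75, generalised=0.25)

# Made-up probabilities that satisfy all three checks.
coherent = ClaimBeliefs(base=0.5, negated=0.5, specialised=0.25, generalised=0.75)

for name, beliefs in [("incoherent", incoherent), ("coherent", coherent)]:
    print(
        name,
        complementarity_deviation(beliefs),
        specialisation_deviation(beliefs),
        generalisation_deviation(beliefs),
    )
```

A coherent set of beliefs scores zero on all three measures, so any positive value flags a violation of the corresponding property.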