Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs
Gabriel Freedman, Francesca Toni
TL;DR
This work examines whether state-of-the-art LLMs can maintain rational probabilistic beliefs by testing them against core probability axioms. It introduces the Rational Probabilistic Belief (RPB) dataset to probe complementarity and monotonicity via base, specialised, generalised, and negated claim variants, and applies multiple uncertainty quantification methods (Direct Prompting, Chain-of-Thought, ArgLLMs, Top-K Logit Sampling) across diverse models. Findings show that while larger models perform better, they still frequently violate fundamental properties such as complementarity, $P(A)+P(A^c)=1$, and monotonicity (a more specific claim should never be judged more probable than a more general one), indicating incoherent probabilistic reasoning. The results motivate exploring neurosymbolic or symbolic approaches to enforce robust probabilistic inference in LLM-driven systems, rather than relying on scaling alone. Overall, the paper highlights critical limitations in the probabilistic reasoning capabilities of current LLMs and outlines directions for future research into reliable, explainable uncertainty representations.
Abstract
Advances in the general capabilities of large language models (LLMs) have led to their use for information retrieval and as components in automated decision systems. A faithful representation of probabilistic reasoning in these models may be essential to ensure trustworthy, explainable and effective performance in these tasks. Despite previous work suggesting that LLMs can perform complex reasoning and well-calibrated uncertainty quantification, we find that current versions of this class of model lack the ability to provide rational and coherent representations of probabilistic beliefs. To demonstrate this, we introduce a novel dataset of claims with indeterminate truth values and apply a number of well-established techniques for uncertainty quantification to measure the ability of LLMs to adhere to fundamental properties of probabilistic reasoning.
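The two properties named above can be made concrete with a small check. The following is a minimal sketch, not the authors' evaluation code: the `ClaimBeliefs` container, the `check_coherence` function, and the tolerance `tol` are all hypothetical names and choices introduced here for illustration. It assumes a model has already returned one probability for each of the four claim variants (base, negated, specialised, generalised).

```python
from dataclasses import dataclass


@dataclass
class ClaimBeliefs:
    """Hypothetical container for the probabilities a model assigns to one claim's variants."""
    base: float         # P(A)
    negated: float      # P(A^c)
    specialised: float  # P(A'), where A' is a more specific version of A
    generalised: float  # P(A''), where A'' is a more general version of A


def check_coherence(b: ClaimBeliefs, tol: float = 0.05) -> dict:
    """Return which basic probability properties hold, within an assumed tolerance."""
    return {
        # Complementarity: P(A) + P(A^c) = 1
        "complementarity": abs(b.base + b.negated - 1.0) <= tol,
        # Monotonicity: a more specific claim is at most as probable as the base claim
        "specialisation": b.specialised <= b.base + tol,
        # Monotonicity: a more general claim is at least as probable as the base claim
        "generalisation": b.base <= b.generalised + tol,
    }


if __name__ == "__main__":
    coherent = ClaimBeliefs(base=0.7, negated=0.3, specialised=0.4, generalised=0.9)
    incoherent = ClaimBeliefs(base=0.7, negated=0.5, specialised=0.8, generalised=0.6)
    print(check_coherence(coherent))
    print(check_coherence(incoherent))
```

The tolerance is an arbitrary illustrative value; a stricter or looser threshold would change how many responses count as violations.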
