Perceptions of Linguistic Uncertainty by Language Models and Humans

Catarina G Belem; Markelle Kelly; Mark Steyvers; Sameer Singh; Padhraic Smyth

Abstract

_Uncertainty expressions_ such as "probably" or "highly unlikely" are pervasive in human language. While prior work has established that humans agree at the population level on how to interpret these expressions quantitatively, there has been little inquiry into the abilities of language models in the same context. In this paper, we investigate how language models map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether language models can employ theory of mind in this setting: understanding another agent's uncertainty about a particular statement, independently of the model's own certainty about that statement. We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that language models are substantially more susceptible than humans to bias from their prior knowledge. These findings raise important questions and have broad implications for human-AI and AI-AI communication.

Paper Structure

This paper contains 50 sections, 1 equation, 18 figures, and 20 tables.

Introduction
Related Work
Human Perceptions of Uncertainty Expressions.
Uncertainty Quantification in LLMs.
LLM Perceptions of Uncertainty Expressions.
Baseline Human Study
Methodology
Verifiable Statements
Numerical responses from LLMs
Metrics
Results
How well do LLMs perceive uncertainty?
Does knowledge affect uncertainty perceptions of LLMs?
How generalizable are our findings?
How does decoding impact our findings?

Figures (18)

Figure 1: Two interactions with ChatGPT (June 2024) concerning the generation of a headline for a short passage. Both passages are structured identically and qualified with the word "probable," but the first is about climate change and the second about the link between vaccines and autism. For the first passage, ChatGPT generates a confident-sounding headline, using the words "conclude" and "comprehensive." The second headline is weaker, with words like "suggests" and "possible."
Figure 2: Example of a non-verifiable statement provided to participants in the baseline experiment. Each example uses a unique name and statement. Participants see one question at a time.
Figure 3: Human empirical distributions of numerical responses per uncertainty expression in the non-verifiable setting. Highlighted blue boxes represent the mode value for each expression. Overall, population-level perceptions increase monotonically with the use of more confident uncertainty expressions.
Figure 4: Model empirical distributions of numerical responses per uncertainty expression in the non-verifiable setting (LLM+NV). Highlighted boxes represent the mode value for each expression. Even though we found no evidence of instruction-tuning datasets that explicitly target uncertainty estimation tasks, these results suggest that GPT-4o generally exhibits human-like behavior, whereas OLMo (7B) does not.
Figure 5: Mean numerical response for the verifiable statements, split by whether each statement is true or false. The mean numerical responses produced by LLMs for true statements are significantly larger than those for false statements. This difference is much larger in magnitude than the difference shown by a human population.
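The two summary statistics named in the captions above, the per-expression mode (Figures 3 and 4) and the mean response split by statement truthfulness (Figure 5), can be sketched as simple grouped aggregations. This is a minimal illustration, not the paper's analysis code: it assumes each response is stored as a hypothetical `(expression, is_true, response)` tuple, and the function names and toy values are illustrative assumptions, not data or results from the study.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical record layout: (uncertainty expression, statement is true?, numerical response).
# Toy values for illustration only; they are NOT results from the paper.
responses = [
    ("probable", True, 70),
    ("probable", True, 70),
    ("probable", False, 60),
    ("probable", False, 70),
    ("highly unlikely", True, 20),
    ("highly unlikely", True, 10),
    ("highly unlikely", False, 10),
    ("highly unlikely", False, 0),
]


def mean_by_truthfulness(records):
    """Return the mean numerical response for true and for false statements."""
    groups = defaultdict(list)
    for _expression, is_true, value in records:
        groups[is_true].append(value)
    return {is_true: mean(values) for is_true, values in groups.items()}


def mode_by_expression(records):
    """Return the most frequent numerical response for each uncertainty expression."""
    counts = defaultdict(Counter)
    for expression, _is_true, value in records:
        counts[expression][value] += 1
    # most_common(1) yields [(value, count)] for the most frequent value
    return {expr: counter.most_common(1)[0][0] for expr, counter in counts.items()}


means = mean_by_truthfulness(responses)
# Positive gap: responses were higher, on average, for true statements
gap = means[True] - means[False]
print(means, gap)
print(mode_by_expression(responses))
```

Grouping by the truth value alone mirrors the Figure 5 comparison; a per-expression breakdown would group on `(expression, is_true)` instead.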