On the attribution of confidence to large language models

Geoff Keeling, Winnie Street

TL;DR

This paper examines how scientists' attributions of credences to large language models (LLMs) should be interpreted, arguing that they are best read literally, as expressing beliefs about LLM credences. It argues that the existence of LLM credences is plausible but not established, and that current empirical methods for assessing credence (reported confidence, consistency-based estimation, and output-probability analysis) are subject to non-trivial sceptical concerns due to stochasticity, temperature effects, and sampling choices. The authors explore semantic, metaphysical, and epistemic dimensions, including bridge principles linking token probabilities to propositional credences, and conclude that existing techniques may not reliably track true LLM credences. The work highlights significant methodological and interpretive challenges for using credence attributions to draw conclusions about LLM understanding, factuality, or honesty, and calls for careful, theory-grounded approaches to credence assessment.
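As a toy illustration of why temperature and sampling choices matter for consistency-based estimation, the sketch below applies a temperature-scaled softmax to a made-up set of answer logits and then estimates "confidence" as the fraction of sampled answers that agree. The logits, answer strings, sample size, and function names are all hypothetical and are not taken from the paper; the point is only that the same underlying scores can yield different consistency estimates under different decoding settings.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_estimate(samples, answer):
    """Fraction of sampled responses that equal `answer`."""
    return sum(1 for s in samples if s == answer) / len(samples)


# Hypothetical candidate answers and logits (illustrative numbers only).
answers = ["Paris", "Lyon", "Marseille"]
logits = [2.0, 1.0, 0.0]

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    samples = rng.choices(answers, weights=probs, k=1000)
    estimate = consistency_estimate(samples, "Paris")
    print(f"T={temperature}: p(Paris)={probs[0]:.3f}, consistency={estimate:.3f}")
```

Under these assumptions, the probability assigned to the top answer falls as the temperature rises, so a consistency-based estimate computed from samples shifts with the decoding configuration even though the logits are unchanged.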

Abstract

Credences are mental states corresponding to degrees of confidence in propositions. Attribution of credences to Large Language Models (LLMs) is commonplace in the empirical literature on LLM evaluation. Yet the theoretical basis for LLM credence attribution is unclear. We defend three claims. First, our semantic claim is that LLM credence attributions are (at least in general) correctly interpreted literally, as expressing truth-apt beliefs on the part of scientists that purport to describe facts about LLM credences. Second, our metaphysical claim is that the existence of LLM credences is at least plausible, although current evidence is inconclusive. Third, our epistemic claim is that LLM credence attributions made in the empirical literature on LLM evaluation are subject to non-trivial sceptical concerns. It is a distinct possibility that even if LLMs have credences, LLM credence attributions are generally false because the experimental techniques used to assess LLM credences are not truth-tracking.
