Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions
Polina Tsvilodub, Paul Marty, Sonia Ramotowska, Jacopo Romoli, Michael Franke
TL;DR
The paper investigates whether state-of-the-art large language models (LLMs) predict the fine-grained inferences associated with plain and embedded disjunctions, namely free choice (FC), ignorance (II), and distributive (DI) inferences, in line with human data and with the predictions of competing theories (TIA, RIA, NIA). The authors recreate human mystery-box experiments as prompt-driven evaluations across a range of transformer-based models, derive predictions from the models' log-probabilities, and quantify alignment with human acceptance rates using $R^2$ with bootstrap confidence intervals. The best-performing LLMs reproduce the human-like distinction between disjunction inferences and scalar implicatures in several conditions, and larger models often fit the human data better. Fit nonetheless varies substantially across inferences and contexts, especially under negation and under modal or nominal scope. The study highlights both the potential and the limits of using LLMs to test linguistic theories, and it proposes criteria for robust interpretation: cross-task consistency, full-distribution analysis, and capacity-aware benchmarking.
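As a rough illustration of the alignment measure mentioned above, the following minimal sketch computes $R^2$ between model-derived acceptance probabilities and human acceptance rates, with a percentile bootstrap interval. It is not the authors' pipeline. It assumes that each condition yields log-probabilities for two response options, that acceptance is obtained by renormalizing over those options, and that $R^2$ is the squared Pearson correlation. All function names and data values are hypothetical.

```python
import math
import random


def softmax(log_probs):
    """Renormalize log-probabilities over a closed set of response options."""
    peak = max(log_probs)
    weights = [math.exp(lp - peak) for lp in log_probs]
    total = sum(weights)
    return [w / total for w in weights]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treated as no fit in this sketch
    return sxy ** 2 / (sxx * syy)


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for R^2, resampling conditions with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(r_squared([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Hypothetical per-condition log-probabilities for (accept, reject) responses.
model_log_probs = [(-0.2, -1.8), (-0.9, -0.6), (-1.5, -0.3),
                   (-0.4, -1.2), (-1.1, -0.5), (-0.3, -1.6)]
# Hypothetical human acceptance rates for the same conditions.
human_rates = [0.85, 0.45, 0.20, 0.70, 0.35, 0.80]

model_rates = [softmax(pair)[0] for pair in model_log_probs]
fit = r_squared(model_rates, human_rates)
ci_low, ci_high = bootstrap_ci(model_rates, human_rates)
print(f"R^2 = {fit:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling whole conditions keeps each model prediction paired with its human rate; a real analysis might instead resample participants or items, depending on the design.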
Abstract
Human communication is based on a variety of inferences that we draw from sentences, often going beyond what is literally said. While there is wide agreement on the basic distinction between entailment, implicature, and presupposition, the status of many inferences remains controversial. In this paper, we focus on three inferences of plain and embedded disjunctions, and compare them with regular scalar implicatures. We investigate this comparison from the novel perspective of the predictions of state-of-the-art large language models, using the same experimental paradigms as recent studies investigating the same inferences with humans. The results of our best performing models mostly align with those of humans, both in the large differences we find between those inferences and implicatures and in fine-grained distinctions among different aspects of those inferences.
