Quantum-Like Contextuality in Large Language Models
Kin Ian Lo, Mehrnoosh Sadrzadeh, Shane Mansfield
TL;DR
The paper investigates whether quantum-like contextuality can emerge in natural language. The authors construct a linguistic schema modelled on a contextual quantum scenario and instantiate it in a masked-language-modelling setup using BERT. They apply two contextuality frameworks, the signalling-corrected sheaf-theoretic model and Contextuality-by-Default (CbD), and report 77,118 sheaf-contextual and 36,938,948 CbD-contextual instances. A key finding is that the difference in BERT logit scores, $\Delta l$, can be related to the distance between the embedding vectors of the nouns; with $\epsilon = \tanh(\Delta l/2)$, this links contextuality to Euclidean distance in embedding space, and a cubic regression further supports the association. The results suggest that quantum-inspired notions of contextuality may be relevant for language tasks, and they motivate future work on contextuality-driven advantages, human judgments, and extensions to broader coreference challenges such as donkey anaphora and the Winograd Schema Challenge.
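A form such as $\epsilon = \tanh(\Delta l/2)$ arises naturally whenever two logits are turned into probabilities by a softmax: the gap between the two probabilities is exactly $\tanh$ of half the logit difference. The minimal sketch below checks that identity numerically. Reading $\epsilon$ as a two-way probability gap is an assumption made here for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def softmax2(l1, l2):
    """Two-way softmax, shifted by the max logit for numerical stability."""
    m = max(l1, l2)
    e1, e2 = math.exp(l1 - m), math.exp(l2 - m)
    z = e1 + e2
    return e1 / z, e2 / z


def prob_gap(l1, l2):
    """Difference p1 - p2 between the two softmax probabilities."""
    p1, p2 = softmax2(l1, l2)
    return p1 - p2


def epsilon(delta_l):
    """Assumed reading of the summary's formula: epsilon = tanh(delta_l / 2)."""
    return math.tanh(delta_l / 2)


if __name__ == "__main__":
    # The two quantities agree for any pair of logits.
    for l1, l2 in [(2.0, 0.5), (-1.0, 3.0), (0.0, 0.0)]:
        print(l1, l2, prob_gap(l1, l2), epsilon(l1 - l2))
```

Only the difference of the two logits matters, so adding a constant to both leaves the result unchanged.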
Abstract
Contextuality is a distinguishing feature of quantum mechanics, and there is growing evidence that it is a necessary condition for quantum advantage. In order to make use of it, researchers have been asking whether similar phenomena arise in other domains. The answer has been yes, for example in the behavioural sciences. However, one has to move to frameworks that take some degree of signalling into account. Two such frameworks exist: (1) a signalling-corrected sheaf-theoretic model, and (2) the Contextuality-by-Default (CbD) framework. This paper provides the first large-scale experimental evidence for a yes answer in natural language. We construct a linguistic schema modelled on a contextual quantum scenario, instantiate it in the Simple English Wikipedia, and extract probability distributions for the instances using the large language model BERT. This led to the discovery of 77,118 sheaf-contextual and 36,938,948 CbD-contextual instances. We proved that the contextual instances came from semantically similar words by deriving an equation relating degrees of contextuality to Euclidean distances between BERT's embedding vectors. A regression model further reveals that Euclidean distance is indeed the best statistical predictor of contextuality. Our linguistic schema is a variant of the co-reference resolution challenge. These results indicate that quantum methods may be advantageous in language tasks.
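To illustrate how a framework can account for signalling, the sketch below implements the standard CbD contextuality test for a cyclic system of binary ±1 variables. A system of rank n is contextual when s_odd of the within-context correlations exceeds n - 2 plus the total signalling term, where s_odd is the largest signed sum with an odd number of minus signs. The abstract does not specify the shape of the scenario, so the cyclic structure is an assumption here, and the function names and input layout are hypothetical.

```python
from itertools import product


def s_odd(xs):
    """Max of sum(+/- x_i) over sign choices with an odd number of minus signs."""
    best = float("-inf")
    for signs in product((1, -1), repeat=len(xs)):
        if sum(s == -1 for s in signs) % 2 == 1:
            best = max(best, sum(s * x for s, x in zip(signs, xs)))
    return best


def cbd_excess(correlations, marginal_pairs):
    """CbD excess for a cyclic system (assumed layout, for illustration only).

    correlations:   expected products E[A*B], one per context.
    marginal_pairs: for each content, the pair of expectations it takes in
                    the two contexts where it is measured; their absolute
                    difference is that content's signalling contribution.
    A positive return value means the system is CbD-contextual.
    """
    n = len(correlations)
    delta = sum(abs(a - b) for a, b in marginal_pairs)
    return s_odd(correlations) - (n - 2) - delta


if __name__ == "__main__":
    # Rank-4 example with perfect (anti)correlations and no signalling.
    print(cbd_excess([1, 1, 1, -1], [(0, 0)] * 4))
```

The signalling term is subtracted from the excess, so the more the marginals differ across contexts, the stronger the correlations must be before the system counts as contextual.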
