Quantum-Like Contextuality in Large Language Models

Kin Ian Lo, Mehrnoosh Sadrzadeh, Shane Mansfield

TL;DR

The paper investigates whether quantum-like contextuality can emerge in natural language by constructing a linguistic schema modeled on contextual quantum scenarios and instantiating it with a masked-language modeling setup using BERT. The authors apply two contextuality frameworks, the signalling-corrected sheaf-theoretic model and Contextuality-by-Default (CbD), and report 77,118 sheaf-contextual and 36,938,948 CbD-contextual instances. A key finding is that the difference in BERT logit scores, $\Delta l$, is tied to the distance between the noun embedding vectors, yielding $\epsilon = \tanh(\Delta l / 2)$ and linking the degree of contextuality to Euclidean distance in embedding space; a cubic regression model further supports Euclidean distance as a statistical predictor of contextuality. The results suggest that quantum-inspired contextuality concepts may be relevant for language tasks and encourage future work on contextuality-driven advantages, human judgments, and extensions to broader coreference challenges such as donkey anaphora and the Winograd Schema Challenge.
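
The identity $\epsilon = \tanh(\Delta l / 2)$ can be recovered in a few lines under one assumption that this summary does not state explicitly: that $\epsilon$ is the difference between the probabilities of the two candidate nouns after a softmax restricted to those two candidates. The symbols $l_1$, $l_2$, $p_1$, $p_2$ are introduced here only for illustration.

$$
\begin{aligned}
p_1 &= \frac{e^{l_1}}{e^{l_1} + e^{l_2}}, \qquad p_2 = \frac{e^{l_2}}{e^{l_1} + e^{l_2}}, \qquad \Delta l = l_1 - l_2, \\
\epsilon &= p_1 - p_2 = \frac{e^{l_1} - e^{l_2}}{e^{l_1} + e^{l_2}} = \frac{e^{\Delta l / 2} - e^{-\Delta l / 2}}{e^{\Delta l / 2} + e^{-\Delta l / 2}} = \tanh\!\left(\frac{\Delta l}{2}\right).
\end{aligned}
$$

The third equality multiplies numerator and denominator by $e^{-(l_1 + l_2)/2}$. Under this reading, $\epsilon$ is a monotone function of $\Delta l$ and saturates at $\pm 1$ as the logit gap grows.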

Abstract

Contextuality is a distinguishing feature of quantum mechanics and there is growing evidence that it is a necessary condition for quantum advantage. In order to make use of it, researchers have been asking whether similar phenomena arise in other domains. The answer has been yes, e.g. in behavioural sciences. However, one has to move to frameworks that take some degree of signalling into account. Two such frameworks exist: (1) a signalling-corrected sheaf theoretic model, and (2) the Contextuality-by-Default (CbD) framework. This paper provides the first large scale experimental evidence for a yes answer in natural language. We construct a linguistic schema modelled over a contextual quantum scenario, instantiate it in the Simple English Wikipedia and extract probability distributions for the instances using the large language model BERT. This led to the discovery of 77,118 sheaf-contextual and 36,938,948 CbD contextual instances. We proved that the contextual instances came from semantically similar words, by deriving an equation between degrees of contextuality and Euclidean distances of BERT's embedding vectors. A regression model further reveals that Euclidean distance is indeed the best statistical predictor of contextuality. Our linguistic schema is a variant of the co-reference resolution challenge. These results are an indication that quantum methods may be advantageous in language tasks.

Paper Structure

This paper contains 6 sections, 7 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 6: The distribution of the instances in the space of direct influence and signalling fraction, which is divided equally into 200 × 200 bins. The colour of each bin represents the log of the number of instances that fall into that bin. As determined by Equation \ref{eq:inequality}, certain regions of the space are inaccessible to the instances; these are marked as forbidden in the figure. The regions where the instances are either CbD contextual or sheaf contextual are outlined in the figure.
  • Figure 7: Histograms of (a) the signalling fraction and (b) the direct influence of the 519,660 models constructed for the similar nouns subset of the dataset.
  • Figure 8: A flow chart illustrating how the embedding vectors are transformed into the output vectors in a BERT model. Extra tokens [CLS] and [SEP] are added to the input sequence to indicate the start and end of the sequence, while the [MASK] token is used to indicate the mask.
  • Figure 9: A 2-dimensional sketch of a geometric interpretation of the mask predictions from BERT for the PR-anaphora schema. The vectors $\mathbf{p}_i$ are the output vectors of the masked token for the $i$-th context in the schema. The distance from a predictor vector $\mathbf{p}_i$ to the hyperplane defined by the equation $\mathbf{p} \cdot \Delta \mathbf{x} + \Delta b = 0$ coincides with $\Delta l_i / \|\Delta \mathbf{x}\|$. As $\epsilon_i$ relates to $\Delta l_i$ monotonically, specifically $\epsilon_i = \tanh(\Delta l_i / 2)$, the signalling fraction $\textsf{SF} = \max |\epsilon_i|$ depends only on the prediction vectors furthest away from the hyperplane. In the figure, the prediction vector $\mathbf{p}_2$ (coloured red) is the furthest away from the hyperplane.
  • Figure 10: The $R^2$ scores of the polynomial regression models at different polynomial degrees predicting (left) the signalling fraction and (right) the direct influence.
  • ...and 1 more figure
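
The geometric picture in the Figure 9 caption can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the logit of each candidate noun is linear in the masked token's output vector, so that $\Delta l_i = \mathbf{p}_i \cdot \Delta \mathbf{x} + \Delta b$, with $\Delta \mathbf{x}$ the difference of the two noun vectors. The function name `signalling_fraction` and all argument names are hypothetical.

```python
import numpy as np


def signalling_fraction(P, x_a, x_b, b_a=0.0, b_b=0.0):
    """Toy version of the Figure 9 geometry (all names are illustrative).

    P        : (n_contexts, d) array, one output vector p_i per context
    x_a, x_b : (d,) vectors of the two candidate nouns (assumed to act as
               the output-layer weights for those nouns)
    b_a, b_b : scalar output biases of the two nouns
    """
    dx = x_a - x_b                    # Delta x
    db = b_a - b_b                    # Delta b
    dl = P @ dx + db                  # Delta l_i = p_i . Delta x + Delta b
    dist = dl / np.linalg.norm(dx)    # signed distance of p_i to the hyperplane
    eps = np.tanh(dl / 2.0)           # epsilon_i = tanh(Delta l_i / 2)
    sf = np.max(np.abs(eps))          # SF = max_i |epsilon_i|
    return dl, dist, eps, sf


# Three made-up contexts in a 2-dimensional space.
P = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
dl, dist, eps, sf = signalling_fraction(P, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
furthest = int(np.argmax(np.abs(dist)))  # context that determines SF
```

Because $\tanh$ is monotone and odd, the context with the largest $|\Delta l_i|$, equivalently the output vector furthest from the hyperplane, is the one that sets the signalling fraction, which matches the role of $\mathbf{p}_2$ in the caption.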

Theorems & Definitions (1)

  • proof