Prospects for inconsistency detection using large language models and sheaves

Steve Huntsman, Michael Robinson, Ludmilla Huntsman

TL;DR

The paper addresses global inconsistency and mis/disinformation by proposing a framework that uses LLMs to assign local numeric consistency scores to claims (on a scale of $0$ to $10$) and then lifts these local judgments to global consistency with sheaf cohomology. It develops a presheaf/sheaf-based approach to glue local data across a topology of jurisdictions, and connects CNF-SAT / MAX-SAT with cellular sheaves to illustrate the computational structure of consistency. The contributions include empirical demonstration of local consistency ratings by LLMs, a formal sheaf-theoretic blueprint for global coherence, and deployment considerations for governance-scale applications. The work highlights practical challenges (temporal ordering, scalability, noise in LLM outputs) and argues for a coherence-theory pathway to trustworthy, socially grounded assessment of consistency in policy, law, and public discourse, potentially aided by retrieval-augmented generation and PPP-driven deployment.

Abstract

We demonstrate that large language models can produce reasonable numerical ratings of the logical consistency of claims. We also outline a mathematical approach based on sheaf theory for lifting such ratings to hypertexts such as laws, jurisprudence, and social media and evaluating their consistency globally. This approach is a promising avenue to increasing consistency in and of government, as well as to combating mis- and disinformation and related ills.
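The gluing of local data described in the TL;DR can be shown with a deliberately simplified toy. In this sketch, each hypothetical jurisdiction assigns a truth value (not the paper's graded 0 to 10 rating) to the propositions it governs. Two local sections overlap where they share propositions. By the sheaf gluing axiom, they combine into a unique global section exactly when they agree on every overlap. The jurisdiction and proposition names are invented for illustration, and this is not the authors' construction.

```python
from itertools import combinations

# Hypothetical local data: each "jurisdiction" (an open set in a cover) assigns
# truth values to the propositions it governs. All names are illustrative only.
local_sections = {
    "federal": {"p": True, "q": False},
    "state_A": {"q": False, "r": True},
    "state_B": {"p": True, "r": False},
}

def overlap_conflicts(sections):
    """Return (U, V, proposition) triples where two local sections disagree on their overlap."""
    conflicts = []
    for (u, su), (v, sv) in combinations(sections.items(), 2):
        for prop in su.keys() & sv.keys():  # the overlap of the two domains
            if su[prop] != sv[prop]:
                conflicts.append((u, v, prop))
    return conflicts

def glue(sections):
    """Glue local sections into one global section if they agree on all overlaps, else None."""
    if overlap_conflicts(sections):
        return None
    glued = {}
    for section in sections.values():
        glued.update(section)  # safe: overlapping keys carry identical values
    return glued
```

Here `overlap_conflicts(local_sections)` reports the single disagreement `("state_A", "state_B", "r")`, so `glue` returns `None`. Dropping either state lets the remaining two sections glue. For example, `federal` and `state_A` glue to `{"p": True, "q": False, "r": True}`.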

Paper Structure (8 sections, 6 equations, 4 figures)

Figures (4)

  • Figure 1: Sheaf cohomology is used in penrose1992cohomology (and its reproduction as Example 231 of rosiak2022sheaf) to detail how the Penrose triangle in the left panel cannot be realized by consistently gluing together local data suggested by the middle three panels, i.e., the cubes at the ends of the three L shapes. The calculation also clarifies how ambiguity of perspective in a two-dimensional drawing is essential to the illusion of global consistency if the shape's connectivity is actually taken to be that of a triangle instead of as shown in the right panel.
  • Figure 2: ChatGPT can reliably quantify the logical consistency of claims, extending the binary inconsistency detection demonstrated in mundler2023self and li2023contradoc. Here, we show histograms of numerical consistency ratings produced by two versions of ChatGPT with default configurations in response to the initial prompt given in the prompt section (sec:prompt), followed by the two claims indicated in each figure panel. For each pair of claims, we gave ChatGPT 3.5 and 4 the same prompt $N = 100$ times and extracted the numerical consistency rating produced at the end of each explanation of the logical consistency of these claims (in just a single case, ChatGPT 3.5 failed to produce a rating at the end of its reply; ChatGPT 4 never failed to). NB. The prompt in the prompt section (sec:prompt) includes the claim pairs $(\texttt{The earth is flat},\texttt{The sky is red})$ and $(\texttt{Purple people are evil},\texttt{Purple people are good})$ as its only two examples for few-shot learning.
  • Figure 3: As in Figure 2, but for the claim pairs of the toy example (sec:toyExample).
  • Figure 4: (Adapted from rosiak2022sheaf; left) The CNF-SAT formula $(w \lor \bar{x}) \land (w \lor y) \land (x \lor \bar{y}) \land (x \lor y \lor \bar{z})$ encodes an abstract simplicial complex (ASC) whose constituent simplices correspond to (sub)clauses and are labeled according to the participating variables. (A dual ASC [not shown] switches the roles of clauses and variables but has the same gross structure: viz., simplicial homology indicating a single hole (dowker1952homology, ghrist2014elementary, huntsman2022topology). It is a triangle formed on vertices $wx$, $wy$, and $xyz$ respectively connected by one-dimensional simplices labeled $w$, $y$, and $x$: some evident redundancy has been eliminated.) (Center) The inclusion of simplices defines a partial order and a concomitant topology in which the open sets are unions of "up-sets" such as $\uparrow y$, which is highlighted in red. (Right) Redrawing the partial order with the empty simplex $\varnothing$ omitted gives an "attachment diagram" that indicates how open sets indeed encode a sensible notion of locality in the ASC.
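The obstruction in Figure 1 can be mimicked with a small linear-algebra check. This follows the spirit of the calculation in penrose1992cohomology but is much simpler. Assume, hypothetically, that each of three overlapping local patches fixes depth only up to a positive scale factor. On each overlap, the two local descriptions then differ by a transition ratio $r_{ij}$. A consistent global depth exists iff there are scales $s_i > 0$ with $r_{ij} = s_j / s_i$, that is, iff the ratios form a coboundary. Taking logarithms turns this into asking whether the log-ratios lie in the image of the triangle's coboundary matrix. The ratio values used here are made up.

```python
import numpy as np

# Vertices 0, 1, 2 stand for three overlapping local patches (hypothetical);
# each edge is an overlap, oriented around the loop 0 -> 1 -> 2 -> 0.
EDGES = [(0, 1), (1, 2), (2, 0)]

def coboundary(n_vertices, edges):
    """Matrix of delta: C^0 -> C^1 with (delta s)[(i, j)] = s[j] - s[i]."""
    d = np.zeros((len(edges), n_vertices))
    for k, (i, j) in enumerate(edges):
        d[k, i], d[k, j] = -1.0, 1.0
    return d

def is_coboundary(ratios, n_vertices=3, edges=EDGES, tol=1e-9):
    """True if positive ratios satisfy r_ij = s_j / s_i for some positive scales s."""
    d = coboundary(n_vertices, edges)
    b = np.log(np.asarray(ratios, dtype=float))  # logs make the ratios an additive 1-cochain
    s, *_ = np.linalg.lstsq(d, b, rcond=None)     # best-fit log-scales
    return bool(np.linalg.norm(d @ s - b) < tol)  # zero residual <=> b is in the image of delta
```

For a single triangle, this reduces to checking that the product of the ratios around the loop is 1. `is_coboundary([2.0, 3.0, 1.0 / 6.0])` is `True`, while `is_coboundary([2.0, 3.0, 1.0])` is `False`. The second case is the analogue of a nonzero class in $H^1$ that blocks a consistent global section.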
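The extraction step described in the Figure 2 caption takes a rating from the end of each reply and notes occasional failures. It might look like the following sketch. The sketch assumes the rating is the last integer in the reply. A reply with no number, or with a value outside 0 to 10, counts as a failure. The reply strings are hypothetical, and no particular LLM API is called. A reply ending in "7/10" would be misread as 10, so in practice the prompt should pin down the output format.

```python
import re
from collections import Counter

def extract_rating(reply, lo=0, hi=10):
    """Return the last integer in the reply if it lies in [lo, hi], else None."""
    m = re.search(r"(\d+)\D*$", reply)  # last run of digits, followed only by non-digits
    if m is None:
        return None
    value = int(m.group(1))
    return value if lo <= value <= hi else None

def summarize(replies):
    """Histogram of extracted ratings plus the number of replies with no usable rating."""
    ratings = [extract_rating(r) for r in replies]
    hist = Counter(r for r in ratings if r is not None)
    failures = sum(r is None for r in ratings)
    return hist, failures
```

Suppose `summarize` is run on the $N = 100$ replies for one claim pair and one model. It would give counts of the kind plotted in Figure 2. `[hist[k] for k in range(11)]` lists them in bin order, since a `Counter` returns 0 for missing keys.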
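The left and center panels of Figure 4 can be reproduced computationally. This sketch assumes the ASC consists of every nonempty subset of each clause's variable set. Literal signs are ignored for the complex but kept for satisfiability. It computes Betti numbers over the reals from boundary-matrix ranks and lists the up-set $\uparrow y$. The brute-force MAX-SAT routine only makes the CNF side concrete. It is exponential in the number of variables and is not the paper's sheaf-theoretic construction.

```python
from itertools import combinations, product

import numpy as np

# (w ∨ ¬x) ∧ (w ∨ y) ∧ (x ∨ ¬y) ∧ (x ∨ y ∨ ¬z) as lists of (variable, is_positive) literals.
CLAUSES = [
    [("w", True), ("x", False)],
    [("w", True), ("y", True)],
    [("x", True), ("y", False)],
    [("x", True), ("y", True), ("z", False)],
]

def simplicial_complex(clauses):
    """Downward closure of the clause variable sets: every nonempty (sub)clause is a simplex."""
    simplices = set()
    for clause in clauses:
        variables = sorted({v for v, _ in clause})
        for k in range(1, len(variables) + 1):
            simplices.update(frozenset(c) for c in combinations(variables, k))
    return simplices

def up_set(simplices, vertex):
    """All simplices containing the vertex: a basic open set of the poset topology."""
    return {s for s in simplices if vertex in s}

def betti_numbers(simplices):
    """Betti numbers over the reals: dim C_d - rank(boundary_d) - rank(boundary_{d+1})."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {}
    for d in range(1, top + 1):
        rows = {face: i for i, face in enumerate(by_dim[d - 1])}
        m = np.zeros((len(by_dim[d - 1]), len(by_dim[d])))
        for j, s in enumerate(by_dim[d]):
            for i in range(len(s)):
                m[rows[s[:i] + s[i + 1:]], j] = (-1) ** i  # signed face of s
        ranks[d] = np.linalg.matrix_rank(m)
    return [int(len(by_dim[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0)) for d in range(top + 1)]

def max_sat(clauses):
    """Brute-force MAX-SAT: the largest number of clauses any assignment satisfies."""
    variables = sorted({v for c in clauses for v, _ in c})
    best = 0
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        best = max(best, sum(any(a[v] == pos for v, pos in c) for c in clauses))
    return best
```

For this formula, the complex has 4 vertices, 5 edges, and 1 triangle. `betti_numbers` returns `[1, 1, 0]`: one connected component and a single hole, the unfilled cycle through $w$, $x$, $y$. `max_sat(CLAUSES)` is 4, which equals the number of clauses, so the formula is satisfiable (for instance with $w$ and $x$ both true). The caption's poset also includes the empty simplex $\varnothing$, which this sketch omits.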