A Geometric Taxonomy of Hallucinations in LLMs

Javier Marín

TL;DR

The contribution is a geometric taxonomy clarifying the scope of embedding-based detection: Types I and II are detectable; Type III requires external verification mechanisms.

Abstract

The term "hallucination" in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC 0.76-0.99 within domains, but 0.50 (chance level) across domains. Discriminative directions are approximately orthogonal between domains (mean cosine similarity -0.07). On human-crafted confabulations - invented institutions, redefined terminology, fabricated mechanisms - a single global direction achieves 0.96 AUROC with 3.8% cross-domain degradation. We interpret this divergence as follows: benchmarks capture generation artifacts (stylistic signatures of prompted fabrication), while human-crafted confabulations capture genuine topical drift. The geometric structure differs because the underlying phenomena differ. Type III errors show 0.478 AUROC - indistinguishable from chance. This reflects a theoretical constraint: embeddings encode distributional co-occurrence, not correspondence to external reality. Statements with identical contextual patterns occupy similar embedding regions regardless of truth value. The contribution is a geometric taxonomy clarifying the scope of embedding-based detection: Types I and II are detectable; Type III requires external verification mechanisms.
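The domain-local behaviour described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the embeddings are synthetic Gaussian vectors, the discriminative direction is assumed to be a simple difference of class means, and every name (`make_domain`, `mean_difference_direction`, `auroc`) is hypothetical. It only shows how near-orthogonal directions in two domains give high within-domain AUROC and near-chance cross-domain AUROC.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def mean_difference_direction(grounded, hallucinated):
    """Unit vector from the grounded centroid toward the hallucinated centroid.
    Assumption: a stand-in for whatever direction a real detector would learn."""
    return unit(hallucinated.mean(axis=0) - grounded.mean(axis=0))

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

def make_domain(rng, shift, n=200, d=64):
    """Synthetic data, not real embeddings: hallucinated samples are shifted along `shift`."""
    grounded = rng.normal(size=(n, d))
    hallucinated = rng.normal(size=(n, d)) + 3.0 * shift
    return grounded, hallucinated

rng = np.random.default_rng(0)
d, half = 64, 100

# Two toy domains whose hallucination signal lies along orthogonal axes.
ga, ha = make_domain(rng, np.eye(d)[0], d=d)
gb, hb = make_domain(rng, np.eye(d)[1], d=d)

# Fit each direction on the first half of its own domain.
dir_a = mean_difference_direction(ga[:half], ha[:half])
dir_b = mean_difference_direction(gb[:half], hb[:half])

# Evaluate domain A's direction on held-out data from A, then from B.
within_auroc = auroc(ha[half:] @ dir_a, ga[half:] @ dir_a)
cross_auroc = auroc(hb[half:] @ dir_a, gb[half:] @ dir_a)
direction_cosine = float(dir_a @ dir_b)

print(f"within-domain AUROC: {within_auroc:.2f}")
print(f"cross-domain AUROC:  {cross_auroc:.2f}")
print(f"cosine between domain directions: {direction_cosine:.2f}")
```

In this toy setup the cross-domain score falls toward chance because the two directions share almost no component, which mirrors the near-zero cosine similarity reported for the benchmark domains. A single shared shift axis across both domains would instead produce the "global direction" pattern reported for human-crafted confabulations.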

Paper Structure (32 sections, 13 equations, 1 figure, 7 tables)

Introduction
Related Work
The Contested Meaning of Hallucination
Detection Methods
Geometric Approaches
Benchmarks and Their Construction
Theoretical Framework
The Embedding Hypersphere
The Semantic Grounding Index
The Directional Grounding Index
Computational Complexity
A Geometric Taxonomy of Hallucination
Experimental Methodology
Research Questions
Datasets
...and 17 more sections

Figures (1)

Figure 1: Geometric taxonomy of hallucination types on the embedding hypersphere $\mathbf{S}^{d-1}$. Query $\mathbf{q}$ and context $\mathbf{c}$ define anchor points. The plausibility manifold $\mathcal{P}_q$ (green, dashed) contains semantically appropriate responses. A grounded response (blue) departs from $\mathbf{q}$ toward $\mathbf{c}$ and lands within $\mathcal{P}_q$. Type I unfaithfulness (purple) shows semantic laziness, remaining near $\mathbf{q}$. Type II confabulation (red) departs in an unrelated direction, landing outside $\mathcal{P}_q$. Type III factual error (orange) reaches $\mathcal{P}_q$ but occupies a factually incorrect position within the plausible region.

Theorems & Definitions (4)

Definition 1: Semantic Grounding Index
Definition 2: Directional Grounding Index - $\Gamma$
Definition 3: Local Directional Grounding Index
Definition 4: Plausibility Manifold
