
A Geometric Analysis of Small-sized Language Model Hallucinations

Emanuele Ricco, Elia Onofri, Lorenzo Cima, Stefano Cresci, Roberto Di Pietro

TL;DR

Small-sized LLMs can hallucinate even when knowledge is present. The authors propose a geometric view by embedding repeated responses per prompt into a fixed $384$-dimensional space and analyzing distributional structure with Wasserstein distances and Fisher Discriminant Analysis, followed by a geometry-aware label propagation that scales limited annotations to large sets. They demonstrate robust, cross-model separability between genuine and hallucinated responses and show that as few as $30$–$60$ labeled examples suffice to reach high accuracy ($F1>90\%$) across ten diverse models, using a single discriminative direction. A fully labeled dataset (150 generations for 200 prompts across 10 small LLMs) is released to support this line of analysis, highlighting the practical utility of embedding-space geometry for scalable hallucination detection and retrieval-aware evaluation. Overall, the work reframes hallucinations as a geometric phenomenon in response distributions, complementary to knowledge-centric and single-response evaluation paradigms.

Abstract

Hallucinations -- fluent but factually incorrect responses -- pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small-sized LLMs through a geometric perspective, starting from the hypothesis that, when models generate multiple responses to the same prompt, genuine ones exhibit tighter clustering in the embedding space. We prove this hypothesis and, leveraging this geometric insight, show that a consistent level of separability can be achieved. This result is then used to introduce a label-efficient propagation method that classifies large collections of responses from just 30-50 annotations, achieving F1 scores above 90%. Our findings, which frame hallucinations from a geometric perspective in the embedding space, complement traditional knowledge-centric and single-response evaluation paradigms, paving the way for further research.
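The label-efficient propagation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic Gaussian stand-ins for 384-dimensional response embeddings, the ridge term `lam`, the helper names `fisher_direction` and `propagate_labels`, and the rule "assign the class whose labeled projections are closer on average" are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(X, y, lam=1.0):
    """Unit vector w proportional to (S_w + lam*I)^(-1) (mu_1 - mu_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, averaged over samples. The ridge term (assumed
    # here) keeps the system solvable with fewer labels than dimensions.
    Sw = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / len(X)
    w = np.linalg.solve(Sw + lam * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

def propagate_labels(X_lab, y_lab, X_unl, lam=1.0):
    """Label unlabeled points from a handful of labeled ones.

    Everything is projected on the Fisher direction; each unlabeled point
    takes the class whose labeled projections are closer on average.
    """
    w = fisher_direction(X_lab, y_lab, lam)
    z_lab, z_unl = X_lab @ w, X_unl @ w
    d0 = np.abs(z_unl[:, None] - z_lab[y_lab == 0][None, :]).mean(axis=1)
    d1 = np.abs(z_unl[:, None] - z_lab[y_lab == 1][None, :]).mean(axis=1)
    return (d1 < d0).astype(int)

# Synthetic stand-in for embeddings: class 0 ("genuine") is a tight cluster,
# class 1 ("hallucinated") is a wider cluster displaced by a fixed offset.
rng = np.random.default_rng(0)
dim = 384
shift = rng.normal(size=dim)
shift *= 8.0 / np.linalg.norm(shift)

def sample(n_genuine, n_halluc):
    G = rng.normal(scale=0.5, size=(n_genuine, dim))
    H = shift + rng.normal(scale=1.0, size=(n_halluc, dim))
    X = np.vstack([G, H])
    y = np.concatenate([np.zeros(n_genuine, dtype=int),
                        np.ones(n_halluc, dtype=int)])
    return X, y

X_lab, y_lab = sample(20, 20)       # 40 "annotations"
X_unl, y_true = sample(200, 200)    # responses to be labeled
y_pred = propagate_labels(X_lab, y_lab, X_unl)
accuracy = float((y_pred == y_true).mean())
print(f"accuracy on unlabeled toy data: {accuracy:.3f}")
```

On real data the rows of `X_lab` and `X_unl` would be sentence embeddings of model responses, and the separation would depend on the model and prompt rather than on a hand-picked offset.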

Paper Structure (30 sections, 4 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Pipeline of the proposed method. (Left) 200 different prompts are fed into 10 LLMs, generating 150 responses each. (Center) Responses are tagged as Genuine (G, green) or Hallucinated (H, red) by Claude Sonnet 4.5 and embedded through SBERT (blue circle) in $\mathbb{R}^{384}$ (reduced here with t-SNE for plotting). Two analyses follow. (Structural, top right) Mutual intra-G, intra-H, and inter-GH (blue) distances are evaluated in the embedding space; their distributions differ significantly (cf. Figures \ref{fig:S_promptViolins} and \ref{fig:S_WassVsNull_Qwen32B}), yet the classes are not directly separable. Data points are therefore projected into the linear space that maximises the Fisher criterion, achieving optimal separability in terms of both spatial and distance distributions (cf. Table \ref{tab:S_FisherSeparability}). (Label Propagation, bottom right) The structural analysis is exploited to classify previously unseen responses generated under the same setting. Here separability in the embedding space may be poor, so the Fisher space is needed for reliable performance (cf. Table \ref{tab:label_propagation_performance}); points are classified according to the distributions of their point-to-cluster distances.
  • Figure 2: Distributions of mutual intra-class distances for Genuine ($\mathcal{D}_{\textsc{gg}}$, green) and Hallucinated ($\mathcal{D}_{\textsc{hh}}$, red) responses. Inter-class distance distributions $\mathcal{D}_{\textsc{gh}}$ are overlaid as blue boxplots, with medians highlighted. Statistical significance, assessed using the Wilcoxon test, refers to differences between $\mathcal{D}_{\textsc{gg}}$ and $\mathcal{D}_{\textsc{hh}}$: *** $p < .001$, ** $p < .01$, * $p < .05$, ns otherwise.
  • Figure 3: Statistical relevance of the distributional separation between $\mathcal{D}_{\textsc{gg}}$ and $\mathcal{D}_{\textsc{hh}}$ across prompts for qwen2.5-32B. (Top) The observed Wasserstein distance $W(\mathcal{D}_{\textsc{gg}}, \mathcal{D}_{\textsc{hh}})$, represented by ordered dots, is compared with the same distance under the null hypothesis $H_0$, obtained by permuting labels 100 times. (Bottom) The fraction of genuine responses is reported for each prompt and is also encoded in the color channel.
  • Figure 4: Evolution of the label propagator's F1-score as the training set size increases from 5 to 100 responses (in steps of 5). Results are reported as mean ($x$-axis) and standard deviation ($y$-axis): the further towards the bottom right, the better (cf. Figure \ref{fig:LP_performances-acc}).
  • Figure 5: Sensitivity analysis of the regularisation parameter $\lambda$ for each model. The best choice is marked by a dashed black line.
  • ...and 4 more figures
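The comparison described in the Figure 3 caption (observed Wasserstein distance between intra-class distance distributions versus a label-permutation null) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the embeddings are synthetic Gaussians, Euclidean distance is assumed, and the helper names (`pairwise_distances`, `wasserstein_1d`, `permutation_test`) and the add-one p-value estimate are hypothetical choices. The 1-D Wasserstein-1 distance is computed directly as the area between the two empirical CDFs.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    D = np.sqrt(np.clip(D2, 0.0, None))      # clip tiny negative round-off
    return D[np.triu_indices(len(X), k=1)]   # upper triangle, no diagonal

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D samples (area between CDFs)."""
    u, v = np.sort(u), np.sort(v)
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    u_cdf = np.searchsorted(u, grid[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))

def gg_hh_distance(X, y):
    """W(D_GG, D_HH): distance between the two intra-class distributions."""
    return wasserstein_1d(pairwise_distances(X[y == 0]),
                          pairwise_distances(X[y == 1]))

def permutation_test(X, y, n_perm=100, seed=0):
    """Compare the observed distance with a null built by shuffling labels."""
    rng = np.random.default_rng(seed)
    observed = gg_hh_distance(X, y)
    null = np.array([gg_hh_distance(X, rng.permutation(y))
                     for _ in range(n_perm)])
    # Add-one estimate so the p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Synthetic stand-in for one prompt: 150 "responses" in R^384, with label 0
# (genuine) drawn from a tighter cluster than label 1 (hallucinated).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(scale=0.5, size=(90, 384)),
               rng.normal(scale=1.0, size=(60, 384))])
y = np.concatenate([np.zeros(90, dtype=int), np.ones(60, dtype=int)])

observed, null, p_value = permutation_test(X, y, n_perm=100)
print(f"observed W = {observed:.3f}, null max = {null.max():.3f}, p = {p_value:.4f}")
```

Permuting `y` keeps the class sizes fixed, so the null reflects only what is lost when the genuine/hallucinated assignment is scrambled. `scipy.stats.wasserstein_distance` computes the same 1-D quantity and could replace `wasserstein_1d` if SciPy is available.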