A Geometric Analysis of Small-sized Language Model Hallucinations
Emanuele Ricco, Elia Onofri, Lorenzo Cima, Stefano Cresci, Roberto Di Pietro
TL;DR
Small-sized LLMs can hallucinate even when the relevant knowledge is present. The authors propose a geometric view: they embed repeated responses to each prompt into a fixed $384$-dimensional space, analyze the resulting distributional structure with Wasserstein distances and Fisher Discriminant Analysis, and then apply a geometry-aware label propagation that scales a small set of annotations to large response collections. They demonstrate robust cross-model separability between genuine and hallucinated responses and show that as few as $30$–$60$ labeled examples, combined with a single discriminative direction, suffice to reach high accuracy ($F1>90\%$) across ten diverse models. The authors also release a fully labeled dataset (150 generations per prompt for 200 prompts across 10 small LLMs) to support this line of analysis. The results highlight the practical utility of embedding-space geometry for scalable hallucination detection and retrieval-aware evaluation. Overall, the work reframes hallucinations as a geometric phenomenon in response distributions, complementary to knowledge-centric and single-response evaluation paradigms.
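The Fisher-direction and Wasserstein analysis described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions. The embeddings are synthetic Gaussian clouds in 384 dimensions standing in for real response embeddings. The two-class Fisher direction adds a small ridge term (`reg`) so that the within-class scatter stays invertible with only 30 annotated points. The 1D Wasserstein distance is computed between projections onto that direction rather than between full response distributions. The midpoint threshold is a hypothetical decision rule. The helper names `fisher_direction`, `wasserstein_1d`, and `sample` are illustrative.

```python
import numpy as np


def fisher_direction(X_gen, X_hal, reg=1e-3):
    """Two-class Fisher direction: w proportional to S_W^{-1} (mu_gen - mu_hal)."""
    mu_g, mu_h = X_gen.mean(axis=0), X_hal.mean(axis=0)
    Cg, Ch = X_gen - mu_g, X_hal - mu_h
    # Within-class scatter; the ridge term (an assumption) keeps it invertible when n < d.
    S_w = Cg.T @ Cg + Ch.T @ Ch + reg * np.eye(X_gen.shape[1])
    w = np.linalg.solve(S_w, mu_g - mu_h)
    return w / np.linalg.norm(w)


def wasserstein_1d(a, b):
    """W1 distance between two 1D empirical samples: integral of |F_a(x) - F_b(x)| dx."""
    a, b = np.sort(a), np.sort(b)
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    # Empirical CDFs evaluated at the left end of each interval of the merged grid.
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


# Synthetic stand-in for 384-dimensional response embeddings (assumption, not real data).
rng = np.random.default_rng(0)
d = 384
c_gen = rng.normal(size=d)
c_hal = c_gen + 0.5 * rng.normal(size=d)


def sample(center, spread, n):
    """Draw n synthetic 'responses' around a center with isotropic spread."""
    return center + spread * rng.normal(size=(n, d))


# A small annotated set: 15 genuine and 15 hallucinated responses.
Xg_train, Xh_train = sample(c_gen, 0.05, 15), sample(c_hal, 0.15, 15)
w = fisher_direction(Xg_train, Xh_train)

# Every response is reduced to its scalar projection onto the single direction w.
pg_train, ph_train = Xg_train @ w, Xh_train @ w
threshold = 0.5 * (pg_train.mean() + ph_train.mean())  # hypothetical midpoint rule

# Score unseen responses and compare the two projected distributions.
Xg_test, Xh_test = sample(c_gen, 0.05, 100), sample(c_hal, 0.15, 100)
pred_gen = np.concatenate([Xg_test @ w, Xh_test @ w]) > threshold
true_gen = np.array([True] * 100 + [False] * 100)
accuracy = float(np.mean(pred_gen == true_gen))
separation = wasserstein_1d(Xg_test @ w, Xh_test @ w)
print(f"held-out accuracy={accuracy:.3f}, W1 between projections={separation:.3f}")
```

Reducing each response to one scalar is what lets a handful of labels suffice in this toy setting; with real embeddings, the achievable separation depends on the data and the embedding model.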
Abstract
Hallucinations (fluent but factually incorrect responses) pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small-sized LLMs from a geometric perspective, starting from the hypothesis that, when a model generates multiple responses to the same prompt, genuine responses cluster more tightly in the embedding space than hallucinated ones. We prove this hypothesis and, leveraging this geometric insight, show that genuine and hallucinated responses can be separated consistently. We use this result to introduce a label-efficient propagation method that classifies large collections of responses from just 30–50 annotations, achieving F1 scores above 90%. By framing hallucinations geometrically in the embedding space, our findings complement traditional knowledge-centric and single-response evaluation paradigms and pave the way for further research.
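The tighter-clustering hypothesis and the label-efficient propagation step can be sketched together as follows. This is an illustrative stand-in, not the paper's geometry-aware method. Dispersion is measured here as the mean distance of a response cloud to its centroid. Labels are spread with classic graph-based label propagation on a symmetric k-nearest-neighbour graph, which repeatedly applies F ← PF and re-clamps the annotated points. The synthetic genuine cloud is tighter than the hallucinated one by construction, so the output illustrates the mechanics rather than supporting the hypothesis. The names `dispersion`, `knn_graph`, and `propagate_labels`, and the parameters `k` and `n_iter`, are hypothetical.

```python
import numpy as np


def dispersion(X):
    """Mean Euclidean distance of a set of response embeddings to their centroid."""
    return float(np.linalg.norm(X - X.mean(axis=0), axis=1).mean())


def knn_graph(X, k=10):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)


def propagate_labels(X, y, k=10, n_iter=100):
    """Hypothetical stand-in: kNN label propagation, not the authors' method.

    y holds 1 (genuine), 0 (hallucinated) or -1 (unlabeled).
    """
    W = knn_graph(X, k)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    idx = np.flatnonzero(y >= 0)
    Y = np.zeros((len(X), 2))
    Y[idx, y[idx]] = 1.0  # one-hot rows for the annotated responses
    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F
        F[idx] = Y[idx]  # clamp the annotated responses to their labels
    return F.argmax(axis=1)


# Synthetic 384-dimensional clouds (assumption): genuine tight, hallucinated loose.
rng = np.random.default_rng(1)
d, n = 384, 100
c_gen = rng.normal(size=d)
c_hal = c_gen + 0.5 * rng.normal(size=d)
X_gen = c_gen + 0.05 * rng.normal(size=(n, d))
X_hal = c_hal + 0.15 * rng.normal(size=(n, d))
print(f"dispersion: genuine={dispersion(X_gen):.2f}, hallucinated={dispersion(X_hal):.2f}")

# Annotate 15 responses per class (30 in total) and propagate to the rest.
X = np.vstack([X_gen, X_hal])
y_true = np.array([1] * n + [0] * n)
y = np.full(2 * n, -1)
annotated = np.concatenate([rng.choice(n, 15, replace=False),
                            n + rng.choice(n, 15, replace=False)])
y[annotated] = y_true[annotated]
y_pred = propagate_labels(X, y)
accuracy = float(np.mean(y_pred == y_true))
print(f"propagated accuracy={accuracy:.3f}")
```

The k-NN graph keeps only local edges, so labels flow within each cloud; a dense Gaussian-affinity graph is a common alternative but typically needs a carefully chosen bandwidth in 384 dimensions.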
