HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
Bibek Paudel, Alexander Lyzhov, Preetam Joshi, Puneet Anand
TL;DR
The paper tackles hallucination detection for LLM outputs in enterprise settings. It introduces a four-category taxonomy of response content (context-based, common knowledge, enterprise-specific, innocuous) and HDM-2, a multi-task system that verifies context grounding and common-knowledge consistency separately. HDM-2 outputs a context-based score $h_s(c,r)$ and token-level scores $\mathbf{h}_w(c,r)$ for context $c$ and response $r$; an aggregation function $f$ combines the token-level scores to identify problematic spans. A new dataset, HDMBench (~50,000 contextual documents), evaluates both context-based and common-knowledge hallucinations. In the reported experiments, HDM-2 achieves state-of-the-art results on RagTruth, TruthfulQA, and HDMBench with a compact 3B-parameter backbone. The system targets low-latency, explainable detection suitable for enterprise deployment, and the code, weights, and dataset are publicly available.
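To make the span-level idea concrete, the following is a minimal sketch of one possible aggregation over token-level scores. It is not the paper's function $f$, whose exact form is not given here. The name `flag_spans`, the 0.5 threshold, and the choice of the maximum token score as the span score are all illustrative assumptions.

```python
from typing import List, Tuple

# A flagged span: (start index, end index exclusive, span text, span score).
Span = Tuple[int, int, str, float]


def flag_spans(
    tokens: List[str],
    token_scores: List[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Span]:
    """Group consecutive tokens whose score meets the threshold into spans.

    Hypothetical stand-in for an aggregation function over token-level
    hallucination scores. Each span is scored by its maximum token score.
    """
    if len(tokens) != len(token_scores):
        raise ValueError("tokens and token_scores must have the same length")

    spans: List[Span] = []
    start = None  # index where the currently open span began, if any

    for i, score in enumerate(token_scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new span
        elif start is not None:
            # Close the open span at the first token below the threshold.
            spans.append(
                (start, i, " ".join(tokens[start:i]), max(token_scores[start:i]))
            )
            start = None

    if start is not None:
        # Close a span that runs to the end of the response.
        end = len(tokens)
        spans.append(
            (start, end, " ".join(tokens[start:end]), max(token_scores[start:end]))
        )

    return spans


# Made-up scores for illustration only.
example_tokens = ["Paris", "is", "the", "capital", "of", "Germany", "."]
example_scores = [0.10, 0.05, 0.05, 0.20, 0.10, 0.90, 0.30]
example_spans = flag_spans(example_tokens, example_scores)
```

With these made-up scores, only the token "Germany" crosses the threshold, so a single one-token span is flagged. A mean or a length-weighted score would be an equally plausible span score; the maximum is used here only because it is the most conservative choice.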
Abstract
This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to hallucination in enterprise applications, categorizing them into context-based, common knowledge, enterprise-specific, and innocuous statements. Our hallucination detection model, HDM-2, validates LLM responses with respect to both context and generally known facts (common knowledge). It provides both hallucination scores and word-level annotations, enabling precise identification of problematic content. To evaluate it on context-based and common-knowledge hallucinations, we introduce a new dataset, HDMBench. Experimental results demonstrate that HDM-2 outperforms existing approaches on the RagTruth, TruthfulQA, and HDMBench datasets. This work addresses the specific challenges of enterprise deployment, including computational efficiency, domain specialization, and fine-grained error identification. Our evaluation dataset, model weights, and inference code are publicly available.
