HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification

Bibek Paudel, Alexander Lyzhov, Preetam Joshi, Puneet Anand

TL;DR

The paper addresses hallucinations in LLM outputs in enterprise settings. It introduces a four-category taxonomy of responses (context-based, common knowledge, enterprise-specific, innocuous) and a multi-task system, HDM-2, that separately verifies context grounding and common-knowledge consistency. Given a context $c$ and a response $r$, HDM-2 outputs a context-based score $h_s(c,r)$ and token-level scores $\mathbf{h}_w(c,r)$, enabling fine-grained, span-level detection; an aggregation function $f$ is used to identify problematic segments. A new dataset, HDMBench (~50,000 contextual documents), evaluates both context-based and common-knowledge hallucinations, and experiments show HDM-2 achieving state-of-the-art results on RagTruth, TruthfulQA, and HDMBench with a compact 3B-parameter backbone. The result is low-latency, explainable detection suited to enterprise deployment; the code, weights, and dataset are publicly available.
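To make the span-level idea concrete, the sketch below shows one way token-level scores such as $\mathbf{h}_w(c,r)$ could be turned into flagged segments. It is an illustration only: the summary does not specify the aggregation function $f$, so the thresholding rule, the use of the maximum score per span, and the names `flag_spans` and `threshold` are all assumptions, and the scores in the usage note are made up.

```python
from typing import List, Tuple


def flag_spans(
    tokens: List[str],
    scores: List[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Group consecutive tokens whose score is >= threshold into spans.

    Returns a list of (start, end_exclusive, max_score) tuples.
    This is an assumed stand-in for the paper's aggregation function f.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    spans: List[Tuple[int, int, float]] = []
    start = None  # index where the currently open span began, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new span
        elif start is not None:
            # Close the span that ended just before token i.
            spans.append((start, i, max(scores[start:i])))
            start = None
    if start is not None:
        # The response ended while a span was still open.
        spans.append((start, len(scores), max(scores[start:])))
    return spans
```

For example, with tokens `["Paris", "is", "in", "Germany", "."]` and illustrative scores `[0.1, 0.05, 0.1, 0.9, 0.2]`, the function returns a single span covering only `"Germany"`. Taking the maximum within a span is one simple choice; a mean or a learned aggregator would be equally plausible.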

Abstract

This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to hallucination in enterprise applications, categorizing them into context-based, common knowledge, enterprise-specific, and innocuous statements. Our hallucination detection model HDM-2 validates LLM responses with respect to both context and generally known facts (common knowledge). It provides both hallucination scores and word-level annotations, enabling precise identification of problematic content. To evaluate it on context-based and common-knowledge hallucinations, we introduce a new dataset, HDMBench. Experimental results demonstrate that HDM-2 outperforms existing approaches on the RagTruth, TruthfulQA, and HDMBench datasets. This work addresses the specific challenges of enterprise deployment, including computational efficiency, domain specialization, and fine-grained error identification. Our evaluation dataset, model weights, and inference code are publicly available.

Paper Structure

This paper contains 17 sections, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: An example interaction with an LLM, with our taxonomy categorizations. Best viewed in color.
  • Figure 2: Our proposed taxonomy of LLM responses for enterprise settings, showing the four distinct categories and their detection approaches. This paper focuses on context-based and common knowledge hallucinations (shown in green).
  • Figure 3: Our system takes the query, optional context documents, and LLM response as its input and produces a judgement of the LLM response as Hallucination or Not a Hallucination. Shaded areas denote our models (round edges) and their outputs (sharp edges); dashed lines denote optional components. Context documents could be obtained from an enterprise's existing RAG system. Our model can be extended with private enterprise-specific knowledge for the Common-Knowledge Check.
  • Figure 4: Common Knowledge hallucination detection (CK) performance (balanced accuracy on the validation set) for different intermediate layers of the backbone LLM (Qwen-2.5-3B-Instruct). The performance improves as we go from lower to higher layers, peaks at layer 25, and decreases as we go to the final layers.
  • Figure 5: Real-world examples of LLM response judgements by HDM-2 showing different types of knowledge and hallucinations. For simplicity, we do not show the word-level scores and apply a thresholded label for the entire sentence.
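Figure 4 reports balanced accuracy, which for a binary task is the mean of the recall on each class. The sketch below computes it from 0/1 labels; it is a generic illustration of the metric, not the authors' evaluation code, and the function name and label encoding (1 for hallucination, 0 otherwise) are assumptions.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of positive-class recall and negative-class recall for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")

    # Correct predictions counted separately for each class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    return 0.5 * (tp / pos + tn / neg)
```

Unlike plain accuracy, this metric does not reward a detector for always predicting the majority class: with labels `[1, 1, 1, 1, 0]` and predictions that are all `1`, plain accuracy is 0.8 while balanced accuracy is 0.5.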