HalluCounter: Reference-free LLM Hallucination Detection in the Wild!

Ashok Urlana, Gopichand Kanumolu, Charaka Vinayak Kumar, Bala Mallikarjunarao Garlapati, Rahul Mishra

TL;DR

HalluCounter addresses the challenge of reference-free hallucination detection in LLMs by incorporating both query-response and response-response consistency to train a robust detector that outputs a hallucination label, a confidence score, and an optimal response. It introduces HalluCounterEval, a large, multi-domain benchmark with synthetic and human-annotated data from Jeopardy and Kaggle, enabling thorough evaluation across domains. The methodology combines DeBERTa-v3-large cross-encoder-based NLI feature extraction with both statistical and BERT classifiers, demonstrating strong improvements over state-of-the-art RFHD methods and providing detailed ablations, human evaluation, and error analyses. The work has practical impact by enabling safer LLM usage in real-world settings and offering a scalable dataset to spur further RFHD research, while acknowledging limitations and ethical considerations in dataset construction and evaluation.

Abstract

Response consistency-based, reference-free hallucination detection (RFHD) methods do not depend on internal model states, such as generation probabilities or gradients, which grey-box models typically rely on but which are inaccessible in closed-source LLMs. However, their inability to capture query-response alignment patterns often results in lower detection accuracy. Additionally, the lack of large-scale benchmark datasets spanning diverse domains remains a challenge, as most existing datasets are limited in size and scope. To this end, we propose HalluCounter, a novel reference-free hallucination detection method that utilizes both response-response and query-response consistency and alignment patterns. This enables the training of a classifier that detects hallucinations and provides a confidence score and an optimal response for user queries. Furthermore, we introduce HalluCounterEval, a benchmark dataset comprising both synthetically generated and human-curated samples across multiple domains. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90% average confidence in hallucination detection across datasets.
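
The abstract says the detector returns a hallucination label, a confidence score, and an optimal response, but does not spell out the aggregation rule. The following is a minimal sketch of one plausible rule, not the authors' implementation. It assumes that a classifier has already assigned each sampled response a probability of being hallucinated, that the final label is a majority vote over thresholded predictions, that confidence is the share of responses agreeing with that vote, and that the optimal response is the one with the lowest probability. The names `Scored`, `p_hallucinated`, `aggregate`, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Scored:
    response: str
    p_hallucinated: float  # hypothetical per-response classifier probability


def aggregate(scored, threshold=0.5):
    """Combine per-response predictions into (label, confidence, best response).

    Assumed rule (not taken from the paper): majority vote for the label,
    vote share for the confidence, lowest probability for the best response.
    """
    if not scored:
        raise ValueError("need at least one scored response")
    # Threshold each probability into a binary hallucination prediction.
    labels = [s.p_hallucinated >= threshold for s in scored]
    votes = Counter(labels)
    # Ties resolve to "not hallucinated" in this sketch.
    is_hallucinated = votes[True] > votes[False]
    confidence = votes[is_hallucinated] / len(scored)
    best = min(scored, key=lambda s: s.p_hallucinated)
    return is_hallucinated, confidence, best.response


samples = [Scored("Paris", 0.1), Scored("Paris, France", 0.2), Scored("Lyon", 0.9)]
label, confidence, best = aggregate(samples)
print(label, round(confidence, 2), best)
```

Only the aggregation step is shown here; extracting NLI features and training the classifier are outside the scope of this sketch.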

Paper Structure

This paper contains 49 sections, 4 equations, 3 figures, 31 tables.

Figures (3)

  • Figure 1: HalluCounter: A reference-free Hallucination Detection Pipeline for LLMs with three key components, 1) Extracting NLI features for query-response and response-response pairs, 2) A hallucination classifier that predicts hallucinations, and 3) Aggregating the final prediction, confidence score, and optimal response.
  • Figure 2: Hallucination rates across different sub-domains in various test sets of the Jeopardy and Kaggle datasets.
  • Figure 3: Number of unique responses generated by each LLM out of 10 responses for the Jeopardy and Kaggle datasets. A lower number indicates higher consistency.
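
The statistic in Figure 3, the number of unique responses among the 10 sampled per query, can be sketched as below. How the paper decides that two responses are "the same" is not stated here, so this sketch assumes exact matching after lowercasing and collapsing whitespace. The helpers `normalize` and `count_unique` are hypothetical names.

```python
def normalize(text):
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.lower().split())


def count_unique(responses):
    """Number of distinct responses after normalization; lower means more consistent."""
    return len({normalize(r) for r in responses})


sampled = ["Paris", "paris ", "Lyon", "Paris", "Marseille"]
print(count_unique(sampled))
```

A count of 1 means the model gave the same answer every time, while a count equal to the number of samples means every answer differed.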