HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
Ashok Urlana, Gopichand Kanumolu, Charaka Vinayak Kumar, Bala Mallikarjunarao Garlapati, Rahul Mishra
TL;DR
HalluCounter addresses reference-free hallucination detection (RFHD) in LLMs by combining query-response and response-response consistency to train a robust detector that outputs a hallucination label, a confidence score, and an optimal response. It also introduces HalluCounterEval, a large multi-domain benchmark with synthetic and human-annotated data drawn from Jeopardy and Kaggle, which enables thorough evaluation across domains. The methodology uses a DeBERTa-v3-large cross-encoder for NLI-based feature extraction, paired with both statistical and BERT-based classifiers. It reports strong improvements over state-of-the-art RFHD methods and includes detailed ablations, human evaluation, and error analyses. The work supports safer LLM use in real-world settings and offers a scalable dataset for further RFHD research, while acknowledging limitations and ethical considerations in dataset construction and evaluation.
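The NLI-based feature extraction described above can be pictured with the small sketch below. It is not the authors' implementation. The NLI scorer is a pluggable callable (a real system might wrap an NLI cross-encoder such as DeBERTa-v3-large), and the label order (contradiction, entailment, neutral), the mean/std aggregation, and the names `consistency_features` and `toy_nli` are all illustrative assumptions. The sketch scores each query-response pair for alignment and every ordered response-response pair for consistency, then summarizes both sets into a fixed-length feature vector that a statistical classifier could consume.

```python
from itertools import permutations
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed NLI interface: (premise, hypothesis) -> probabilities ordered as
# (contradiction, entailment, neutral). Hypothetical; plug in any NLI model.
NLIFn = Callable[[str, str], Tuple[float, float, float]]
LABELS = ("contradiction", "entailment", "neutral")


def _summarize(prefix: str, rows: List[Tuple[float, float, float]]) -> Dict[str, float]:
    """Mean and population std of each NLI label over a set of pairs."""
    feats: Dict[str, float] = {}
    for i, label in enumerate(LABELS):
        column = [row[i] for row in rows]
        feats[f"{prefix}_{label}_mean"] = mean(column)
        feats[f"{prefix}_{label}_std"] = pstdev(column)
    return feats


def consistency_features(query: str, responses: Sequence[str], nli: NLIFn) -> Dict[str, float]:
    """Hypothetical feature vector from query-response and response-response NLI."""
    if len(responses) < 2:
        raise ValueError("need at least two sampled responses")
    # Query-response alignment: how each response relates to the query.
    qr_rows = [nli(query, r) for r in responses]
    # Response-response consistency: every ordered pair, since NLI is asymmetric.
    rr_rows = [nli(a, b) for a, b in permutations(responses, 2)]
    feats = _summarize("qr", qr_rows)
    feats.update(_summarize("rr", rr_rows))
    return feats


def toy_nli(premise: str, hypothesis: str) -> Tuple[float, float, float]:
    """Toy stand-in for a real NLI model, for demonstration only."""
    if premise.strip().lower() == hypothesis.strip().lower():
        return (0.0, 1.0, 0.0)  # identical answers entail each other
    if premise.rstrip().endswith("?"):
        return (0.1, 0.3, 0.6)  # a question is treated as mostly neutral
    return (0.8, 0.1, 0.1)      # differing answers mostly contradict


# Usage: three sampled answers, one of which disagrees with the others.
feats = consistency_features(
    "Who wrote Hamlet?", ["Shakespeare", "Shakespeare", "Marlowe"], toy_nli
)
```

Keeping ordered pairs rather than unordered ones reflects that entailment is directional; a disagreeing sample raises the response-response contradiction mean, which is the kind of signal a downstream classifier could learn from.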
Abstract
Response consistency-based, reference-free hallucination detection (RFHD) methods do not depend on internal model states such as generation probabilities or gradients, which grey-box methods rely on but which are inaccessible in closed-source LLMs. However, because they fail to capture query-response alignment patterns, these methods often achieve lower detection accuracy. Additionally, the lack of large-scale benchmark datasets spanning diverse domains remains a challenge, as most existing datasets are limited in size and scope. To this end, we propose HalluCounter, a novel reference-free hallucination detection method that uses both response-response and query-response consistency and alignment patterns. These patterns enable training a classifier that detects hallucinations and provides a confidence score and an optimal response for user queries. Furthermore, we introduce HalluCounterEval, a benchmark dataset comprising both synthetically generated and human-curated samples across multiple domains. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90% average confidence in hallucination detection across datasets.
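How a classifier's output might be turned into the three deliverables named in the abstract (a hallucination label, a confidence score, and an optimal response) is sketched below under explicit assumptions. The abstract does not state the exact rules, so the choices here are hypothetical: confidence is taken as the probability the classifier assigns to its predicted label, and the "optimal" response is the sample most entailed by the other samples. `p_halluc`, `entail`, `Verdict`, `select_optimal`, and `decide` are illustrative names, not part of HalluCounter.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class Verdict:
    hallucinated: bool
    confidence: float
    optimal_response: str


def select_optimal(responses: Sequence[str], entail: List[List[float]]) -> str:
    """Pick the response most entailed by the other sampled responses.

    entail[i][j] is an assumed entailment probability with response i as
    premise and response j as hypothesis (hypothetical selection rule).
    """
    n = len(responses)
    if n < 2 or len(entail) != n or any(len(row) != n for row in entail):
        raise ValueError("need >= 2 responses and an n x n entailment matrix")

    def support(j: int) -> float:
        # Average entailment of response j by every other response.
        return mean([entail[i][j] for i in range(n) if i != j])

    # max() returns the first index on ties, so earlier samples win ties.
    return responses[max(range(n), key=support)]


def decide(p_halluc: float, responses: Sequence[str], entail: List[List[float]],
           threshold: float = 0.5) -> Verdict:
    """Turn a classifier probability into a label, confidence, and answer."""
    if not 0.0 <= p_halluc <= 1.0:
        raise ValueError("p_halluc must be a probability")
    hallucinated = p_halluc >= threshold
    # Confidence here is the probability assigned to the predicted label.
    confidence = p_halluc if hallucinated else 1.0 - p_halluc
    return Verdict(hallucinated, confidence, select_optimal(responses, entail))


# Usage: a classifier (not shown) outputs p_halluc = 0.2 for this query.
responses = ["Shakespeare", "Shakespeare", "Marlowe"]
entail = [[1.0, 0.9, 0.05], [0.9, 1.0, 0.05], [0.1, 0.1, 1.0]]
verdict = decide(0.2, responses, entail)
```

Here the detector would report "not hallucinated" with confidence of about 0.8 and return "Shakespeare", since the two agreeing samples support each other while the outlier receives little entailment.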
