InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers

Yakir Yehuda; Itzik Malkiel; Oren Barkan; Jonathan Weill; Royi Ronen; Noam Koenigstein


TL;DR

The paper tackles hallucinations in LLM outputs by introducing InterrogateLLM, a backward-interrogation approach that detects inconsistencies between the original query and queries reconstructed from the model's answer. A forward prompt generates an answer, a backward process reconstructs a set of $K$ queries from that answer, and the embedding-based similarity between the reconstructed queries and the original query is compared against a threshold. The method requires no external knowledge and works in few-shot settings. Across the Movies, Books, and GCI tasks, with GPT-3 and Llama-2 variants, InterrogateLLM (especially with ensembles and higher $K$) outperforms baselines such as SelfCheckGPT and SBERT/ADA cosine-similarity detectors, and ablations show the value of multiple backward passes and temperature variation. The findings suggest a scalable, prompt-based mechanism for improving LLM reliability in real-world applications. Future work includes extending the method to retrieval-augmented generation and addressing limitations such as many-to-one mappings and semi-truth detection.

Abstract

Despite the many advances of Large Language Models (LLMs) and their unprecedented rapid evolution, their impact on and integration into every facet of our daily lives remain limited for various reasons. One critical factor hindering their widespread adoption is the occurrence of hallucinations, where LLMs invent answers that sound realistic yet drift away from factual truth. In this paper, we present a novel method for detecting hallucinations in large language models, which tackles a critical issue in the adoption of these models in various real-world scenarios. Through extensive evaluations across multiple datasets and LLMs, including Llama-2, we study the hallucination levels of various recent LLMs and demonstrate the effectiveness of our method in detecting them automatically. Notably, we observe up to 87% hallucinations for Llama-2 in a specific experiment, where our method achieves a Balanced Accuracy of 81%, all without relying on external knowledge.


Paper Structure (27 sections, 10 equations, 2 figures, 10 tables, 1 algorithm)


Introduction
Related Work
Problem setup
The InterrogateLLM method
Variable temperatures
Experiments
Datasets and Tasks
The Movies Dataset
Books Dataset
Global Country Information (GCI)
Implementation details
Baselines
The hallucination rates
Hallucination detection results
Ablation and hyper-parameter analysis
...and 12 more sections

Figures (2)

Figure 1: An illustration of the InterrogateLLM method. (1) A few-shot prompt and a query are fed into $F_{LLM}$, which generates an answer. (2) The shots in the prompt are then reversed, forming a sequence of answer-question pairs, with the generated answer constructed on top. The $B_{LLM}$ is then used to generate $K$ queries that correspond to the generated answer. Ideally, the generated queries should recover the original query from the forward phase. (3) The set of recovered questions is then embedded by a language model and compared with the original question, producing a final score that determines whether the generated answer suffers from hallucination.
Figure 2: The average AUC and B-Acc scores across Movies, Books, and GCI datasets, per different K values (1-5).
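
The three steps in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt templates, the function names (`build_forward_prompt`, `build_backward_prompt`, `interrogate`, `toy_embed`), the default threshold, and the choice to average the $K$ cosine similarities into one score are all hypothetical. The forward model, backward model, and embedder are passed in as plain callables, so no specific LLM API is implied.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Shot = Tuple[str, str]  # (question, answer) pair used as a few-shot example


def build_forward_prompt(shots: Sequence[Shot], query: str) -> str:
    # Step 1: few-shot question->answer pairs followed by the new query.
    lines = [f"Q: {q}\nA: {a}" for q, a in shots]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)


def build_backward_prompt(shots: Sequence[Shot], answer: str) -> str:
    # Step 2: the same shots reversed into answer->question pairs.
    # Assumption: the generated answer is placed after the reversed shots.
    lines = [f"A: {a}\nQ: {q}" for q, a in shots]
    lines.append(f"A: {answer}\nQ:")
    return "\n\n".join(lines)


def toy_embed(text: str) -> Counter:
    # Stand-in embedder (bag of lowercase tokens); a real setup would use
    # a sentence-embedding model instead.
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def interrogate(
    query: str,
    shots: Sequence[Shot],
    forward_llm: Callable[[str], str],
    backward_llm: Callable[[str], str],
    embed: Callable[[str], Counter],
    k: int = 3,
    threshold: float = 0.8,  # hypothetical value; would be tuned per task
) -> Tuple[str, float, bool]:
    answer = forward_llm(build_forward_prompt(shots, query))
    backward_prompt = build_backward_prompt(shots, answer)
    # Generate K reconstructed queries from the answer.
    recovered: List[str] = [backward_llm(backward_prompt) for _ in range(k)]
    # Step 3: compare each reconstructed query with the original query.
    query_vec = embed(query)
    sims = [cosine(query_vec, embed(r)) for r in recovered]
    score = sum(sims) / len(sims)  # assumption: mean similarity as the score
    return answer, score, score < threshold  # True means "flag as hallucination"
```

With a sampling-based `backward_llm`, each of the `k` calls can return a different reconstruction, which is what makes larger $K$ informative; a deterministic backward model would yield `k` identical queries.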
