InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers

Yakir Yehuda, Itzik Malkiel, Oren Barkan, Jonathan Weill, Royi Ronen, Noam Koenigstein

TL;DR

The paper tackles hallucinations in LLM outputs by introducing InterrogateLLM, a backward-interrogation approach that detects inconsistencies between the original query and queries reconstructed from the model's answer. A forward pass generates an answer from a few-shot prompt; a backward pass then reconstructs a set of queries from that answer, and the embedding-based similarity between the reconstructed queries and the original query is compared against a threshold. The method requires no external knowledge and adapts to few-shot settings. Across the Movies, Books, and GCI tasks, with GPT-3 and Llama-2 variants, InterrogateLLM (especially with ensembles and higher $K$, the number of reconstructed queries) outperforms baselines such as SelfCheckGPT and SBERT/ADA cosine-similarity detectors. Ablations show the value of multiple backward passes and temperature variation. The findings suggest a scalable, prompt-based mechanism to improve LLM reliability in real-world applications, with future work extending to retrieval-augmented generation and addressing limitations such as many-to-one mappings and semi-truth detection.

Abstract

Despite the many advances of Large Language Models (LLMs) and their unprecedented rapid evolution, their impact and integration into every facet of our daily lives are limited for various reasons. One critical factor hindering their widespread adoption is the occurrence of hallucinations, where LLMs invent answers that sound realistic, yet drift away from factual truth. In this paper, we present a novel method for detecting hallucinations in large language models, which tackles a critical issue in the adoption of these models in various real-world scenarios. Through extensive evaluations across multiple datasets and LLMs, including Llama-2, we study the hallucination levels of various recent LLMs and demonstrate the effectiveness of our method to automatically detect them. Notably, we observe up to 87% hallucinations for Llama-2 in a specific experiment, where our method achieves a Balanced Accuracy of 81%, all without relying on external knowledge.
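
The abstract reports Balanced Accuracy rather than plain accuracy. A minimal sketch of why that matters when most answers are hallucinated: balanced accuracy is the mean of the recall on each class, so a detector that flags everything gains nothing from class imbalance. The function name `balanced_accuracy` and the labels below are hypothetical illustrations, not the paper's data or code.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Mean of recall on the positive class and recall on the negative class.

    True means "hallucinated"; False means "not hallucinated".
    """
    pairs = list(zip(y_true, y_pred))
    pos = sum(1 for t, _ in pairs if t)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    true_pos = sum(1 for t, p in pairs if t and p)
    true_neg = sum(1 for t, p in pairs if not t and not p)
    return 0.5 * (true_pos / pos + true_neg / neg)


# Hypothetical labels with an 87% hallucination rate (illustration only).
labels = [True] * 87 + [False] * 13
always_flag = [True] * 100  # a trivial detector that flags every answer

plain_accuracy = sum(t == p for t, p in zip(labels, always_flag)) / len(labels)
print(plain_accuracy)                          # plain accuracy rewards the trivial detector
print(balanced_accuracy(labels, always_flag))  # balanced accuracy does not
```

On these made-up labels the trivial detector scores 0.87 in plain accuracy but only 0.5 in balanced accuracy, which is the chance level for a binary detector.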

Paper Structure (27 sections, 10 equations, 2 figures, 10 tables, 1 algorithm)

This paper contains 27 sections, 10 equations, 2 figures, 10 tables, 1 algorithm.

Figures (2)

  • Figure 1: An illustration of the InterrogateLLM method. (1) A few-shot prompt and a query are fed into $F_{LLM}$, which generates an answer. (2) The shots in the prompt are then reversed, forming a sequence of answer-question pairs, with the generated answer constructed on top. The $B_{LLM}$ is then used to generate $K$ queries that correspond to the generated answer. Ideally, the generated queries should recover the original query from the forward phase. (3) The set of recovered questions is then embedded by a language model and compared with the original question, producing a final score that determines whether the generated answer suffers from hallucination.
  • Figure 2: The average AUC and B-Acc scores across Movies, Books, and GCI datasets, per different K values (1-5).
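
The three steps in the Figure 1 caption can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation: `f_llm` and `b_llm` are hypothetical callables standing in for $F_{LLM}$ and $B_{LLM}$, the `Q:`/`A:` prompt layout is invented for the example, `embed` is a toy bag-of-words stand-in for a real sentence encoder, and averaging the $K$ cosine similarities before thresholding is one plausible way to form the final score.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Shot = Tuple[str, str]  # (question, answer) few-shot example


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption; a real system would use a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[tok] * b[tok] for tok in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def forward_prompt(shots: List[Shot], query: str) -> str:
    """Step 1: few-shot question-answer pairs followed by the new query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in shots]
    return "\n".join(lines + [f"Q: {query}\nA:"])


def backward_prompt(shots: List[Shot], answer: str) -> str:
    """Step 2: the same shots reversed into answer-question pairs, then the generated answer."""
    lines = [f"A: {a}\nQ: {q}" for q, a in shots]
    return "\n".join(lines + [f"A: {answer}\nQ:"])


def interrogate(
    query: str,
    shots: List[Shot],
    f_llm: Callable[[str], str],
    b_llm: Callable[[str], str],
    k: int = 3,
    threshold: float = 0.5,
) -> Tuple[str, float, bool]:
    """Return (answer, similarity score, flagged_as_hallucination)."""
    answer = f_llm(forward_prompt(shots, query))
    prompt = backward_prompt(shots, answer)
    recovered = [b_llm(prompt) for _ in range(k)]  # K reconstructed queries
    q_vec = embed(query)
    # Step 3: compare each recovered query with the original one.
    score = sum(cosine(q_vec, embed(r)) for r in recovered) / k
    return answer, score, score < threshold


# Stub models for demonstration only; real use would call an actual LLM.
shots = [("Who directed the movie Titanic?", "James Cameron")]
query = "Who directed the movie Inception?"
print(interrogate(query, shots, lambda p: "Christopher Nolan", lambda p: query, threshold=0.9))
print(interrogate(query, shots, lambda p: "Steven Spielberg",
                  lambda p: "Who directed the movie Jaws?", threshold=0.9))
```

With the stubs above, the first call recovers the original query and is not flagged, while the second recovers a different query and falls below the 0.9 threshold. The threshold value and the stub outputs are arbitrary choices for the toy embedding; with a real encoder the threshold would need to be tuned.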