Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination
Jongyoon Song, Sangwon Yu, Sungroh Yoon
TL;DR
The paper investigates a bias in large language models (LLMs): when judging factuality against a given context, they disproportionately return false negatives, producing input-conflicting hallucinations. It introduces All-True and All-False prompts to isolate this bias and evaluates Mistral, ChatGPT, and GPT-4 on StrategyQA and BoolQ. The results reveal a robust tendency to deny true statements and to be overconfident in these incorrect responses. The study finds that the false negative problem is largely independent of model size and is influenced by whether the target answer is True: in All-True prompts, models assign higher confidence to their incorrect False answers. Context and query rewriting partially mitigate the problem, offering a practical direction for improving reliability in context-grounded reasoning, though model-specific reactions (notably GPT-4's null responses in some rewriting scenarios) warrant further investigation.
Abstract
In this paper, we identify a new category of bias that induces input-conflicting hallucinations, in which large language models (LLMs) generate responses inconsistent with the content of the input context. This issue, which we term the false negative problem, refers to the phenomenon where LLMs are predisposed to return negative judgments when assessing the correctness of a statement given the context. In experiments involving pairs of statements that contain the same information but have contradictory factual directions, we observe that LLMs exhibit a bias toward false negatives. Specifically, the models show greater overconfidence when responding with False. Furthermore, we analyze the relationship between the false negative problem and context and query rewriting, and observe that both effectively tackle false negatives in LLMs.
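To make the two symptoms concrete (more false negatives than false positives, and higher confidence on wrong False answers), the following minimal sketch tallies them from a list of model judgments. It assumes each statement is stored with its gold label, the model's True/False answer, and a confidence score in [0, 1]; the `Judgment` class, the `error_profile` function, and the toy data are hypothetical illustrations, not the authors' evaluation code or results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgment:
    # Hypothetical record: not the paper's data format.
    gold: bool         # ground-truth label of the statement given the context
    predicted: bool    # the model's True/False judgment
    confidence: float  # model-reported confidence in its answer, in [0, 1]


def error_profile(judgments):
    """Compare false negatives with false positives and their mean confidence."""
    positives = [j for j in judgments if j.gold]
    negatives = [j for j in judgments if not j.gold]
    false_neg = [j for j in positives if not j.predicted]  # true statement judged False
    false_pos = [j for j in negatives if j.predicted]      # false statement judged True
    return {
        "fn_rate": len(false_neg) / len(positives) if positives else 0.0,
        "fp_rate": len(false_pos) / len(negatives) if negatives else 0.0,
        # Mean confidence on incorrect False answers vs. incorrect True answers.
        "fn_mean_conf": mean(j.confidence for j in false_neg) if false_neg else None,
        "fp_mean_conf": mean(j.confidence for j in false_pos) if false_pos else None,
    }


# Toy judgments for three statement pairs with contradictory factual
# directions (each true statement followed by its false counterpart).
judgments = [
    Judgment(gold=True, predicted=False, confidence=0.90),
    Judgment(gold=False, predicted=False, confidence=0.80),
    Judgment(gold=True, predicted=True, confidence=0.70),
    Judgment(gold=False, predicted=True, confidence=0.60),
    Judgment(gold=True, predicted=False, confidence=0.95),
    Judgment(gold=False, predicted=False, confidence=0.85),
]
print(error_profile(judgments))
```

Under this framing, a false negative bias appears as `fn_rate` exceeding `fp_rate`, and the overconfidence pattern appears as `fn_mean_conf` exceeding `fp_mean_conf`.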
