Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

Hongbang Yuan, Pengfei Cao, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao

TL;DR

The paper proposes FAITH (False premise Attention head constraIning for miTigating Hallucinations), a method that mitigates false premise hallucinations by constraining the false premise attention heads during model inference.

Abstract

Large Language Models (LLMs) have shown impressive capabilities but still suffer from hallucinations. A significant type of this issue is the false premise hallucination, which we define as the phenomenon in which LLMs generate hallucinated text when confronted with false premise questions. In this paper, we perform a comprehensive analysis of the false premise hallucination and elucidate its internal working mechanism: a small subset of attention heads (which we designate as false premise heads) disturb the knowledge extraction process, leading to the occurrence of false premise hallucination. Based on our analysis, we propose FAITH (False premise Attention head constraIning for miTigating Hallucinations), a novel and effective method to mitigate false premise hallucinations. It constrains the false premise attention heads during the model inference process. Impressively, extensive experiments demonstrate that constraining only approximately 1% of the attention heads in the model yields a notable increase of nearly 20% in model performance.
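To make the idea of constraining attention heads at inference time concrete, here is a minimal toy sketch. It assumes that "constraining" a head means scaling its contribution to the combined attention output by a factor (0 removes it entirely); the abstract does not specify the exact operation, so this is an illustration, not the paper's implementation. The function name `combine_heads`, the `scale` parameter, and the toy vectors are all hypothetical.

```python
from typing import List, Set


def combine_heads(
    head_outputs: List[List[float]],
    constrained: Set[int],
    scale: float = 0.0,
) -> List[float]:
    """Sum per-head output vectors, down-weighting the constrained heads.

    Assumption: constraining a head is modeled as multiplying its output
    by `scale` before it is added to the combined result.
    """
    dim = len(head_outputs[0])
    combined = [0.0] * dim
    for idx, out in enumerate(head_outputs):
        weight = scale if idx in constrained else 1.0
        for j in range(dim):
            combined[j] += weight * out[j]
    return combined


# Toy example: three heads, each writing a 2-dimensional vector.
heads = [[1.0, 0.0], [0.5, 0.5], [-2.0, 3.0]]

# Head 2 stands in for a hypothetical "false premise head".
baseline = combine_heads(heads, constrained=set())
mitigated = combine_heads(heads, constrained={2})
```

In this toy setup, only the contribution of the selected head changes; the other heads are left untouched, which mirrors the abstract's point that a small fraction of heads is targeted.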

Paper Structure (30 sections, 6 equations, 5 figures, 5 tables, 1 algorithm)

This paper contains 30 sections, 6 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of the false premise hallucination. The question contains the false premise that "Albert Einstein was awarded the Nobel Prize in Physics in 1920", whereas in fact he was awarded the prize in 1921. We find that the presence of false premise attention heads contributes to the hallucinated response. Our method can effectively mitigate the false premise hallucination.
  • Figure 2: The Receiver Operating Characteristic (ROC) curve on the Movie dataset. A perfect classifier has an AUC score of 1, while random guessing gives an AUC score of 0.5.
  • Figure 3: Information flow from various parts of the question to the final logit across distinct layers on hallucinated and non-hallucinated samples.
  • Figure 4: Calculation of the influence of a single attention head.
  • Figure 5: Illustration of the false premise attention heads.
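
The two reference values in the Figure 2 caption (AUC of 1 for a perfect classifier, 0.5 for random guessing) follow from reading AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes AUC that way with a pairwise comparison. The function name `roc_auc` and the toy labels and scores are hypothetical; which classifier and scores the figure actually evaluates is not stated in this section.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])        # positives always rank higher
uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no signal
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a sort-based computation would be the usual choice for large datasets.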