HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models

Xinyan Jiang, Hang Ye, Yongxin Zhu, Xiaoying Zheng, Zikang Chen, Jun Gong

TL;DR

This work tackles hallucinations in large language models by introducing HICD, a decoding framework that induces controlled hallucinations through attention dispersion on carefully selected inducing heads. The method constructs adversarial context–answer pairs to identify heads that drive correct versus incorrect outputs, then disperses attention across these inducing heads to create contrastive hallucinations. By contrastively decoding between the original and induced outputs, HICD achieves improved contextual faithfulness across tasks such as context completion, reading comprehension, and question answering, while also maintaining or improving factual recall metrics. The approach demonstrates strong performance across multiple model families (e.g., LLaMA, Qwen, Mistral) and datasets, offering a practical strategy to mitigate hallucinations with a balance between accuracy and efficiency.

Abstract

Large Language Models (LLMs) often generate hallucinations, producing outputs that are contextually inaccurate or factually incorrect. We introduce HICD, a novel method designed to induce hallucinations for contrastive decoding to mitigate hallucinations. Unlike existing contrastive decoding methods, HICD selects attention heads crucial to the model's prediction as inducing heads, then induces hallucinations by dispersing attention of these inducing heads and compares the hallucinated outputs with the original outputs to obtain the final result. Our approach significantly improves performance on tasks requiring contextual faithfulness, such as context completion, reading comprehension, and question answering. It also improves factuality in tasks requiring accurate knowledge recall. We demonstrate that our inducing heads selection and attention dispersion method leads to more "contrast-effective" hallucinations for contrastive decoding, outperforming other hallucination-inducing methods. Our findings provide a promising strategy for reducing hallucinations by inducing hallucinations in a controlled manner, enhancing the performance of LLMs in a wide range of tasks.
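The two steps named in the abstract, dispersing the attention of inducing heads and contrasting the original output with the hallucinated one, can be illustrated with a small sketch. It is not the authors' implementation. It assumes that "dispersing" means replacing an inducing head's attention row with a uniform distribution, and that the contrast takes a commonly used log-probability form, `(1 + w) * log p_orig - w * log p_induced`. The function names and the `contrast_strength` parameter are hypothetical, and the logits are made-up toy values.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def disperse_attention(attn_rows, inducing_heads):
    """Return a copy of per-head attention rows in which every inducing head
    attends uniformly to all key positions (assumed form of "dispersion").

    attn_rows: one attention distribution (list of floats) per head.
    inducing_heads: set of head indices to disperse.
    """
    dispersed = []
    for head, row in enumerate(attn_rows):
        if head in inducing_heads:
            dispersed.append([1.0 / len(row)] * len(row))
        else:
            dispersed.append(list(row))
    return dispersed


def contrastive_scores(orig_logits, induced_logits, contrast_strength=1.0):
    """Score tokens by rewarding the original distribution and penalizing
    the hallucination-induced one (assumed contrast formula)."""
    lp_orig = log_softmax(orig_logits)
    lp_ind = log_softmax(induced_logits)
    w = contrast_strength
    return [(1 + w) * o - w * i for o, i in zip(lp_orig, lp_ind)]


def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


# Toy example: head 1 is treated as an inducing head.
attn = [[0.7, 0.2, 0.1], [0.05, 0.9, 0.05]]
dispersed = disperse_attention(attn, inducing_heads={1})

# Toy next-token logits from the original and the dispersed forward pass.
orig_logits = [1.8, 2.0, 0.1]
induced_logits = [1.0, 2.5, 0.1]
scores = contrastive_scores(orig_logits, induced_logits)
print("greedy:", argmax(orig_logits), "contrastive:", argmax(scores))
```

In this toy case the induced pass amplifies the token that the original pass only slightly prefers, so the contrast shifts the choice to the other candidate. In a real model the induced logits would come from a second forward pass with the modified attention, not from hand-picked numbers.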

Paper Structure

This paper contains 39 sections, 15 equations, 9 figures, 17 tables.

Figures (9)

  • Figure 1: Illustration of the Hallucination-Inducing Contrastive Decoding (HICD) method. The method includes calculating the importance scores and identifying the inducing heads (yellow), dispersing the attention of the inducing heads to induce hallucinations (pink), and applying contrastive decoding for hallucination mitigation (blue).
  • Figure 2: Effect of the number of inducing heads on task performance. The red lines represent our HICD method, which uses average attention over the inducing heads to induce hallucinations. The blue lines show the head-pruning method from prior research, in which the inducing heads are pruned (implementation details are given in the appendix on head pruning). The green dashed line represents the baseline model without hallucination induction. The Spearman correlation coefficient $r$ measures the correlation between the number of inducing heads and task performance. Tuning of the parameters $\alpha$ and $s$ is shown in the appendix.
  • Figure 3: Spearman correlation coefficients for the inducing-head score rankings across different tasks. A higher coefficient indicates that the two tasks select more similar inducing heads.
  • Figure 4: Visualization of the relationship between token confidence and the norm $f(x)$, where a subset of high-confidence tokens corresponds to higher $f(x)$.
  • Figure 5: Cosine similarity between the output norms $||f(x)||$ and $||\alpha f(x)||$ (attention output) at different token positions under three methods: None, Cut Head, and Ave Head. Ave Head shows a higher similarity, allowing $||f(x)||$ to dominate the final attention values.
  • ...and 4 more figures