DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Aryo Pradipta Gema, Chen Jin, Ahmed Abdulaal, Tom Diethe, Philip Teare, Beatrice Alex, Pasquale Minervini, Amrutha Saseendran

TL;DR

DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide, and significantly improves performance on tasks requiring high contextual faithfulness.

Abstract

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).

Paper Structure

This paper contains 64 sections, 9 equations, 12 figures, 28 tables.

Figures (12)

  • Figure 1: Overview of the DeCoRe workflow. Given the same input, the base LLM ($\text{LLM}_{\text{base}}$) and the variant with masked retrieval heads ($\text{LLM}_{\text{masked}}$) predict the next token. An uncertainty estimation is applied to the base model's output using conditional entropy: higher conditional entropy increases the contrastive factor ($\alpha$), penalising predictions that align with the $\text{LLM}_{\text{masked}}$. The final prediction is selected based on weighted contrastive decoding of the outputs from both models, leading to a more grounded response.
  • Figure 2: Example of hallucination induced by masking retrieval heads in the NQ-Swap task. The base model retrieves the correct answer from the substituted context, while the masked model generates an incorrect answer.
  • Figure 3: Correlation between the number of masked retrieval heads and performance of Llama3-8B-Instruct with DeCoRe$_\text{entropy}$ on each task. The correlations are quantified by the Pearson Correlation Coefficient $r$ for each plot. Detailed results are listed in Table \ref{tab:result_ablation_num_masked_retr_head_faithfulness} and Table \ref{tab:result_ablation_num_masked_retr_head_factuality}.
  • Figure 4: Comparison of length-normalised conditional entropy of Greedy, ITI, DoLa, and DeCoRe$_\text{entropy}$ in long-generation tasks, i.e., XSum (a), MuSiQue (Closed) + CoT (b), and MuSiQue (Open) + CoT (c). Asterisks (*) indicate statistically significant differences between the distributions based on one-tailed Welch’s t-test results. Detailed results are listed in Table \ref{tab:result_entropy_sum}.
  • Figure 5: Retrieval scores of the Retrieval Heads with non-zero retrieval scores.
  • ...and 7 more figures
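
The workflow described in the Figure 1 caption can be illustrated with a small single-step sketch. It is not the authors' implementation: the function names are hypothetical, the toy logits are made up, and the combination rule $(1+\alpha)\log p_{\text{base}} - \alpha\log p_{\text{masked}}$ is the usual contrastive-decoding form, assumed here because this listing does not state the paper's equation. The only parts taken from the text are that $\alpha$ grows with the entropy of the base model's next-token distribution and that tokens favoured by the masked model are penalised.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(log_probs):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def decore_step(base_logits, masked_logits):
    """One decoding step of an entropy-guided contrast (assumed form).

    base_logits:   next-token logits from the unmodified model
    masked_logits: next-token logits from the model with retrieval heads masked
    Returns (chosen token index, alpha, contrastive scores).
    """
    lp_base = log_softmax(base_logits)
    lp_masked = log_softmax(masked_logits)

    # Assumption: alpha is set directly to the base model's next-token entropy,
    # so a more uncertain base model leads to a stronger contrast.
    alpha = entropy(lp_base)

    # Assumed contrastive rule: boost the base model, penalise the masked one.
    scores = [(1.0 + alpha) * b - alpha * m for b, m in zip(lp_base, lp_masked)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, alpha, scores


if __name__ == "__main__":
    # Toy vocabulary of three tokens. The base model is nearly undecided
    # between tokens 0 and 1; the masked model strongly prefers token 0.
    base = [2.0, 1.9, 0.0]
    masked = [2.5, 0.5, 0.0]
    token, alpha, _ = decore_step(base, masked)
    print(f"alpha={alpha:.3f}, chosen token index={token}")
```

In this toy case greedy decoding on the base logits alone would pick token 0, whereas the contrast picks token 1, because token 0 is the one the masked model prefers. When the two models agree exactly, the scores reduce to the base log-probabilities and the choice is unchanged.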