Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

Yuji Zhang; Sha Li; Jiateng Liu; Pengfei Yu; Yi R. Fung; Jing Li; Manling Li; Heng Ji

TL;DR

This work identifies knowledge overshadowing as a distinct cause of amalgamated hallucination in autoregressive language models when prompts contain multiple conditions. It demonstrates that data imbalance in training and prompting drives this effect and provides a generalization-bound perspective linking overshadowing to over-generalization of dominant patterns. The authors propose inference-time guardrails, including a PMI-based overshadowing detector with an Escaping Penalty Mechanism and a Self-Contrastive Decoding scheme, which reduce hallucination by $11.2\%$ to $39.4\%$ and anticipate hallucination with up to $82\%$ F1. These findings offer practical, training-free tools to improve reliability and fairness of generation across model families and tasks.

Abstract

Hallucination is often regarded as a major impediment to using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We term this phenomenon ``knowledge overshadowing'': when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained and fine-tuned models, over a wide range of LM families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular condition) and the length of the dominant condition description, consistent with our derived generalization bound. Finally, we propose to use overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach achieves up to 82% F1 for hallucination anticipation and an 11.2% to 39.4% reduction in hallucination across different models and datasets.
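
To make the contrastive idea concrete, the following is a minimal sketch of one generic way to re-score next-token candidates by contrasting the full prompt against a prompt with one condition masked out. It is not the authors' exact formulation: the scoring rule, the weight `alpha`, the function names, and the toy logits are all assumptions introduced for illustration. The bracketed term is a PMI-style quantity that rewards tokens whose probability rises when the masked condition is present.

```python
import math


def log_softmax(logits):
    """Convert a {token: logit} dict into {token: log-probability}."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}


def contrastive_scores(full_logits, masked_logits, alpha=1.0):
    """Hypothetical scoring rule (an assumption, not the paper's method):

        score(tok) = log p(tok | full)
                     + alpha * [log p(tok | full) - log p(tok | masked)]

    `full_logits` come from the prompt with all conditions; `masked_logits`
    come from the same prompt with one (possibly overshadowed) condition
    removed.
    """
    lp_full = log_softmax(full_logits)
    lp_masked = log_softmax(masked_logits)
    return {
        tok: lp_full[tok] + alpha * (lp_full[tok] - lp_masked[tok])
        for tok in lp_full
    }


# Toy logits, invented for illustration only.
# "A" is favoured by the dominant condition alone; "B" only becomes
# plausible once the less popular condition is included in the prompt.
full = {"A": 2.0, "B": 1.5, "C": 0.0}
masked = {"A": 2.5, "B": 0.0, "C": 0.0}

lp_full = log_softmax(full)
greedy_pick = max(lp_full, key=lp_full.get)

scores = contrastive_scores(full, masked, alpha=1.0)
contrastive_pick = max(scores, key=scores.get)

print("greedy:", greedy_pick)
print("contrastive:", contrastive_pick)
```

In this toy setting, plain greedy decoding follows the dominant condition, while the contrastive score promotes the token that depends on the masked condition. In practice the two logit sets would come from two forward passes of the same model, and `alpha` would need tuning.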

Paper Structure (27 sections, 16 equations, 3 figures, 6 tables)

Introduction
Knowledge Overshadowing in Pretrained Models
Data Imbalance Causes Knowledge Overshadowing
Natural Language Queries
Tasks.
Metric.
Tested Models.
Results.
Synthetic Queries
Knowledge Overshadowing as a Case of Over-Generalization
Generalization positively correlates with hallucination.
Generalization error bound of auto-regressive language modeling.
Guardrails for Hallucination
Training-free hallucination anticipation
Self Contrastive decoding for hallucination control
...and 12 more sections

Figures (3)

Figure 1: Knowledge Overshadowing causes hallucinations. We propose using overshadowing conditions as a signal to detect hallucination before it occurs, and alleviate hallucination during inference by proposing a training-free self-contrastive decoding method.
Figure 2: Hallucination rate (%) with varying prefix lengths across model families.
Figure 3: Controllable variants affect hallucination rate (%) and GSNR (generalization).
