The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi R. Fung, Kathleen McKeown, Chengxiang Zhai, Manling Li, Heng Ji

TL;DR

The paper tackles the stubborn problem of factual hallucinations in LLMs by proposing knowledge overshadowing as a core mechanism where dominant knowledge suppresses less frequent facts during generation. It establishes a log-linear law showing that hallucination rates scale with the logarithms of relative knowledge popularity $P$, knowledge length $L$, and model size $S$, enabling proactive quantification of expected hallucinations before training or inference. Building on this insight, the authors derive a generalization-bound interpretation and introduce CoDA, a training-free decoding strategy that amplifies overshadowed knowledge via contrastive decoding and relative PMI metrics, achieving substantial factuality gains on multiple benchmarks. The work also validates the law across pretrained and finetuned models and discusses applicability to state-of-the-art LLMs, offering a principled framework for predicting, mitigating, and controlling hallucinations in practice. Overall, the results provide actionable methods for more predictable and reliable language models, with implications for safer deployment and data-centric model development.
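To make the idea of amplifying overshadowed knowledge concrete, the following is a minimal sketch of generic PMI-style contrastive rescoring. It is not the paper's exact CoDA procedure: the function names, the `alpha` weight, the two-token vocabulary, and the toy probabilities are all hypothetical and chosen only for illustration. The sketch assumes access to two next-token distributions, one conditioned on the full prompt and one conditioned on a prompt with the less popular detail masked out.

```python
import math


def pmi(p_full, p_masked):
    """Pointwise mutual information between each token and the masked-out detail:
    log p(token | full prompt) - log p(token | masked prompt)."""
    return {tok: math.log(p_full[tok]) - math.log(p_masked[tok]) for tok in p_full}


def contrastive_rescore(p_full, p_masked, alpha=1.0):
    """Rescore tokens as log p_full + alpha * PMI, then renormalize with a softmax.
    Tokens that depend on the masked-out detail are boosted; tokens that the
    dominant context predicts anyway are penalized."""
    gains = pmi(p_full, p_masked)
    scores = {tok: math.log(p_full[tok]) + alpha * gains[tok] for tok in p_full}
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}


# Hypothetical toy distributions (not from the paper): the "popular" answer
# dominates even with the full prompt, and dominates more once the detail is masked.
p_full = {"popular": 0.6, "overshadowed": 0.4}
p_masked = {"popular": 0.9, "overshadowed": 0.1}

rescored = contrastive_rescore(p_full, p_masked)
best = max(rescored, key=rescored.get)
```

In this toy case the overshadowed token overtakes the popular one after rescoring, because its probability rises sharply when the masked detail is restored to the prompt.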

Abstract

Hallucination is a persistent challenge in large language models (LLMs), where even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithm of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to preemptively quantify hallucinations, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDA, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen the understanding of the mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
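The log-linear relationship described in the abstract can be illustrated with a small regression sketch: if the hallucination rate is modeled as `slope * log(x) + intercept` for one factor `x` (popularity, length, or model size) with the other two held fixed, the two coefficients can be estimated by ordinary least squares on `log(x)`. The function names are hypothetical, and the input values below are placeholders generated from a known line purely to check the fit. They are not measurements from the paper.

```python
import math


def fit_log_linear(xs, rates):
    """Least-squares fit of rate ~= slope * log(x) + intercept."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mean_log = sum(logs) / n
    mean_rate = sum(rates) / n
    cov = sum((l - mean_log) * (r - mean_rate) for l, r in zip(logs, rates))
    var = sum((l - mean_log) ** 2 for l in logs)
    slope = cov / var
    intercept = mean_rate - slope * mean_log
    return slope, intercept


def predict_rate(x, slope, intercept):
    """Predicted hallucination rate for a new value of the factor x."""
    return slope * math.log(x) + intercept


# Placeholder inputs: relative popularity values and rates that lie exactly on
# a line in log-space (assumed coefficients, for illustration only).
true_slope, true_intercept = 0.1, 0.2
popularity = [2, 5, 10, 50, 100]
rates = [true_slope * math.log(p) + true_intercept for p in popularity]

slope, intercept = fit_log_linear(popularity, rates)
predicted = predict_rate(20, slope, intercept)
```

With real measurements, the same fit would be run once per factor, and `predict_rate` would then estimate the hallucination rate at an unseen setting of that factor.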

Paper Structure

This paper contains 50 sections, 26 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Knowledge overshadowing leads to hallucinations, which are exacerbated by growing relative knowledge popularity ($\text{P}$), length ($\text{L}$), and model size ($\text{S}$).
  • Figure 2: LLMs are pretrained from scratch on a synthetic dataset with controlled variables $\text{S}$, $\text{P}$, and $\text{L}$. In each subfigure, we vary one variable at a time while keeping the other two constant. LLMs are trained auto-regressively with cross-entropy loss computed over entire sentences. Details on training data statistics, training parameters, and implementation are given in \ref{ssec:implementation} and \ref{ssec: overshadowing_dataset}.
  • Figure 3: Fine-tuning open-source LLMs on natural language tasks. Regression lines represent the predicted trends derived from LLMs pretrained on synthetic data in § \ref{ssec:pretrain_law}. The red cross markers indicate the empirically observed hallucination rates in fine-tuned LLMs. Training data statistics and implementation details are in \ref{ssec:implementation} and \ref{ssec: overshadowing_dataset}.
  • Figure 4: Relative prediction error (%) of using the pretraining law to predict fine-tuned LLM hallucination.
  • Figure 5: Quantitative analysis of the effects of two influencing factors, $\text{P}$ and $\text{L}$, on the performance of our method CoDA in eliminating knowledge overshadowing.