The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Yuji Zhang; Sha Li; Cheng Qian; Jiateng Liu; Pengfei Yu; Chi Han; Yi R. Fung; Kathleen McKeown; Chengxiang Zhai; Manling Li; Heng Ji

TL;DR

The paper tackles the stubborn problem of factual hallucinations in LLMs by proposing knowledge overshadowing as a core mechanism where dominant knowledge suppresses less frequent facts during generation. It establishes a log-linear law showing that hallucination rates scale with the logarithms of relative knowledge popularity $P$, knowledge length $L$, and model size $S$, enabling proactive quantification of expected hallucinations before training or inference. Building on this insight, the authors derive a generalization-bound interpretation and introduce CoDA, a training-free decoding strategy that amplifies overshadowed knowledge via contrastive decoding and relative PMI metrics, achieving substantial factuality gains on multiple benchmarks. The work also validates the law across pretrained and finetuned models and discusses applicability to state-of-the-art LLMs, offering a principled framework for predicting, mitigating, and controlling hallucinations in practice. Overall, the results provide actionable methods for more predictable and reliable language models, with implications for safer deployment and data-centric model development.
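To make the shape of the log-linear law concrete, one plausible way to write it is sketched here. This is an illustrative form only: the hallucination rate $R$ and the slope and intercept symbols $\alpha$ and $\beta$ are assumed notation that does not appear in this summary, and each relation is meant to hold when the other two factors are kept fixed.

$$
R(P) \approx \alpha_P \log P + \beta_P, \qquad
R(L) \approx \alpha_L \log L + \beta_L, \qquad
R(S) \approx \alpha_S \log S + \beta_S
$$

Under this reading, multiplying any one factor by a constant shifts the expected hallucination rate by a roughly constant amount, which is what would allow the rate to be estimated before training or inference.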

Abstract

Hallucination is a persistent challenge in large language models (LLMs), where even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithm of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to preemptively quantify hallucinations, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen understanding of the underlying mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
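As a rough illustration of how a log-linear relationship of this kind could be fitted and then used for prediction, the sketch below runs an ordinary least-squares fit of hallucination rate against the logarithm of a single factor. It is not the authors' procedure: the function names `fit_log_linear` and `predict_rate` are hypothetical, and the data points are synthetic values generated from an assumed slope and intercept, not results from the paper.

```python
import math


def fit_log_linear(xs, rates):
    """Least-squares fit of rate ~ slope * ln(x) + intercept."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mean_log = sum(logs) / n
    mean_rate = sum(rates) / n
    sxx = sum((u - mean_log) ** 2 for u in logs)
    sxy = sum((u - mean_log) * (r - mean_rate) for u, r in zip(logs, rates))
    slope = sxy / sxx
    intercept = mean_rate - slope * mean_log
    return slope, intercept


def predict_rate(x, slope, intercept):
    """Predicted rate at x, clamped to the valid range [0, 1]."""
    return min(1.0, max(0.0, slope * math.log(x) + intercept))


# Synthetic example (assumption): relative popularity ratios and rates that
# follow an exact log-linear trend with slope 0.12 and intercept 0.10.
popularity_ratios = [2, 5, 10, 50, 100]
observed_rates = [0.10 + 0.12 * math.log(p) for p in popularity_ratios]

slope, intercept = fit_log_linear(popularity_ratios, observed_rates)
predicted_at_20 = predict_rate(20, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, rate at P=20: {predicted_at_20:.3f}")
```

The same fit could be repeated with knowledge length or model size on the x-axis. With real measurements the fit would not be exact, so the residuals would show how well a log-linear form describes the data.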
