Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer
Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang
TL;DR
The paper investigates how generative models maintain factual grounding when contextual knowledge changes, introducing a knowledge-augmented generator and a Margin Failure Rate (MFR) metric to quantify faithfulness during context transfer. It reveals that memory hallucination occurs across multiple architectures (e.g., FiD, BART, T5) and is influenced by factors such as the scale of contextual knowledge and the presence of noisy or negative contexts. By constructing a Debatepedia-based long-form QA benchmark and evaluating with BERTScore-based margin checks, the study highlights the challenges of grounding under dynamic knowledge and the need for robust evaluation and mitigation strategies. The work provides a framework and dataset for systematically studying context-driven hallucinations and points toward future directions for improving faithfulness in practical, knowledge-enabled NLP systems.
Abstract
The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on examining hallucinations stemming from static input, such as in the domains of summarization or machine translation. However, our investigation delves into the faithfulness of generative question answering in the presence of dynamic knowledge. Our objective is to explore the existence of hallucinations arising from parametric memory when contextual knowledge undergoes changes, while also analyzing the underlying causes for their occurrence. In order to efficiently address this issue, we propose a straightforward yet effective measure for detecting such hallucinations. Intriguingly, our investigation uncovers that all models exhibit a tendency to generate previous answers as hallucinations. To gain deeper insights into the underlying causes of this phenomenon, we conduct a series of experiments that verify the critical role played by context in hallucination, both during training and testing, from various perspectives.
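The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of one plausible reading of a margin-based failure rate: after the context is swapped, a generated answer counts as a failure when it is more similar to the answer supported by the old context than to the answer supported by the new context, by more than a margin. The function names, the `margin` parameter, and the exact decision rule are assumptions made for illustration, and a simple token-overlap F1 stands in for BERTScore so that the example runs without external dependencies.

```python
from collections import Counter
from typing import Callable, Sequence


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1. A lightweight stand-in for BERTScore (assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def margin_failure_rate(
    generated: Sequence[str],
    old_answers: Sequence[str],
    new_answers: Sequence[str],
    margin: float = 0.0,
    sim: Callable[[str, str], float] = token_f1,
) -> float:
    """Fraction of examples whose output stays closer to the old answer.

    Hypothetical decision rule: an example fails when
    sim(generated, old) - sim(generated, new) > margin.
    """
    if not (len(generated) == len(old_answers) == len(new_answers)):
        raise ValueError("all three sequences must have the same length")
    if not generated:
        return 0.0
    failures = sum(
        1
        for gen, old, new in zip(generated, old_answers, new_answers)
        if sim(gen, old) - sim(gen, new) > margin
    )
    return failures / len(generated)


if __name__ == "__main__":
    generated = ["paris is the capital", "the answer is berlin"]
    old_answers = ["paris is the capital", "the answer is rome"]
    new_answers = ["lyon is the capital", "the answer is berlin"]
    # The first output repeats the old answer; the second follows the new context.
    print(margin_failure_rate(generated, old_answers, new_answers))
```

To approximate the paper's setup more closely, a BERTScore-based similarity function could be passed as `sim`; the margin then controls how much closer to the old answer an output must be before it is flagged.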
