Entity-level Factual Adaptiveness of Fine-tuning based Abstractive Summarization Models
Jongyoon Song, Nohil Park, Bongkyu Hwang, Jaewoong Yun, Seongho Joe, Youngjune L. Gwon, Sungroh Yoon
TL;DR
This paper defines factual adaptiveness as robustness to entity-level knowledge conflict in fine-tuning based abstractive summarization and introduces two metrics, $M_{CL}$ and $M_{FC}$, to quantify it. It then proposes a controllable counterfactual data augmentation framework that uses parametric knowledge from pretrained language models to generate and utilize counterfactual entity replacements, with configurable augmentation ratio $\rho$ and candidate-group strategies. Across PEGASUS and BART on XSum and CNN/DailyMail, the method substantially improves factual adaptiveness while largely preserving factual consistency on original data, illustrating an orthogonal relationship between the two notions. Qualitative analyses show reduced entity-level hallucinations and demonstrate how augmentation group choices influence generalization. Overall, the work provides a practical approach to diagnosing and mitigating knowledge-conflict hallucinations in abstractive summarization, with potential for integration into contrastive learning pipelines and broader knowledge-conflict settings.
Abstract
Abstractive summarization models often generate factually inconsistent content, particularly when the parametric knowledge of the model conflicts with the knowledge in the input document. In this paper, we analyze the robustness of fine-tuning based summarization models to such knowledge conflict, which we call factual adaptiveness. We utilize pre-trained language models to construct evaluation sets and find that factual adaptiveness is not strongly correlated with factual consistency on original datasets. Furthermore, we introduce a controllable counterfactual data augmentation method in which the degree of knowledge conflict within the augmented data can be adjusted. Our experimental results on two pre-trained language models (PEGASUS and BART) and two fine-tuning datasets (XSum and CNN/DailyMail) demonstrate that our method enhances factual adaptiveness while achieving factual consistency on original datasets on par with the contrastive learning baseline.
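
To make the idea of counterfactual data augmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an entity appearing in both document and summary is swapped for a replacement entity in both texts, and that $\rho$ is the fraction of eligible training pairs that receive such a counterfactual copy; the paper may define $\rho$ differently. The names `replace_entity`, `augment`, and the `candidates` dictionary are hypothetical. In the paper the replacement candidates are derived from the parametric knowledge of a pre-trained language model, whereas here they are simply passed in.

```python
import random
import re


def _pattern(entity):
    # Whole-word match for an entity string (a simplification: real
    # pipelines would typically use NER spans instead of regexes).
    return re.compile(r"\b" + re.escape(entity) + r"\b")


def replace_entity(text, entity, replacement):
    """Replace whole-word occurrences of `entity` with `replacement`."""
    return _pattern(entity).sub(lambda _m: replacement, text)


def augment(pairs, candidates, rho, seed=0):
    """Return the original pairs plus counterfactual copies.

    pairs: list of (document, summary) strings.
    candidates: dict mapping an entity to a list of replacement entities
        (hypothetical input; assumed to be precomputed elsewhere).
    rho: assumed here to be the fraction of eligible pairs to augment.
    """
    rng = random.Random(seed)
    # A pair is eligible if some entity with at least one candidate
    # appears in both the document and the summary.
    eligible = []
    for doc, summ in pairs:
        shared = [
            e for e, repl in candidates.items()
            if repl and _pattern(e).search(doc) and _pattern(e).search(summ)
        ]
        if shared:
            eligible.append((doc, summ, shared))
    n_aug = round(rho * len(eligible))
    augmented = []
    for doc, summ, shared in rng.sample(eligible, n_aug):
        entity = rng.choice(shared)
        new_entity = rng.choice(candidates[entity])
        # Apply the same swap to both texts so the summary stays
        # consistent with the (now counterfactual) document.
        augmented.append((replace_entity(doc, entity, new_entity),
                          replace_entity(summ, entity, new_entity)))
    return list(pairs) + augmented


if __name__ == "__main__":
    demo_pairs = [("Paris is the capital of France.", "Paris is a capital.")]
    for doc, summ in augment(demo_pairs, {"Paris": ["Berlin"]}, rho=1.0):
        print(doc, "|", summ)
```

Replacing the entity consistently in both texts is what creates the knowledge conflict: the counterfactual summary is faithful to its document but contradicts what the model is likely to have memorized during pre-training.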
