Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-generation Cloze
Yiyang Li, Lei Li, Dingxin Hu, Xueyi Hao, Marina Litvak, Natalia Vanetik, Yanquan Zhou
TL;DR
The paper tackles factual inconsistency in abstractive summarization by introducing FactCloze, a cloze-based factual error correction model, and SummDSC, a faithful summarization dataset distilled through multi-dimensional filtering. FactCloze uses autoregressive cloze modeling (with BART and T5) to fill in masked factual factors, enabling corrections that preserve context and causality, while self-diagnosis and post-alert mechanisms mitigate over-correction and risky corrections. SummDSC improves training data fidelity through multi-metric filtering (DAE, SummaC, ClozE) and creates an alert subset to handle high-risk cases, potentially improving generalization. On the FRANK benchmark and on BART-based summarization, the combination of FactCloze and SummDSC achieves higher factual consistency scores than the baselines, and human evaluation supports the improvements in correctness and the robustness of the post-alert strategy. The work offers practical pathways for deploying faithfulness-aware post-editing in real-world summarization systems, supported by released code and models.
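As a rough illustration of the multi-metric filtering idea, the sketch below keeps a training sample only when every faithfulness score clears a threshold. The metric names mirror those mentioned above, but the thresholds, the data layout, and the function name `distill` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: the actual cut-offs used for SummDSC are not stated here.
THRESHOLDS: Dict[str, float] = {"dae": 0.5, "summac": 0.5, "cloze": 0.5}


def distill(
    samples: List[dict], thresholds: Dict[str, float] = THRESHOLDS
) -> Tuple[List[dict], List[dict]]:
    """Split samples into (kept, rejected) by requiring every metric to pass.

    Each sample is assumed to carry precomputed scores, e.g.
    {"id": 1, "scores": {"dae": 0.9, "summac": 0.8, "cloze": 0.7}}.
    """
    kept, rejected = [], []
    for sample in samples:
        scores = sample["scores"]
        # A sample is kept only if it passes all metrics at once.
        if all(scores[name] >= limit for name, limit in thresholds.items()):
            kept.append(sample)
        else:
            rejected.append(sample)
    return kept, rejected
```

Requiring agreement among all metrics is a conservative choice: it trades dataset size for fidelity, since a summary flagged by any single metric is dropped.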
Abstract
Improving factual consistency in abstractive summarization has been a focus of current research. One promising approach is post-editing. However, previous work has yet to make sufficient use of the factual factors in summaries and suffers from the negative effects of its training datasets. In this paper, we first propose a novel factual error correction model, FactCloze, based on a conditional-generation cloze task. FactCloze can model the causality among factual factors while also determining whether a blank can be answered or not. Then, we propose a data distillation method that generates a more faithful summarization dataset, SummDSC, via multi-dimensional evaluation. We experimentally validate the effectiveness of our approach, which improves multiple factual consistency metrics compared to baselines.
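To make the cloze formulation concrete, the following minimal sketch masks factual factors in a summary and pairs the masked summary with its source document as input for a generation model. The mask token, the separator, and the helper names `mask_factors` and `build_cloze_input` are hypothetical, and the factors are assumed to be supplied by some upstream extractor; the actual FactCloze input format may differ.

```python
from typing import List, Tuple


def mask_factors(
    summary: str, factors: List[str], mask_token: str = "<mask>"
) -> Tuple[str, List[str]]:
    """Replace the first occurrence of each factual factor with a mask token.

    Returns the masked summary and the factors that were actually masked,
    in the order they were given (not their order in the text).
    """
    masked = summary
    answers: List[str] = []
    for factor in factors:
        if factor in masked:
            masked = masked.replace(factor, mask_token, 1)
            answers.append(factor)
    return masked, answers


def build_cloze_input(source: str, masked_summary: str, sep: str = " </s> ") -> str:
    """Concatenate the masked summary and the source document (assumed layout)."""
    return masked_summary + sep + source


masked, answers = mask_factors("Alice met Bob in Paris on Monday.", ["Bob", "Paris"])
print(masked)   # Alice met <mask> in <mask> on Monday.
print(answers)  # ['Bob', 'Paris']
```

A correction model trained on such inputs would generate a filler for each blank conditioned on the source; the ability to mark a blank as unanswerable, which the abstract mentions, is not modeled in this sketch.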
