Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-generation Cloze

Yiyang Li, Lei Li, Dingxin Hu, Xueyi Hao, Marina Litvak, Natalia Vanetik, Yanquan Zhou

TL;DR

The paper tackles factual inconsistency in abstractive summarization by introducing FactCloze, a cloze-based factual error correction model, and SummDSC, a distilled, more faithful dataset filtered along multiple dimensions. FactCloze uses autoregressive cloze modeling (with BART and T5) to fill in masked factual factors, enabling corrections that preserve context and causality, while self-diagnosis and post-alert mechanisms mitigate over-correction and flag risky corrections. SummDSC improves training data fidelity through multi-metric filtering (DAE, SummaC, ClozE) and creates an alert subset to handle high-risk cases, potentially improving generalization. Across FRANK benchmarks and BART-based summarization, the combination of FactCloze and SummDSC achieves higher factual consistency scores than the baselines, with human evaluation supporting the improvements in correctness and the robustness of the post-alert strategy. The work offers practical pathways for deploying faithfulness-aware post-editing in real-world summarization systems, supported by released code and models.
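To make the cloze idea concrete, here is a minimal sketch of how a summary might be turned into a cloze input by blanking its factual factors. It is an illustration only: the function name `build_cloze` is hypothetical, the factors are supplied by hand rather than extracted automatically, and the `<extra_id_N>` blank format is borrowed from T5-style sentinel tokens as an assumption, not taken from the paper.

```python
def build_cloze(summary: str, factors: list[str]) -> tuple[str, list[str]]:
    """Replace each factual factor with a numbered blank, left to right.

    Hypothetical helper: the blank format mimics T5 sentinel tokens,
    which is an assumption and not necessarily what FactCloze uses.
    """
    # Locate the first occurrence of every factor and order them by position.
    spans = sorted(
        (summary.find(f), f) for f in factors if summary.find(f) != -1
    )

    pieces: list[str] = []
    answers: list[str] = []
    cursor = 0
    for start, factor in spans:
        if start < cursor:
            continue  # skip a factor that overlaps one already masked
        pieces.append(summary[cursor:start])
        pieces.append(f"<extra_id_{len(answers)}>")
        answers.append(factor)
        cursor = start + len(factor)
    pieces.append(summary[cursor:])
    return "".join(pieces), answers


masked, answers = build_cloze(
    "Alice visited Paris in 2019.", ["Paris", "Alice", "2019"]
)
print(masked)
print(answers)
```

In a setup like this, the masked summary together with the source document would form the model input, and the model would generate a filler for each blank in order, so that later fillers can depend on earlier ones.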

Abstract

Improving factual consistency in abstractive summarization has been a focus of current research. One promising approach is the post-editing method. However, previous works have yet to make sufficient use of factual factors in summaries and suffer from the negative effects of their training datasets. In this paper, we first propose a novel factual error correction model, FactCloze, based on a conditional-generation cloze task. FactCloze can construct the causality among factual factors while also being able to determine whether a blank can be answered or not. Then, we propose a data distillation method to generate a more faithful summarization dataset, SummDSC, via multi-dimensional evaluation. We experimentally validate the effectiveness of our approach, which leads to an improvement in multiple factual consistency metrics compared to baselines.
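The multi-dimensional filtering behind SummDSC can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' pipeline: the scores are placeholder numbers rather than outputs of real metric implementations, the thresholds are invented, and the rule that a pair failing any threshold goes to the alert subset is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    document: str
    summary: str
    scores: dict[str, float]  # metric name -> factual consistency score


def distill(
    pairs: list[ScoredPair], thresholds: dict[str, float]
) -> tuple[list[ScoredPair], list[ScoredPair]]:
    """Split pairs into a 'base' and an 'alert' subset.

    Assumption: a pair is kept in 'base' only if every metric meets its
    threshold; all other pairs are routed to 'alert'.
    """
    base, alert = [], []
    for pair in pairs:
        passed = all(pair.scores[m] >= t for m, t in thresholds.items())
        (base if passed else alert).append(pair)
    return base, alert


# Placeholder scores and thresholds, chosen only to exercise the logic.
pairs = [
    ScoredPair("doc A", "sum A", {"dae": 0.9, "summac": 0.8, "cloze": 0.7}),
    ScoredPair("doc B", "sum B", {"dae": 0.9, "summac": 0.2, "cloze": 0.7}),
]
thresholds = {"dae": 0.5, "summac": 0.5, "cloze": 0.5}
base, alert = distill(pairs, thresholds)
print(len(base), len(alert))
```

Requiring agreement across several metrics is a conservative design: a single metric's blind spot is less likely to let an unfaithful pair into the base subset.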

Paper Structure (42 sections, 4 equations, 4 figures, 10 tables)

Figures (4)

  • Figure 1: Overview of FactCloze. A hypothesis sentence is passed to a self-diagnosis mechanism and a factual error correction module. An alert will be raised if the corrected sentence contains s.
  • Figure 2: Overview of SummDSC. We use four modules to convert a document-summary pair to $\text{SummDSC}_{base}$ and $\text{SummDSC}_{alert}$ formats. The black dashed line indicates that the factual consistency score is above the threshold, while the opposite is true for the gray ones.
  • Figure 3: A radar chart of the five factual consistency metrics on the FRANK dataset. Each direction shows the average score on the samples with a particular error type. For a description of each error type, see pagnoni2021understanding. In particular, NE indicates a sample without factual errors.
  • Figure 4: A box plot for five factual consistency metrics. The samples are grouped into bins based on the percentiles of one metric score. The factual consistency score boxes of other metrics are plotted within each bin.
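
The grouping described for Figure 4 can be approximated as below. This is a minimal sketch, assuming quartile bins and using made-up scores; the function name `bin_by_percentile` and the number of bins are hypothetical choices, not details from the paper.

```python
from statistics import quantiles


def bin_by_percentile(
    ref_scores: list[float], other_scores: list[float], n_bins: int = 4
) -> list[list[float]]:
    """Group another metric's scores by percentile bins of a reference metric.

    Each returned list holds the 'other' scores of the samples whose
    reference score falls in that percentile bin; one box of a box plot
    could then be drawn per list.
    """
    cuts = quantiles(ref_scores, n=n_bins)  # n_bins - 1 cut points
    bins: list[list[float]] = [[] for _ in range(n_bins)]
    for ref, other in zip(ref_scores, other_scores):
        index = sum(ref > c for c in cuts)  # count of cut points below ref
        bins[index].append(other)
    return bins


# Made-up scores for two metrics over eight samples.
ref = [1, 2, 3, 4, 5, 6, 7, 8]
other = [10, 20, 30, 40, 50, 60, 70, 80]
for i, scores in enumerate(bin_by_percentile(ref, other)):
    print(i, scores)
```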