Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection

Jianfeng He; Hang Su; Jason Cai; Igor Shalyminov; Hwanjun Song; Saab Mansour

TL;DR

This work tackles semi-supervised dialogue abstractive summarization by addressing pseudolabel noise without relying on ground truth. It introduces SiCF, a three-component score (Semantic Invariance, Coverage, Faithfulness) to judge the quality of model-generated summaries and guide unlabeled-data selection. A variant-length multi-label Bayesian Neural Network is developed to improve uncertainty estimation for generation tasks. Across SAMSUM, DIALOGSUM, and TODSUM, SiCF-enabled selective training yields measurable gains in SSDS metrics and uncertainty estimation, demonstrating practical benefits for reducing labeling costs while sustaining performance.

Abstract

Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. Previous work on handling label noise in semi-supervised learning focuses primarily on natural language understanding tasks and assumes that each sample has a unique label. These methods are not directly applicable to SSDS, which is a generative task in which each dialogue can be summarized in different ways. In this work, we propose a novel scoring approach, SiCF, which encapsulates three primary dimensions of summarization model quality: Semantic invariance (indicative of model confidence), Coverage (factual recall), and Faithfulness (factual precision). Using the SiCF score, we select unlabeled dialogues with high-quality generated summaries to train summarization models. Comprehensive experiments on three public datasets demonstrate the effectiveness of SiCF scores in uncertainty estimation and semi-supervised learning for dialogue summarization tasks. Our code is available at https://github.com/amazon-science/summarization-sicf-score.

Paper Structure (41 sections, 11 equations, 10 figures, 22 tables)

Introduction
Related Work
Problem Setting
Our Model
Overview of SSDS via our SiCF Score
SiCF Score: Semantic Invariance
SiCF Score: Coverage
SiCF Score: Faithfulness
Mean and Bayesian Neural Network
Fine-Tune on Unlabeled Dialogues Selected by SiCF Scores
Experiments
Task setting
Data
Baselines
Metrics
...and 26 more sections

Figures (10)

Figure 1: A global view of our SSDS framework using the combined semantic invariance, coverage, and faithfulness score (SiCF). Each row in the colored matrix represents diverse predicted summaries for a dialogue. For each unlabeled dialogue, the predicted summary closest to the mean embedding is chosen. We then rank the chosen predicted summaries by their SiCF scores and select a portion of them. The selected <unlabeled dialogue, pseudolabel> pairs and all human-labeled pairs are used to train our target model. The detailed SSDS framework is outlined in the section "Overview of SSDS via our SiCF Score".
Figure 2: The global view of our coverage and faithfulness scores in our SiCF score.
Figure 3: The diagram of the variant-length multi-label BNN. It uses a $\tilde{V}$ column as an example to obtain an entropy value. This example sets $k=3$. The $\lambda_{cov/fai}$ is the sum of the entropy values from all $\tilde{V}$ columns.
Figure 4: Uncertainty estimation results at force true ratios of 0%, 10%, 20%, ..., 90% in the SAMSUM 1:50 setting.
Figure 5: Uncertainty estimation results at force true ratios of 0%, 10%, 20%, ..., 90% in the SAMSUM 5:50 setting.
...and 5 more figures
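
The selection steps described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of Euclidean distance to find the summary closest to the mean embedding, the `keep_ratio` parameter, and the convention that a higher score means a better pseudolabel are all assumptions made for illustration. The quality scores are taken as given inputs here; how SiCF computes them is not reproduced.

```python
from math import sqrt


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_to_mean(embeddings):
    """Index of the embedding nearest to the mean embedding.

    Euclidean distance is an assumption; the caption only says "closest".
    """
    mean = mean_vector(embeddings)

    def distance(vector):
        return sqrt(sum((a - b) ** 2 for a, b in zip(vector, mean)))

    return min(range(len(embeddings)), key=lambda i: distance(embeddings[i]))


def select_pseudolabels(candidates, scores, keep_ratio):
    """Pick one summary per dialogue, rank dialogues, keep the top portion.

    candidates: dict of dialogue_id -> list of (summary, embedding) pairs,
                i.e. the diverse predicted summaries for that dialogue.
    scores:     dict of dialogue_id -> quality score (hypothetical input;
                higher is assumed to mean a better pseudolabel).
    keep_ratio: fraction of unlabeled dialogues to keep (hypothetical knob).
    """
    chosen = {}
    for dialogue_id, summaries in candidates.items():
        index = closest_to_mean([embedding for _, embedding in summaries])
        chosen[dialogue_id] = summaries[index][0]

    # Rank dialogues from highest to lowest score and keep the top portion.
    ranked = sorted(chosen, key=lambda d: scores[d], reverse=True)
    n_keep = int(len(ranked) * keep_ratio)
    return [(dialogue_id, chosen[dialogue_id]) for dialogue_id in ranked[:n_keep]]
```

In this sketch, the returned `(dialogue_id, pseudolabel)` pairs would be concatenated with all human-labeled pairs to form the training set for the target model, as the caption describes.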
