Fair Abstractive Summarization of Diverse Perspectives

Yusen Zhang; Nan Zhang; Yixin Liu; Alexander Fabbri; Junru Liu; Ryo Kamoi; Xiaoxin Lu; Caiming Xiong; Jieyu Zhao; Dragomir Radev; Kathleen McKeown; Rui Zhang

TL;DR

This paper formally defines fairness in abstractive summarization as not underrepresenting the perspectives of any group of people, and proposes four reference-free automatic metrics that measure the differences between target and source perspectives.

Abstract

People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work on summarization metrics and Large Language Model (LLM) evaluation has not explored fair abstractive summarization. In this paper, we systematically investigate fair abstractive summarization for user-generated data. We first formally define fairness in abstractive summarization as not underrepresenting the perspectives of any group of people, and we propose four reference-free automatic metrics that measure the differences between target and source perspectives. We evaluate nine LLMs, including three GPT models, four LLaMA models, PaLM 2, and Claude, on six datasets collected from social media, online reviews, and recorded transcripts. Experiments show that both the model-generated and the human-written reference summaries suffer from low fairness. We conduct a comprehensive analysis of the common factors influencing fairness and propose three simple but effective methods to alleviate unfair summarization. Our dataset and code are available at https://github.com/psunlpgroup/FairSumm.
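
To make the idea of comparing source and summary perspectives concrete, the following is a minimal sketch and not one of the paper's four metrics. It assumes each source document and each summary unit has already been attributed to a group (the attribution step is taken as given here), and it uses total variation distance as a stand-in gap measure. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def value_distribution(labels):
    """Turn a list of group labels into proportions that sum to 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def representation_gap(source_dist, summary_dist):
    """Total variation distance between two distributions (0 means identical).

    Assumption: this is an illustrative stand-in, not the paper's metric.
    """
    groups = set(source_dist) | set(summary_dist)
    return 0.5 * sum(
        abs(source_dist.get(g, 0.0) - summary_dist.get(g, 0.0)) for g in groups
    )


def underrepresented(source_dist, summary_dist):
    """Groups whose share of the summary is below their share of the source."""
    return sorted(
        g for g, share in source_dist.items() if summary_dist.get(g, 0.0) < share
    )


# Hypothetical toy data: a balanced source, a summary skewed toward positive reviews.
source_labels = ["positive"] * 5 + ["negative"] * 5
summary_labels = ["positive"] * 4 + ["negative"] * 1

source_dist = value_distribution(source_labels)
summary_dist = value_distribution(summary_labels)

gap = representation_gap(source_dist, summary_dist)
low_groups = underrepresented(source_dist, summary_dist)
```

In this toy case the negative reviews make up half of the source but only a fifth of the summary, so they are flagged as underrepresented. A real evaluation would replace the hand-written labels with an automatic attribution of summary content to groups.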

Paper Structure (64 sections, 4 equations, 10 figures, 15 tables)

Introduction
Summarization of Diverse Perspectives with Social Attributes
Task Formulation
Social Attributes.
Definition 2.1 Summarization with Social Attribute.
Definition of Fairness in Summarization
Defining Value Distribution
Definition 3.1 Value Distribution.
Calculating Value Distribution
N-gram Matching.
Neural Matching.
Defining Summarization Fairness
Definition 3.2 Summarization Fairness.
PerspectiveSumm
Evaluating Fairness in Summarization
...and 49 more sections

Figures (10)

Figure 1: An example from PerspectiveSumm. The blue/red box displays the input consisting of positive/negative reviews. The grey box shows the summary generated by GPT-3.5 (text-davinci-003). The generated summary is unfair because the negative reviews are underrepresented compared with the positive reviews.
Figure 2: Overview of our proposed metrics. Dist. means value distribution.
Figure 3: Relation between temperature and correlation scores on Claritin using gpt-turbo-3.5. The X-axis is the softmax temperature of BARTScore. The Y-axis is Krippendorff's alpha and the Pearson correlation coefficient with human evaluation. The Pearson correlation coefficient is higher than Krippendorff's alpha because it only measures whether the scores are positively related, while Krippendorff's alpha requires the annotations to be the same.
Figure 4: Distribution of Male and Female values in summaries generated by gpt-turbo-3.5 on Claritin.
Figure 5: Effect of decoding temperature.
...and 5 more figures
