Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization

Olubusayo Olabisi, Ameeta Agrawal

TL;DR

The paper addresses fairness in social multi-document summarization by analyzing how the order of dialect groups in the input documents influences their representation in generated summaries. It adopts a cross-dialect DivSumm setup, evaluating seven abstractive and three extractive models under shuffled and ordered input conditions, and measures fairness with a semantic-similarity-based gap, $\nabla$Fair, alongside four textual-quality metrics. The key findings show no position bias in human references or in system outputs for shuffled inputs, whereas ordered inputs induce substantial bias in favor of the group placed first, with $\nabla$Fair reaching up to 0.14 in some cases; textual quality remains largely unaffected across conditions. The work highlights the importance of randomizing input order to obtain fair and effective summaries of dialect-diverse social data, with practical implications for dataset design, model evaluation, and real-world deployment.
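The exact definition of $\nabla$Fair is not given here. The following is a minimal sketch, assuming it is the gap between the highest and lowest average similarity of a summary to each group's documents (so 0 means equal representation), and substituting bag-of-words cosine similarity for the semantic similarity the paper relies on. The helpers `cosine_bow`, `group_similarity`, and `fairness_gap` are hypothetical names.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Stand-in for a semantic (embedding-based) similarity: an assumption.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def group_similarity(summary: str, docs: list[str]) -> float:
    """Mean similarity between the summary and every document of one group.

    Assumes the group contains at least one document.
    """
    return sum(cosine_bow(summary, d) for d in docs) / len(docs)


def fairness_gap(summary: str, groups: dict[str, list[str]]) -> float:
    """Hypothetical nabla-Fair: spread between best- and worst-represented group."""
    scores = [group_similarity(summary, docs) for docs in groups.values()]
    return max(scores) - min(scores)


# Toy usage with made-up tweets for three groups.
groups = {
    "a": ["the game was wild tonight", "that game tho"],
    "h": ["the concert was great", "loved the concert"],
    "w": ["traffic was terrible today", "so much traffic"],
}
summary = "people loved the concert"
print(round(fairness_gap(summary, groups), 3))  # 0.683
```

Swapping `cosine_bow` for a sentence-embedding similarity would bring the sketch closer to a semantic measure without changing `fairness_gap`.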

Abstract

Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to assess not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse social groups. Position bias, a long-known issue in news summarization, has received limited attention in the context of social multi-document summarization. We investigate this phenomenon in depth by analyzing the effect of group ordering in input documents when summarizing tweets from three distinct linguistic communities: African-American English, Hispanic-aligned Language, and White-aligned Language. Our empirical analysis shows that although the textual quality of the summaries remains consistent regardless of the input document order, in terms of fairness, the results vary significantly depending on how the dialect groups are presented in the input data. Our results suggest that position bias manifests differently in social multi-document summarization, severely impacting the fairness of summarization models.
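The two input conditions (ordered and shuffled) can be sketched as follows. This assumes each group contributes a list of tweets and that an ordered input such as $\mathcal{O}^h$ simply places all of one group's documents before the others. The relative order of the remaining groups, the random seed, and the helper names `ordered_input` and `shuffled_input` are illustrative choices, not details taken from the paper.

```python
import random


def ordered_input(groups: dict[str, list[str]], first: str) -> list[str]:
    """Build O^first: all documents of `first` come before the other groups.

    The remaining groups keep their dictionary order (an assumption).
    """
    docs = list(groups[first])
    for name, group_docs in groups.items():
        if name != first:
            docs.extend(group_docs)
    return docs


def shuffled_input(groups: dict[str, list[str]], seed: int = 0) -> list[str]:
    """Build a shuffled input: documents from all groups interleaved at random."""
    docs = [d for group_docs in groups.values() for d in group_docs]
    random.Random(seed).shuffle(docs)
    return docs


# Placeholder document IDs for the three groups.
groups = {
    "a": ["a1", "a2"],
    "h": ["h1", "h2"],
    "w": ["w1", "w2"],
}
print(ordered_input(groups, "h"))  # ['h1', 'h2', 'a1', 'a2', 'w1', 'w2']
print(sorted(shuffled_input(groups)) == sorted(ordered_input(groups, "a")))  # True
```

Seeding a local `random.Random` instance keeps each shuffle reproducible without touching the global random state, which matters when the same shuffled input must be fed to several models.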

Paper Structure (20 sections, 1 equation, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Illustration showing shuffled vs. ordered input for multi-document summarization consisting of documents from three diverse groups ($\mathcal{D}^a$, $\mathcal{D}^h$, $\mathcal{D}^w$) as indicated by the three colors. The ordered input is denoted as $\mathcal{O}^a$ when $\mathcal{D}^a$ documents appear first in the input.
  • Figure 2: Average token overlap between human-written reference summaries and each document $d_i$ using the DivSumm dataset. Text position on the $x$-axis has been normalized between 0 and 1.
  • Figure 3: Average token overlap between ordered system-generated summaries by each abstractive summarization model and each document $d_i$ in the input set $\mathcal{D}$ of DivSumm. Text position on the $x$-axis has been normalized between 0 and 1.
  • Figure 4: Average token overlap between ordered system-generated summaries by each of the seven abstractive summarization models and each document $d_i$ in the input set $\mathcal{D}$ of the DivSumm dataset. Text position on the $x$-axis has been normalized between 0 and 1.
  • Figure 5: Density distribution of similarity scores between system-generated summaries and each group, across all summarization models for $\mathcal{O}^{w}$, $\mathcal{O}^{a}$, $\mathcal{O}^{h}$ and shuffled input sets. Summaries generated from shuffled inputs show markedly different and more balanced distributions than those generated from ordered inputs.
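Figures 2 to 4 plot average token overlap against normalized document position. A minimal sketch of that measurement follows, assuming overlap is the fraction of a document's unique tokens that also appear in the summary and that positions are grouped into bins before averaging. The overlap definition, the whitespace tokenizer, the bin count, and the function names `token_overlap` and `overlap_by_position` are assumptions rather than the paper's exact procedure.

```python
from collections import defaultdict


def token_overlap(summary: str, doc: str) -> float:
    """Fraction of the document's unique tokens that also appear in the summary.

    One possible overlap definition (assumption); the paper may normalize differently.
    """
    doc_tokens = set(doc.lower().split())
    if not doc_tokens:
        return 0.0
    return len(doc_tokens & set(summary.lower().split())) / len(doc_tokens)


def overlap_by_position(
    examples: list[tuple[str, list[str]]], n_bins: int = 10
) -> list[float]:
    """Average overlap per normalized-position bin across (summary, docs) pairs.

    Document d_i in an input of n documents is mapped to i / (n - 1) in [0, 1].
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for summary, docs in examples:
        n = len(docs)
        for i, doc in enumerate(docs):
            pos = i / (n - 1) if n > 1 else 0.0
            b = min(int(pos * n_bins), n_bins - 1)  # pos == 1.0 goes in the last bin
            sums[b] += token_overlap(summary, doc)
            counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else 0.0 for b in range(n_bins)]


# Toy usage: one summary over three documents.
examples = [("cats sleep a lot", ["cats sleep", "dogs bark", "birds sing"])]
print(overlap_by_position(examples, n_bins=2))  # [1.0, 0.0]
```

Under this sketch, a roughly flat curve across bins would suggest no position bias, while a curve that is highest near position 0 would reflect a preference for documents placed early in the input.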