From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models

Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov

TL;DR

The paper addresses how political biases embedded in pretraining data propagate through language models to affect fairness in hate speech and misinformation detection. It proposes a two-step framework using political spectrum theory and partisan corpora to quantify LM leanings and then test downstream task fairness under controlled conditions with multiple architectures. Key findings show that LMs adopt distinct political leanings influenced by pretraining data, and that downstream performance and fairness vary by identity groups and misinformation sources; a partisan ensemble can improve overall performance. The work highlights that non-toxic, diverse data can still encode social biases and discusses mitigation strategies including ensemble and strategic pretraining, with cautions about censorship and misuse.
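
As a rough illustration of the partisan ensemble idea mentioned above, the sketch below combines classifiers built on LMs with different political leanings. It assumes simple majority voting, which this summary does not specify, and the `Classifier` type, the `partisan_ensemble` function, and the toy member classifiers are all hypothetical names introduced for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical classifier type: maps an input text to a label string.
Classifier = Callable[[str], str]


def partisan_ensemble(text: str, members: Dict[str, Classifier]) -> str:
    """Return the majority label across the member classifiers.

    Assumption: plain majority voting. On a tie, the label that was
    produced first (in the dict's insertion order) wins.
    """
    votes = Counter(clf(text) for clf in members.values())
    return votes.most_common(1)[0][0]


# Toy stand-ins for classifiers fine-tuned on top of LMs with different
# leanings. Each one is sensitive to a different placeholder token.
members: Dict[str, Classifier] = {
    "left": lambda t: "flagged" if "TOKEN_A" in t else "ok",
    "center": lambda t: "flagged" if ("TOKEN_A" in t or "TOKEN_B" in t) else "ok",
    "right": lambda t: "flagged" if "TOKEN_B" in t else "ok",
}

print(partisan_ensemble("text with TOKEN_A", members))
print(partisan_ensemble("neutral text", members))
```

In this toy setup each input is flagged only when at least two of the three members agree, so a blind spot in one member does not decide the outcome on its own.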

Abstract

Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.

Paper Structure

This paper contains 41 sections, 6 figures, and 15 tables.

Figures (6)

  • Figure 1: Measuring the political leaning of various pretrained LMs. BERT and its variants are more socially conservative compared to the GPT series. Node color denotes different model families.
  • Figure 2: Change in RoBERTa political leaning from pretraining on pre-Trump corpora (start of the arrow) to post-Trump corpora (end of the arrow). Notably, the majority of setups move towards increased polarization (further away from the center) after pretraining on post-Trump corpora. This illustrates that pretrained language models could pick up the heightened polarization in news and social media due to socio-political events.
  • Figure 3: Pretraining LMs on the six partisan corpora and re-evaluating their positions on the political spectrum.
  • Figure 4: The trajectory of LM political leaning with increasing pretraining corpus size and epochs.
  • Figure 5: The stability of LMs' response to political propositions with regard to changes in statement paraphrasing.
  • ...and 1 more figure