Quantifying Stereotypes in Language

Yang Liu

TL;DR

This work argues that stereotypes in language should be quantified on a continuous scale rather than with binary labels. It builds a dataset by combining sentences from StereoSet (SS) and CrowS-Pairs (CP) and annotates 2,976 sentences using Best-Worst Scaling, with stereotype scores in $[-1,1]$ derived via Iterative Luce Spectral Ranking. Pretrained language models are then trained to predict these scores and correlate strongly with the annotated values (e.g., RoBERTa: $r=0.8124$, $MSE=0.0184$). The study relates stereotype scores to hate speech, sexism, sentiment, and disadvantaged/advantaged groups, showing that stereotype intensity often tracks harmful language and can improve downstream tasks such as hate-speech detection. It also discusses ethical considerations and limitations, and highlights the potential for more fine-grained analyses of social biases in NLP.
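To make the scoring and evaluation steps concrete, here is a minimal sketch in Python. It does not implement Iterative Luce Spectral Ranking, the method named in the summary. It substitutes the simpler counting variant of Best-Worst Scaling (share of "most stereotypical" picks minus share of "least stereotypical" picks), which also yields scores in $[-1,1]$. The function names, the toy annotations, and the reading of $r$ as a Pearson correlation are assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt


def bws_counting_scores(annotations):
    """Counting-based Best-Worst Scaling (NOT Iterative Luce Spectral Ranking).

    annotations: iterable of (items, best_item, worst_item), where `items` is
    the tuple of sentences shown together and best/worst are the annotator's
    picks for most and least stereotypical.
    Returns {item: score}, each score in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, best_item, worst_item in annotations:
        for item in items:
            seen[item] += 1
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mse(xs, ys):
    """Mean squared error between predicted and reference scores."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy annotations: three annotators judge the same 4-tuple.
annotations = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s4"), "s1", "s3"),
    (("s1", "s2", "s3", "s4"), "s2", "s4"),
]
scores = bws_counting_scores(annotations)
```

In this toy example `s1` is picked as most stereotypical twice in three appearances and never as least, so its score is 2/3; `s4` mirrors it at -2/3. A model's predictions could then be compared with such reference scores using `pearson_r` and `mse`.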

Abstract

A stereotype is a generalized perception of a specific group of humans. It is often encoded in human language, especially in texts on social issues. Previous works simply label a sentence as either stereotypical or anti-stereotypical. However, the stereotype of a sentence may require fine-grained quantification. In this paper, to fill this gap, we quantify stereotypes in language by annotating a dataset. We train pre-trained language models (PLMs) on this dataset to predict the stereotype scores of sentences. Then, we discuss stereotypes in relation to common social issues such as hate speech, sexism, sentiments, and disadvantaged and advantaged groups. We demonstrate the connections and differences between stereotypes and common social issues, and all four studies validate the general findings of existing studies. In addition, our work suggests that fine-grained stereotype scores are a highly relevant and competitive dimension for research on social issues.

Paper Structure

This paper contains 31 sections, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: An example of how our work is different from previous works.
  • Figure 2: The kernel density curves for the bias types profession, race, gender, and religion in the dataset. The vertical dashed line indicates the average of the stereotype scores of the samples in a given class.
  • Figure 3: The results of the experiments on BERT, DistilBERT, and RoBERTa demonstrate that hate speech has higher stereotype scores than non-hate speech.
  • Figure 4: Stereotype scores for different target groups in hate speech.
  • Figure 5: Scatter plots of toxicity scores and stereotype scores for samples with and without sexism.
  • ...and 4 more figures