StereoDetect: Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings

Kaustubh Shivshankar Shejole, Pushpak Bhattacharyya

TL;DR

This work tackles the detection of stereotypes and anti-stereotypes as distinct from biases in NLP. It introduces a five-tuple representation and a framework grounded in social psychology, then builds StereoDetect, a curated, dual-format benchmark spanning five domains that also includes neutral and bias instances. Empirical results show that sub-10B LLMs and GPT-4o often misclassify anti-stereotypes, while a Gemma-2-9B model fine-tuned on StereoDetect achieves strong performance and generalizes better than baselines. The released dataset and code support progress toward reliable, definition-aligned stereotype and anti-stereotype detection for responsible AI.

Abstract

Stereotypes are known to have very harmful effects, making their detection critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, thereby leaving the study of stereotypes in its early stages. Our study revealed that many works have failed to clearly distinguish between stereotypes and stereotypical biases, which has significantly slowed progress in this area. Stereotype and anti-stereotype detection is a problem that requires social knowledge; hence, it is one of the most difficult areas in Responsible AI. This work investigates this task: we propose a five-tuple definition and provide precise terminology disentangling stereotypes, anti-stereotypes, stereotypical bias, and general bias. We provide a conceptual framework grounded in social psychology for reliable detection. We identify key shortcomings in existing benchmarks for stereotype and anti-stereotype detection. To address these gaps, we developed StereoDetect, a well-curated, definition-aligned benchmark dataset designed for this task. We show that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect's effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and models fine-tuned on them. The dataset and code are available at https://github.com/KaustubhShejole/StereoDetect.

Paper Structure

This paper contains 57 sections, 17 figures, 21 tables.

Figures (17)

  • Figure 1: Conceptual framework for stereotype and anti-stereotype detection task grounded in principles of social psychology for reliable detection.
  • Figure 2: Pipeline for constructing the StereoDetect dataset: manual sentence curation from StereoSet; transformation into stereotypes and anti-stereotypes; inclusion of LGBTQ+ stereotypes from WinoQueer; inclusion of neutral instances with target groups from Wikipedia; GPT-4o-assisted generation of LGBTQ+ anti-stereotypes and neutral counterfactuals; inclusion of bias instances and neutral instances without target groups from StereoSet; and multi-stage human validation.
  • Figure 3: Prompt used for generating LGBTQ+ anti-stereotypes by inverting stereotypes.
  • Figure 4: Prompt used for generating neutral false statements from facts derived from Wikipedia about target groups.
  • Figure 5: Prompt used for zero-shot inference.
  • ...and 12 more figures