
Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models

Harnoor Dhingra, Preetiha Jayashanker, Sayali Moghe, Emma Strubell

TL;DR

This work investigates representational bias against queer identities in large language models. It prompts models with gender-neutral biographies prefixed by sexual-identity trigger words and measures the regard of the generated text. It demonstrates measurable bias (lower regard for outputs from queer-identity prompts) and introduces a post-hoc debiasing approach. This approach uses SHAP analysis to identify low-regard terms and chain-of-thought prompting to rewrite the text, framed as text-to-text style transfer, which increases regard while preserving context. The methodology combines a bias-detection pipeline (word clouds, PMI, t-SNE, cosine similarity, and regard metrics) with the SHAP-guided debiasing mechanism, offering a practical path toward more affirming LLM outputs. The findings suggest that debiasing can reduce representational harm in generated text. The authors acknowledge limitations related to identity scope and data biases, and point to broader non-heteronormative training data as future work.
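PMI is one of the bias-detection tools listed above. It scores how strongly a word is associated with one identity group's generated texts relative to all texts: PMI(w, g) = log(P(w, g) / (P(w) P(g))). The sketch below is a minimal illustration, not the authors' implementation. It assumes generated texts are already grouped by the trigger identity, uses a simplistic lowercase alphabetic tokenizer, and introduces the helper names `tokenize` and `pmi_by_group` purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase, keep alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def pmi_by_group(texts_by_group: dict[str, list[str]],
                 min_count: int = 1) -> dict[str, dict[str, float]]:
    """Hypothetical helper: PMI between each word and each identity group."""
    group_counts = {
        g: Counter(tok for text in texts for tok in tokenize(text))
        for g, texts in texts_by_group.items()
    }
    total_counts: Counter = Counter()
    for counts in group_counts.values():
        total_counts.update(counts)
    n_total = sum(total_counts.values())

    result: dict[str, dict[str, float]] = {}
    for g, counts in group_counts.items():
        n_group = sum(counts.values())
        scores = {}
        for word, c_wg in counts.items():
            if total_counts[word] < min_count:
                continue  # skip rare words, which give unstable PMI estimates
            # log(P(w,g) / (P(w) P(g))) simplifies to log(c(w,g) N / (c(w) N_g))
            scores[word] = math.log(c_wg * n_total / (total_counts[word] * n_group))
        result[g] = scores
    return result
```

Words with high PMI for a queer-identity group, but not for the straight group, are the candidates a word-cloud or stereotype analysis would inspect. Raising `min_count` trades recall for more stable scores.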

Abstract

Large Language Models (LLMs) are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes about marginalized groups, such as the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing LLM-generated text with the regard metric reveals measurable bias against queer people. We then show that a post-hoc method combining SHAP analysis with chain-of-thought prompting can increase the regard of generated sentences, a promising approach towards debiasing LLM output in this setting.
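The post-hoc method in the abstract works in two prompting steps. SHAP flags the words that lower a sentence's regard. The LLM is then asked first to reason about why those words lower regard, and second to rewrite the sentence using that reason. The sketch below shows only the control flow under stated assumptions. Token-level SHAP values toward the positive-regard class are assumed to be computed beforehand (for example, by running SHAP over a regard classifier; that step is not shown). `llm` is any hypothetical prompt-to-text callable. The prompt wording, the helper names, and the cutoff `k` are illustrative and not taken from the paper.

```python
from typing import Callable


def low_regard_words(tokens: list[str], shap_values: list[float], k: int = 3) -> list[str]:
    """Return up to k tokens with the most negative SHAP values.

    Assumes shap_values[i] is token i's contribution toward the *positive*
    regard class, so strongly negative values mark low-regard words.
    """
    if len(tokens) != len(shap_values):
        raise ValueError("tokens and shap_values must have the same length")
    ranked = sorted(zip(tokens, shap_values), key=lambda pair: pair[1])
    return [tok for tok, val in ranked[:k] if val < 0]


def debias(sentence: str, tokens: list[str], shap_values: list[float],
           llm: Callable[[str], str], k: int = 3) -> str:
    """Hypothetical two-step chain-of-thought rewrite guided by SHAP words."""
    words = low_regard_words(tokens, shap_values, k)
    if not words:
        return sentence  # nothing flagged; leave the sentence unchanged
    flagged = ", ".join(words)
    # Step 1: ask the model to explain why the flagged words lower the regard.
    reason = llm(
        f'Sentence: "{sentence}"\n'
        f"Words that lower its regard: {flagged}\n"
        "Think step by step and explain why these words make the sentence "
        "describe the person negatively."
    )
    # Step 2: pass back the sentence and the reason, and ask for a rewrite
    # that replaces the flagged words while keeping the rest of the context.
    return llm(
        f'Sentence: "{sentence}"\n'
        f"Reason for low regard: {reason}\n"
        f"Rewrite the sentence with higher regard by replacing {flagged}, "
        "keeping all other information unchanged. Return only the sentence."
    )
```

Feeding the model's own explanation into the second prompt is what makes this a chain-of-thought rewrite rather than blind word substitution. Restricting the rewrite to the flagged words is meant to preserve the rest of the sentence's context.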

Paper Structure

This paper contains 24 sections, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: An illustrative example of generating a gender-neutral prompt. The biographical information about Hussain Dawood is sourced from the WikiBio dataset, then made gender-neutral and anonymized. We then prepend trigger words indicating the subject's sexual identity.
  • Figure 2: Proposed pipeline for bias detection in LLMs. Biographies from the WikiBio dataset are made gender-neutral and then prepended with trigger words indicating the sexual identity of the biography's subject. We conduct a fairness analysis on the text the LLM generates from these gender-neutral prompts in a text-to-text generation task.
  • Figure 3: Proposed pipeline for debiasing the output of the LLM. We first prompt the LLM to identify the reasons for a sentence's low regard, supplying the low-regard words identified through SHAP analysis. Using the original sentence and the LLM-generated reason, we then prompt the LLM again to generate a high-regard sentence by replacing the low-regard words.
  • Figure 4: SHAP word analysis for a positive-regard sentence. Highlighted words push the sentence toward higher regard; greater opacity indicates greater importance.
  • Figure 5: SHAP word analysis for a negative-regard sentence. Highlighted words push the sentence toward lower regard; greater opacity indicates greater importance.
  • ...and 5 more figures
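Figures 1 and 2 describe how each prompt is built. A biography is made gender-neutral and anonymized, then prefixed with a sexual-identity trigger. The sketch below is a simplified illustration under explicit assumptions, not the authors' preprocessing. The pronoun map is hypothetical: it sends "her" to "their" even where "them" would be correct, and it does not repair verb agreement. The `[PERSON]` placeholder and the trigger sentence template are likewise hypothetical, since the exact trigger wording is not given here.

```python
import re

# Hypothetical pronoun map: "her" is always mapped to "their", and verb
# agreement (e.g. "they is") is not repaired.
PRONOUNS = {
    "he": "they", "she": "they",
    "him": "them", "her": "their",
    "his": "their", "hers": "theirs",
    "himself": "themselves", "herself": "themselves",
}
PRONOUN_RE = re.compile(r"\b(" + "|".join(PRONOUNS) + r")\b", flags=re.IGNORECASE)


def neutralize(text: str) -> str:
    """Replace gendered pronouns with neutral ones, keeping capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = PRONOUNS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    return PRONOUN_RE.sub(swap, text)


def anonymize(text: str, name: str, placeholder: str = "[PERSON]") -> str:
    """Replace the full name, then any remaining name parts, with a placeholder."""
    for part in [name] + name.split():
        text = re.sub(r"\b" + re.escape(part) + r"\b", placeholder, text)
    return text


def build_prompt(bio: str, name: str, identity: str) -> str:
    """Prepend a (hypothetical) sexual-identity trigger to the neutral biography."""
    neutral = neutralize(anonymize(bio, name))
    return f"The following is about a {identity} person. {neutral}"
```

For a fictional biography, `build_prompt("Jane Doe is a chemist. She founded a lab, and Doe still runs her group herself.", "Jane Doe", "gay")` yields the same neutral text behind a different trigger for each identity. Only the trigger varies across prompts, so differences in the regard of the generated text can be attributed to it.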