Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models
Harnoor Dhingra, Preetiha Jayashanker, Sayali Moghe, Emma Strubell
TL;DR
This work investigates representational bias against queer identities in large language models. It prompts the models to generate gender-neutral biographies seeded with sexual-identity triggers and then measures the regard of the output. It demonstrates measurable bias, with lower regard for queer prompts. It also introduces a post-hoc debiasing approach that uses SHAP analysis to identify low-regard terms and chain-of-thought prompting to rewrite the text. The rewrite is framed as text-to-text style transfer, which increases regard while preserving context. The methodology combines bias-detection tools (word clouds, PMI, t-SNE, cosine similarity, and regard metrics) with the SHAP-guided debiasing mechanism, offering a practical path toward more affirming LLM outputs. The findings suggest that debiasing can reduce representational harm in generated text. The authors acknowledge limitations related to identity scope and data biases, and they point to the need for broader non-heteronormative training data in future work.
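As a rough illustration of one of the bias-detection tools named above, pointwise mutual information (PMI) between a word w and an identity group g can be computed from token counts as log2(P(w, g) / (P(w) P(g))). A high PMI means the word appears in that group's generated text more often than chance would predict. The sketch below makes two assumptions: tokenization is lowercase whitespace splitting, and `texts_by_group` is a hypothetical mapping from identity labels to generated biographies. It is not the authors' pipeline.

```python
import math
from collections import Counter


def pmi_by_group(texts_by_group):
    """Return {group: {word: PMI}} from raw token counts.

    texts_by_group: hypothetical dict mapping an identity label to a list of
    generated texts. Tokenization is naive lowercase whitespace splitting.
    """
    word_group = Counter()   # count of (word, group) pairs
    word_total = Counter()   # count of each word across all groups
    group_total = Counter()  # number of tokens produced for each group
    for group, texts in texts_by_group.items():
        for text in texts:
            for tok in text.lower().split():
                word_group[(tok, group)] += 1
                word_total[tok] += 1
                group_total[group] += 1

    n = sum(group_total.values())  # total number of tokens
    scores = {g: {} for g in texts_by_group}
    for (tok, group), count in word_group.items():
        p_joint = count / n
        p_word = word_total[tok] / n
        p_group = group_total[group] / n
        # Words never seen with a group get no entry (their PMI would be -inf).
        scores[group][tok] = math.log2(p_joint / (p_word * p_group))
    return scores
```

Ranking each group's words by PMI gives the words most distinctively associated with that identity, which can then be inspected for stereotyped language.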
Abstract
Large Language Models (LLMs) are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes towards marginalized groups, such as the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing the generated text with regard scores reveals measurable bias against queer people. We then show that a post-hoc method combining chain-of-thought prompting with SHAP analysis can increase the regard of the generated sentences, representing a promising approach towards debiasing the output of LLMs in this setting.
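The following is a minimal sketch of the SHAP-guided step, built on several assumptions. It computes exact Shapley values over the tokens of a short sentence. The value of a token subset is the regard of the sentence with all other tokens removed. A hypothetical lexicon-based scorer (`toy_regard`) stands in for a trained regard classifier. Tokens with negative contributions are treated as low-regard terms and inserted into a hypothetical chain-of-thought rewrite prompt. None of these names or prompt wordings come from the paper. Exact enumeration is exponential in sentence length and only practical for short inputs. Tools such as the `shap` package approximate these values for real models.

```python
import math
from itertools import combinations

# Hypothetical stand-in for a trained regard classifier (assumption, not the paper's model).
TOY_LEXICON = {"kind": 1.0, "talented": 1.0, "confused": -1.0, "promiscuous": -1.0}


def toy_regard(tokens):
    """Score a list of tokens; higher means more positive regard."""
    return sum(TOY_LEXICON.get(t.lower(), 0.0) for t in tokens)


def shapley_values(tokens, value_fn):
    """Exact Shapley value of each token position under value_fn (O(2^n) calls)."""
    n = len(tokens)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # subsets of the other n-1 positions
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                with_i = sorted(subset + (i,))
                gain = value_fn([tokens[j] for j in with_i]) - value_fn([tokens[j] for j in subset])
                phi += weight * gain
        phis.append(phi)
    return phis


def low_regard_terms(sentence, value_fn=toy_regard, k=2):
    """Return up to k tokens whose Shapley contribution to regard is negative."""
    tokens = sentence.split()
    phis = shapley_values(tokens, value_fn)
    ranked = sorted(zip(tokens, phis), key=lambda tp: tp[1])  # most negative first
    return [tok for tok, phi in ranked[:k] if phi < 0]


def build_rewrite_prompt(sentence, terms):
    """Hypothetical chain-of-thought rewrite prompt (illustrative wording only)."""
    return (
        f"Sentence: {sentence}\n"
        f"Words that lower the regard of this sentence: {', '.join(terms)}.\n"
        "Think step by step: explain why each word is negative, then rewrite the "
        "sentence to be more respectful while keeping its original meaning."
    )


if __name__ == "__main__":
    sentence = "They are confused but kind"
    terms = low_regard_terms(sentence)
    print(terms)
    print(build_rewrite_prompt(sentence, terms))
```

The toy scorer is additive, so each token's Shapley value equals its own lexicon score. With a real classifier, token interactions make the attributions more informative. The rewritten sentence would then be re-scored to check that its regard increased.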
