
Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption

Wenchao Dong, Assem Zhunis, Dongyoung Jeong, Hyojin Chin, Jiyoung Han, Meeyoung Cha

TL;DR

This experiment successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group, and these results were replicated in the context of gender bias.

Abstract

Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. Consistent with Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.

Paper Structure

This paper contains 50 sections, 4 equations, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: LLMs exhibit ingroup bias by aligning their values with the social identities present in prompts, while displaying outgroup bias by rejecting values associated with outgroup identities. Assigning a favored group identity exacerbates the pre-existing bias, whereas configuring a disfavored group identity can significantly mitigate it.
  • Figure 2: Political biases with no identity assigned and with human and independent identities assigned (A). Changes in political alignment after setting the Republican identity (B) and the Democrat identity (C). Dashed arrows represent ingroup biases, while solid arrows denote outgroup biases. Numbers give the magnitudes of the intergroup bias elicited by the group identities. Circles and vertical lines represent the mean values of the response distributions.
  • Figure 3: Gender biases with no identity assigned and with human and non-binary identities assigned (A). Changes in gender bias after setting the man identity (B) and the woman identity (C). Dashed arrows represent ingroup biases, while solid arrows denote outgroup biases. Numbers give the magnitudes of the intergroup bias elicited by the group identities. Circles and vertical lines represent the mean values of the response distributions.
  • Figure 4: Comparison of debiasing effects, shown as changes in the response distributions in (A) political and (B) gender contexts.
  • Figure 5: Influence of political identity on gender bias. Circles and vertical lines represent the mean values of the response distributions.