
Probing Social Identity Bias in Chinese LLMs with Gendered Pronouns and Social Groups

Geng Liu, Feng Li, Junjie Mu, Mengxiao Zhu, Francesco Pierri

TL;DR

This study systematically probes social identity bias in Chinese-language prompts and interactions by comparing ingroup ("We") and outgroup ("They") framings across ten LLMs and 240 Chinese social groups, complemented by an analysis of Chinese conversations from the WildChat corpus. It introduces a Mandarin-specific evaluation framework that uses gendered third-person plural pronouns (他们, masculine "they"; 她们, feminine "they") and combines controlled prompts with naturalistic dialogues. Responses are labeled for sentiment and analyzed with logistic regression to quantify ingroup solidarity and outgroup hostility. Results show consistent ingroup positivity and outgroup hostility across model types, with stronger effects in pretrained models and notable gender asymmetries (female outgroups often provoke stronger negativity), although instruction-tuned models tend to be more balanced. In naturalistic dialogue the biases intensify, especially in assistant responses. This highlights risks for deployed, user-facing Chinese NLP systems and signals a need for culturally aware assessment and mitigation strategies tailored to Chinese sociolinguistic contexts.
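
As a minimal sketch of how the two quantities can be read off sentiment labels, the Python below assumes every completion has already been labeled positive, neutral, or negative. It further assumes that ingroup solidarity is the odds ratio of positive sentiment for "We" versus "They" completions, and outgroup hostility is the odds ratio of negative sentiment for "They" versus "We" completions. With a single binary predictor, a logistic regression's exponentiated coefficient equals the 2x2 table odds ratio ad/bc, and its Wald standard error equals sqrt(1/a + 1/b + 1/c + 1/d), so the sketch uses that closed form. The function names and toy labels are hypothetical, and the paper's regressions may include additional controls.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class OddsRatio:
    estimate: float
    ci_low: float
    ci_high: float


def odds_ratio_2x2(a: int, b: int, c: int, d: int, z: float = 1.96) -> OddsRatio:
    """Odds ratio (a*d)/(b*c) with a Wald (Woolf) confidence interval.

    a, b: outcome / no outcome in the focal condition
    c, d: outcome / no outcome in the reference condition
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: odds ratio undefined without a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # equals the logistic-regression Wald SE
    return OddsRatio(math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se))


def _split(labels: Sequence[str], target: str) -> Tuple[int, int]:
    """Count labels equal to `target` and all remaining labels."""
    hits = sum(1 for label in labels if label == target)
    return hits, len(labels) - hits


def ingroup_solidarity(we: Sequence[str], they: Sequence[str]) -> OddsRatio:
    """Odds of a positive label for "We" completions relative to "They" completions."""
    a, b = _split(we, "positive")
    c, d = _split(they, "positive")
    return odds_ratio_2x2(a, b, c, d)


def outgroup_hostility(we: Sequence[str], they: Sequence[str]) -> OddsRatio:
    """Odds of a negative label for "They" completions relative to "We" completions."""
    a, b = _split(they, "negative")
    c, d = _split(we, "negative")
    return odds_ratio_2x2(a, b, c, d)


if __name__ == "__main__":
    # Toy sentiment labels for illustration only, not data from the paper.
    we = ["positive"] * 60 + ["neutral"] * 30 + ["negative"] * 10
    they = ["positive"] * 30 + ["neutral"] * 40 + ["negative"] * 30
    print(round(ingroup_solidarity(we, they).estimate, 2))  # 3.5
    print(round(outgroup_hostility(we, they).estimate, 2))  # 3.86
```

Values above 1 then read as in the figure captions: positive sentiment is more likely toward the ingroup, or negative sentiment is more likely toward the outgroup.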

Abstract

Large language models (LLMs) are increasingly deployed in user-facing applications, raising concerns about their potential to reflect and amplify social biases. We investigate social identity framing in ten representative Chinese LLMs using Mandarin-specific prompts, evaluating their responses to ingroup ("We") and outgroup ("They") framings and extending the setting to 240 social groups salient in the Chinese context. To complement these controlled experiments, we analyze Chinese-language conversations from a corpus of real interactions between users and chatbots. Across models, we observe systematic ingroup-positive and outgroup-negative tendencies. These tendencies are not confined to synthetic prompts but also appear in naturalistic dialogue, suggesting that bias dynamics may strengthen in real interactions. Our study provides a language-aware evaluation framework for Chinese LLMs and shows that social identity biases documented in English generalize cross-linguistically and intensify in user-facing contexts.

Paper Structure

This paper contains 23 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Odds ratios for ingroup solidarity (blue) and outgroup hostility (orange) across Chinese-based LLMs. Values greater than 1 indicate a higher likelihood of positive sentiment toward ingroups or negative sentiment toward outgroups, respectively. Error bars represent 95% confidence intervals. Bold font indicates instruction-tuned models.
  • Figure 2: Odds ratios of ingroup solidarity and outgroup hostility for comparisons between "We" (ingroup) and "They" (male outgroup, top), and between "We" (ingroup) and "They" (female outgroup, bottom). Error bars represent 95% confidence intervals. Bold font indicates instruction-tuned models.
  • Figure 3: Odds ratios for negative sentiment toward female outgroups relative to male outgroups across different LLMs. Bold font indicates instruction-tuned models.
  • Figure 4: Odds ratios for ingroup solidarity (blue) and outgroup hostility (orange) across Chinese social groups for Qwen3-8B. Values greater than 1 indicate a higher likelihood of positive sentiment toward ingroups or negative sentiment toward outgroups, respectively. Error bars represent 95% confidence intervals.
  • Figure 5: Odds ratios for ingroup solidarity and outgroup hostility in naturalistic dialogue by source type (user and assistant).
  • ...and 1 more figure
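
The per-model gender comparison in Figure 3 can be sketched as an explicit logistic regression. The hypothetical Python below restricts the data to outgroup completions, codes x = 1 for 她们 (female outgroup) and x = 0 for 他们 (male outgroup), and sets y = 1 when a completion is labeled negative. It then fits logit P(y = 1) = b0 + b1·x by Newton-Raphson and reports exp(b1) with a Wald 95% interval. The coding scheme, function names, and toy labels are assumptions for illustration, not the paper's data or exact model specification.

```python
import math
from typing import Sequence, Tuple


def _score_and_info(x, y, b0, b1):
    """Log-likelihood gradient and 2x2 Fisher information at (b0, b1)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11


def logistic_odds_ratio(
    x: Sequence[int], y: Sequence[int], iters: int = 50, z: float = 1.96
) -> Tuple[float, float, float]:
    """Fit logit P(y=1) = b0 + b1*x; return exp(b1) and its Wald CI bounds."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, i00, i01, i11 = _score_and_info(x, y, b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} g, using the closed-form 2x2 inverse.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < 1e-10:
            break
    # Recompute the information at the final estimate for the standard error of b1.
    _, _, i00, i01, i11 = _score_and_info(x, y, b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)


if __name__ == "__main__":
    # Toy labels for one model (1 = negative), not data from the paper.
    male = [1] * 30 + [0] * 70    # 他们 completions
    female = [1] * 45 + [0] * 55  # 她们 completions
    x = [0] * len(male) + [1] * len(female)
    y = male + female
    or_, lo, hi = logistic_odds_ratio(x, y)
    print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Fitting one such model per LLM gives one odds ratio per model, where values above 1 mean negative sentiment is more likely toward female than male outgroups. With a single binary predictor the estimate coincides with the 2x2 table odds ratio. The regression form is shown because it extends to covariates (for example, a user/assistant source indicator as in Figure 5), although that would require replacing the 2x2 inverse with a general matrix solve.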