Cross-Language Bias Examination in Large Language Models
Yuxuan Liang, Marwa Mahmoud
TL;DR
The paper introduces a reproducible cross-language bias evaluation framework by combining the BBQ explicit-bias benchmark with a prompt-based IAT across English, Chinese, Arabic, French, and Spanish. It provides empirical evidence of substantial cross-language bias gaps and distinct patterns across bias types, notably high implicit bias for age and higher explicit bias in gender and nationality for certain languages. The study highlights the limitations of English-centric bias research and the necessity of multilingual fairness practices, suggesting data balancing, advanced prompting, and optimization approaches as mitigation avenues. Overall, the work lays a foundation for equitable multilingual LLMs by revealing language-dependent bias dynamics and offering a standardized evaluation workflow.
Abstract
This study introduces a multilingual framework for evaluating bias in Large Language Models (LLMs), combining explicit bias assessment through the BBQ benchmark with implicit bias measurement using a prompt-based Implicit Association Test (IAT). By translating the prompts and word lists into five target languages (English, Chinese, Arabic, French, and Spanish), we directly compare different types of bias across languages. The results reveal substantial differences in the bias LLMs exhibit from one language to another. For example, Arabic and Spanish consistently show higher levels of stereotype bias, while Chinese and English exhibit lower levels. We also identify contrasting patterns across bias types: age shows the lowest explicit bias but the highest implicit bias, which underscores the importance of detecting implicit biases that standard benchmarks do not capture. These findings indicate that LLM bias varies significantly across languages and bias dimensions. This study fills a key research gap by providing a comprehensive methodology for cross-lingual bias analysis. Ultimately, our work establishes a foundation for developing equitable multilingual LLMs that are fair and effective across diverse languages and cultures.
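The cross-language comparison described in the abstract can be illustrated with a minimal sketch. It assumes each model response has already been labeled as stereotype-consistent or not. The names `stereotype_rate` and `language_gap`, the record layout, and the simple rate metric are hypothetical choices made for illustration. They are not the paper's scoring procedure, and the toy records are made up rather than taken from the study.

```python
from collections import defaultdict


def stereotype_rate(records):
    """Fraction of stereotype-consistent responses per (language, bias type).

    `records` is an iterable of (language, bias_type, stereotyped) tuples.
    This layout is an assumption for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [stereotyped, total]
    for language, bias_type, stereotyped in records:
        cell = counts[(language, bias_type)]
        cell[0] += int(stereotyped)
        cell[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


def language_gap(rates, bias_type):
    """Largest minus smallest rate across languages for one bias type."""
    values = [r for (_, bt), r in rates.items() if bt == bias_type]
    return max(values) - min(values)


# Toy, made-up records purely for illustration; not data from the study.
toy_records = [
    ("en", "age", True), ("en", "age", False),
    ("ar", "age", True), ("ar", "age", True),
]
rates = stereotype_rate(toy_records)
gap = language_gap(rates, "age")
```

Computing one rate per language and bias type, and then the spread across languages, keeps the explicit (BBQ) and implicit (IAT) results comparable in the same table, provided both are reduced to a labeled response per prompt.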
