Evaluating and Mitigating Linguistic Discrimination in Large Language Models

Guoliang Dong, Haoyu Wang, Jun Sun, Xinyu Wang

TL;DR

This work investigates linguistic discrimination in large language models by evaluating safety and output quality across 74 languages using AdvBench and NQ on four models, revealing substantial cross-language disparities especially for low-resource languages. It introduces LDFighter, a lightweight, plug-and-play mitigation that translates queries into multiple languages, runs them through the target model, and uses similarity-based voting on English translations to select a final answer, thereby improving cross-language consistency. Empirical results show significant reductions in multilingual jailbreak rates ($MJR$) and improved $F_1$ scores across models, with runtime costs that scale with the number of languages but remain practical. The approach demonstrates strong potential for delivering fairer, more reliable multilingual AI services without modifying the underlying LLMs.
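The translate, query, and vote pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `translate` and `ask_model` are hypothetical callables supplied by the caller, `translate` is assumed to auto-detect the source language, and token-set Jaccard similarity stands in for whatever similarity measure LDFighter actually uses.

```python
from typing import Callable, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def vote(answers: Sequence[str],
         sim: Callable[[str, str], float] = jaccard) -> str:
    """Return the answer with the highest total similarity to all the others."""
    if not answers:
        raise ValueError("need at least one candidate answer")

    def support(i: int) -> float:
        return sum(sim(answers[i], answers[j])
                   for j in range(len(answers)) if j != i)

    # max() keeps the first index on ties, so the result is deterministic.
    return answers[max(range(len(answers)), key=support)]


def answer_with_voting(query: str,
                       languages: Sequence[str],
                       translate: Callable[[str, str], str],
                       ask_model: Callable[[str], str],
                       sim: Callable[[str, str], float] = jaccard) -> str:
    """Hypothetical end-to-end flow: translate, query, translate back, vote."""
    english_answers = []
    for lang in languages:
        translated_query = translate(query, lang)      # query in language `lang`
        response = ask_model(translated_query)         # target LLM, unmodified
        english_answers.append(translate(response, "en"))
    return vote(english_answers, sim)
```

Voting on a common pivot language lets an outlier response, such as a jailbroken answer produced in only one language, be outvoted by the majority, while the target model itself stays untouched.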

Abstract

Because they are trained on text in many languages, large language models (LLMs) typically offer multilingual support and demonstrate remarkable capabilities in solving tasks described in different languages. However, LLMs can exhibit linguistic discrimination due to the uneven distribution of training data across languages. That is, LLMs struggle to respond consistently when the same task is posed in different languages. In this study, we first explore the consistency of LLMs' outputs in response to queries in various languages from two aspects: safety and quality. We conduct this analysis with two datasets (AdvBench and NQ) on four LLMs (Llama2-13b, Gemma-7b, GPT-3.5-turbo and Gemini-pro). The results show that LLMs exhibit stronger human alignment capabilities for queries in English, French, Russian, and Spanish (only 1.04% of harmful queries successfully jailbreak the models on average) than for queries in Bengali, Georgian, Nepali, and Maithili (27.7% of harmful queries successfully jailbreak the models on average). Moreover, for queries in English, Danish, Czech, and Slovenian, LLMs tend to produce higher-quality responses (an average $F_1$ score of 0.1494) than for other languages. Based on these findings, we propose LDFighter, a similarity-based voting method, to mitigate linguistic discrimination in LLMs. LDFighter ensures consistent service for speakers of different languages. We evaluate LDFighter with both benign and harmful queries. The results show that LDFighter not only significantly reduces the jailbreak success rate but also improves response quality on average, demonstrating its effectiveness.
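The abstract reports response quality as an $F_1$ score but does not say how it is computed. A common choice for open-domain question answering is token-overlap $F_1$ between a response and a reference answer. The sketch below assumes that definition with simple whitespace tokenization; the paper's exact scoring procedure may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a prediction and a reference answer.

    Assumption: lowercase whitespace tokenization, no punctuation or
    article stripping. This is a generic QA metric, not necessarily the
    exact one used in the paper.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a long free-form response that contains a short reference answer has perfect recall but low precision, which is one reason average scores for chat-style outputs can sit well below 1.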

Paper Structure

This paper contains 16 sections, 7 equations, 11 figures, 1 table, 2 algorithms.

Figures (11)

  • Figure 1: Sample linguistic discrimination in ChatGPT.
  • Figure 2: Overall framework of LDFighter.
  • Figure 3: Average MJR for different LLMs on harmful questions.
  • Figure 4: Variance of LJR across different languages.
  • Figure 5: LJR for different languages on vanilla harmful questions.
  • ...and 6 more figures