The Multilingual Divide and Its Impact on Global AI Safety
Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, Aakanksha, Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker
TL;DR
The paper tackles the global language gap in AI safety, arguing that current LLM progress is English-centric and inequitable across languages and cultures. It presents the Aya Initiative as a practical, scalable approach to expanding language coverage through diverse data sources, multilingual evaluation, and collaborative governance. Key contributions include new multilingual datasets and evaluation suites (Global-MMLU, INCLUDE, Aya Red-teaming), techniques for multilingual safety (Safety Context Distillation), and evidence from Aya that data-mix choices and model merging improve both safety and language coverage. The work underscores several policy imperatives: open multilingual datasets, transparency about language coverage, cross-institutional collaboration, and improved compute access, so that AI is safe across all languages and the diverse cultural contexts in which they are used.
Abstract
Despite recent advances in large language model capabilities, a large gap remains in their performance and safety for many languages beyond a small handful of globally dominant languages. This paper provides researchers, policymakers and governance experts with an overview of key challenges to bridging the "language gap" in AI and minimizing safety risks across languages. We analyze why the language gap in AI exists and grows, and how it creates disparities in global AI safety. We identify barriers to addressing these challenges, and recommend how those working in policy and governance can help mitigate the safety concerns associated with the language gap by supporting multilingual dataset creation, transparency, and research.
