Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Somnath Banerjee, Rima Hazra, Animesh Mukherjee
TL;DR
This work investigates the multilingual safety divide facing Global South languages, highlighting how English-centric safety pipelines underperform in low-resource and code-mixed settings. It synthesizes findings from XThreatBench, cultural harm evaluation, code-mixed safety, and multilingual knowledge edits to propose a practical, resource-aware alignment agenda. The agenda has four parts: language-specific, parameter-efficient safety steering that tunes only a small subset of heads; culturally grounded evaluation and data; participatory alignment; and auditing to prevent English-only upgrades. The proposed agenda suggests that robust multilingual safety is achievable under limited compute, enabling equitable AI in underrepresented regions.
Abstract
Large language models (LLMs) are being deployed across the Global South, where everyday use involves low-resource languages, code-mixing, and culturally specific norms. Yet safety pipelines, benchmarks, and alignment efforts still largely target English and a handful of high-resource languages, implicitly assuming that safety and factuality "transfer" across languages. Evidence increasingly shows that they do not. We synthesize recent findings indicating that (i) safety guardrails weaken sharply on low-resource and code-mixed inputs, (ii) culturally harmful behavior can persist even when standard toxicity scores look acceptable, and (iii) English-only knowledge edits and safety patches often fail to carry over to low-resource languages. In response, we outline a practical agenda for researchers and students in the Global South: parameter-efficient safety steering, culturally grounded evaluation and preference data, and participatory workflows that empower local communities to define and mitigate harm. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
