Lost in Localization: Building RabakBench with Human-in-the-Loop Validation to Measure Multilingual Safety Gaps

Gabriel Chua; Leanne Tan; Ziyu Ge; Roy Ka-Wei Lee

TL;DR

RabakBench is a localized multilingual safety benchmark, built with a scalable Generate-Label-Translate pipeline, covering Singapore's Singlish, Chinese, Malay, and Tamil. The pipeline combines adversarial data generation, weak supervision from high-agreement LLM annotators, and toxicity-preserving translation to produce a parallel safety corpus of 5,364 entries across the four languages, 76.6% of which are labeled unsafe. An evaluation of 13 guardrails reveals pronounced safety gaps and language-dependent degradation, especially for Tamil, underscoring the need for native-context training and context-aware evaluation. The work offers a reproducible framework for building and extending localized safety benchmarks and releases open-source data to advance multilingual AI safety research.

Abstract

Large language models (LLMs) often fail to maintain safety in low-resource language varieties, such as code-mixed vernaculars and regional dialects. We introduce RabakBench, a multilingual safety benchmark and scalable pipeline localized to Singapore's unique linguistic landscape, covering Singlish, Chinese, Malay, and Tamil. We construct the benchmark through a three-stage pipeline: (1) Generate: augmenting real-world unsafe web content via LLM-driven red teaming; (2) Label: applying semi-automated multi-label annotation using majority-voted LLM labelers; and (3) Translate: performing high-fidelity, toxicity-preserving translation. The resulting dataset contains over 5,000 examples across six fine-grained safety categories. Despite using LLMs for scalability, our framework maintains rigorous human oversight, achieving 0.70-0.80 inter-annotator agreement. Evaluations of 13 state-of-the-art guardrails reveal significant performance degradation, underscoring the need for localized evaluation. RabakBench provides a reproducible framework for building safety benchmarks in underserved communities.
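
As a rough illustration of the Label stage's majority-voted, multi-label annotation, the sketch below keeps a safety category only when a strict majority of labelers flag it. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `majority_vote`, the strict-majority threshold, and the placeholder category names `category_a` and `category_b` are all hypothetical, and the abstract does not specify how many labelers are used or how ties are handled.

```python
from collections import Counter


def majority_vote(labeler_outputs):
    """Aggregate multi-label annotations from several labelers.

    labeler_outputs: list of sets, one per labeler, each holding the
    category names that labeler assigned to a single text.
    Returns the set of categories flagged by a strict majority.
    (Hypothetical helper; the threshold is an assumption.)
    """
    n = len(labeler_outputs)
    # Count how many labelers flagged each category (one vote per labeler).
    counts = Counter(cat for labels in labeler_outputs for cat in set(labels))
    return {cat for cat, votes in counts.items() if votes > n / 2}


# Three hypothetical labelers annotating one text with placeholder categories.
votes = [{"category_a"}, {"category_a", "category_b"}, set()]
final_labels = majority_vote(votes)

# Under this sketch, a text counts as unsafe if any category survives the vote.
is_unsafe = bool(final_labels)
```

Here `category_a` is kept because two of the three labelers flagged it, while `category_b` is dropped with a single vote. Voting per category rather than per text is one plausible way to preserve the multi-label structure across the six categories; other aggregation rules would fit the abstract's description equally well.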
