SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia

Ri Chi Ng; Aditi Kumaresan; Yujia Hu; Roy Ka-Wei Lee

Abstract

Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese. This creates barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where diverse socio-linguistic contexts complicate online hate moderation. To address this, we introduce SEAHateCheck, a pioneering dataset tailored to Indonesia, Thailand, the Philippines, and Vietnam, covering Indonesian, Tagalog, Thai, and Vietnamese. Building on HateCheck's functional testing framework and refining SGHateCheck's methods, SEAHateCheck provides culturally relevant test cases, augmented by large language models and validated by local experts for accuracy. Experiments with state-of-the-art and multilingual models revealed limitations in detecting hate speech in specific low-resource languages. Among the languages, Tagalog test cases showed the lowest model accuracy, likely due to linguistic complexity and limited training data. Among the functional tests, slang-based tests proved the hardest, as models struggled with culturally nuanced expressions. The diagnostic insights of SEAHateCheck further exposed model weaknesses in detecting implicit hate and in handling counter-speech. As the first functional test suite for these Southeast Asian languages, this work equips researchers with a robust benchmark, advancing the development of practical, culturally attuned hate speech detection tools for inclusive online content moderation.

Paper Structure (44 sections, 19 figures, 85 tables)

Introduction
Constructing SEAHateCheck Dataset
Defining Hate Speech
Defining Functional Tests
Selecting Functional Tests
Translating Templates
Generating and Validating Gold Label Test Cases
Generating and Validating Silver Label Test Cases
Benchmarking LLMs on SEAHateCheck
LLM Fine-tuning
Discussion on Gold Label Test Cases
Overall Results
Performance across Functional Tests
Performance across Protected Categories
Discussion on Silver Label Test Cases
...and 29 more sections

Figures (19)

Figure 1: Accuracy across Functional Tests for Indonesian (left) and Tagalog (right)
Figure 2: F1 Score across Protected Categories for Indonesian (left) and Tagalog (right)
Figure 3: Accuracy across Silver Functional Tests for Indonesian (left) and Tagalog (right)
Figure 4: F1 Score across Protected Categories for Silver Indonesian (left) and Tagalog (right)
Figure 5: Functional Test Description and Example for F1 to F15
...and 14 more figures
