SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

TL;DR

SGHateCheck extends HateCheck and Multilingual HateCheck (MHC) to Singapore and Southeast Asia by building a large, language-specific functional-testing framework spanning Singlish, Mandarin, Malay, and Tamil. Templates are translated and paraphrased with LLMs and then refined by native annotators, producing 21,152 test cases (15,052 hateful, 6,100 non-hateful) that are validated through cross-language annotation. Benchmarking several open-source LLMs (fine-tuned on EngSet and MultiSet) reveals a systematic bias toward non-hateful classifications, language-dependent performance, and gaps in handling nuanced expressions, particularly denunciations and quotations of hate in Tamil and Mandarin. The study highlights the importance of localized data and functional testing for improving the robustness and safety of hate speech detection in diverse linguistic contexts, and it informs future research and development of region-specific moderation tools.

Abstract

To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly in Singaporean and Southeast Asian contexts.
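
As a rough illustration of what functional testing means in practice, the sketch below scores a classifier separately for each language and functionality rather than reporting a single aggregate number. It is a minimal sketch, not the authors' evaluation code: the `FunctionalCase` record, the functionality labels, the placeholder texts, and the `always_non_hateful` baseline are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalCase:
    """One functional test case (hypothetical schema, not the dataset's)."""
    text: str
    language: str
    functionality: str
    is_hateful: bool


def accuracy_by_group(cases, predict):
    """Return {(language, functionality): accuracy}.

    `predict` is any callable mapping a text to True (hateful) or False.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        key = (case.language, case.functionality)
        total[key] += 1
        if predict(case.text) == case.is_hateful:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def always_non_hateful(text):
    """Toy baseline that mimics an extreme bias toward the non-hate class."""
    return False


# Placeholder cases; the texts and functionality names are illustrative only.
cases = [
    FunctionalCase("<hateful placeholder>", "Malay", "derogation", True),
    FunctionalCase("<denunciation placeholder>", "Malay", "denunciation", False),
    FunctionalCase("<hateful placeholder>", "Tamil", "derogation", True),
    FunctionalCase("<quotation placeholder>", "Tamil", "quotation", False),
]

scores = accuracy_by_group(cases, always_non_hateful)
for (language, functionality), accuracy in sorted(scores.items()):
    print(f"{language:6s} {functionality:13s} {accuracy:.2f}")
```

Breaking results down this way makes a bias toward one class visible: the toy baseline is perfect on every non-hateful functionality and fails every hateful one, a pattern that a single overall accuracy figure would hide.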

Paper Structure (53 sections, 9 tables)