LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation
Junyeong Park, Seogyeong Jeong, Seyoung Song, Yohan Lee, Alice Oh
TL;DR
This work tackles cross-cultural hate speech moderation by non-native moderators and finds that contextual cues are critical for accurate judgments. It introduces LLM-C3MOD, a three-stage pipeline that combines RAG-enabled cultural context generation, multi-LLM moderation, and targeted review by non-native human moderators, balancing efficiency with nuanced understanding. On a Korean hate speech dataset, the pipeline reaches 78% accuracy and reduces human workload by 83.6%, exceeding a GPT-4o baseline of 71%. Human moderators perform especially well on nuanced content such as internet culture. The findings suggest that, with properly structured LLM support, non-native moderators can contribute effectively to online safety across cultures, although native linguistic context remains advantageous in challenging cases.
Abstract
Content moderation is a global challenge, yet major tech platforms prioritize high-resource languages, leaving low-resource languages with scarce native moderators. Because effective moderation depends on understanding contextual cues, this imbalance increases the risk of improper moderation caused by non-native moderators' limited cultural understanding. Through a user study, we identify that non-native moderators struggle to interpret culturally specific knowledge, sentiment, and internet culture in hate speech moderation. To assist them, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps: (1) RAG-enhanced cultural context annotations; (2) initial LLM-based moderation; and (3) targeted human moderation for cases lacking LLM consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, our system achieves 78% accuracy (surpassing GPT-4o's 71% baseline) while reducing human workload by 83.6%. Notably, human moderators excel on nuanced content where LLMs struggle. Our findings suggest that non-native moderators, when properly supported by LLMs, can effectively contribute to cross-cultural hate speech moderation.
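The three-step flow described in the abstract can be sketched as a routing function. This is a minimal sketch, not the authors' implementation. The `annotate`, LLM, and human callables are hypothetical stand-ins. "Consensus" is assumed to mean a unanimous vote among the LLMs. Human workload is assumed to be the share of posts routed to a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not from the paper):
# an annotator maps a post to a cultural-context note, and a moderator
# (LLM or human) maps (post, context) to True if it judges the post hateful.
Annotator = Callable[[str], str]
Moderator = Callable[[str, str], bool]


@dataclass
class Item:
    post: str
    context: str
    label: Optional[bool] = None      # final hate / not-hate decision
    routed_to_human: bool = False     # True if step 3 was needed


def moderate(posts: List[str], annotate: Annotator,
             llms: List[Moderator], human: Moderator) -> List[Item]:
    items = []
    for post in posts:
        # Step 1: attach a cultural-context note (RAG-backed in the paper;
        # here it is just an arbitrary callable).
        item = Item(post, annotate(post))
        # Step 2: every LLM votes; a unanimous vote is accepted as final
        # (unanimity is an assumed consensus rule).
        votes = {llm(item.post, item.context) for llm in llms}
        if len(votes) == 1:
            item.label = votes.pop()
        else:
            # Step 3: no consensus, so a non-native human decides,
            # seeing the same context note the LLMs saw.
            item.routed_to_human = True
            item.label = human(item.post, item.context)
        items.append(item)
    return items


def workload_reduction(items: List[Item]) -> float:
    """Share of posts a human never had to review (assumed definition)."""
    if not items:
        return 0.0
    return 1 - sum(i.routed_to_human for i in items) / len(items)


if __name__ == "__main__":
    posts = ["post A", "post B", "post C", "post D"]
    llm_1 = lambda p, c: p in {"post A", "post B"}   # toy voter
    llm_2 = lambda p, c: p in {"post A", "post C"}   # toy voter
    human = lambda p, c: p == "post C"               # toy human reviewer
    items = moderate(posts, lambda p: f"context for {p}", [llm_1, llm_2], human)
    print([i.label for i in items], workload_reduction(items))
```

In this toy run the two voters disagree on two of the four posts, so only those two reach the human and `workload_reduction` returns 0.5. Under the same assumed definition, the reported 83.6% reduction would mean that roughly 16.4% of posts reached human reviewers.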
