From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection
Mo Wang, Kaixuan Ren, Pratik Jalan, Ahmed Ashraf, Tuong Vy Vu, Rahul Seetharaman, Shah Nawaz, Usman Naseem
TL;DR
This work interrogates how cultural context shapes hateful meme detection by evaluating state-of-the-art vision-language models across six native meme datasets. It introduces a multidimensional evaluation framework that jointly varies learning (zero-shot vs one-shot), prompt language (native vs English), and translation effects, using a translated-caption protocol to isolate language influence. The study finds that translate-then-detect pipelines underperform, while native-language prompting and one-shot learning substantially improve cross-cultural robustness, revealing a Western-centric bias in many VLMs. It further provides qualitative diagnostics of failure modes, such as semantic distortion and representation gaps, and proposes actionable strategies, including a hybrid deployment approach that combines locally fine-tuned models with general-purpose VLMs to achieve globally robust moderation. The results underscore the importance of culturally aware evaluation and intervention design for fair and effective global content moderation.
Abstract
Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits their fairness and cross-cultural robustness in tasks like hateful meme detection. We introduce a systematic evaluation framework designed to diagnose and quantify the cross-cultural robustness of state-of-the-art VLMs across multilingual meme datasets, analyzing three axes: (i) learning strategy (zero-shot vs. one-shot), (ii) prompting language (native vs. English), and (iii) translation effects on meaning and detection. Results show that the common "translate-then-detect" approach degrades performance, while culturally aligned interventions, namely native-language prompting and one-shot learning, significantly enhance detection. Our findings reveal systematic convergence toward Western safety norms and provide actionable strategies to mitigate such bias, guiding the design of globally robust multimodal moderation systems.
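As an illustration of how the three axes combine, the sketch below enumerates one evaluation condition per combination. It is a minimal sketch, not the authors' implementation: the axis values for learning strategy and prompting language come from the abstract, while the `caption` axis (original vs. translated) is an assumed encoding of the translation-effects axis, and all identifiers (`LEARNING`, `PROMPT_LANGUAGE`, `CAPTION`, `evaluation_grid`) are hypothetical.

```python
from itertools import product

# Axis values for the first two axes follow the abstract; names are illustrative.
LEARNING = ("zero-shot", "one-shot")
PROMPT_LANGUAGE = ("native", "english")
# Assumption: the translation axis is modeled as original vs. translated captions.
CAPTION = ("original", "translated")


def evaluation_grid():
    """Return every combination of the three axes as a list of dicts."""
    return [
        {"learning": learning, "prompt_language": language, "caption": caption}
        for learning, language, caption in product(LEARNING, PROMPT_LANGUAGE, CAPTION)
    ]


if __name__ == "__main__":
    for condition in evaluation_grid():
        print(condition)
```

Under these assumptions each model would be scored on every condition for every dataset, so that the effect of one axis can be read off while the other two are held fixed.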
