Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models

Minh Duc Bui, Katharina von der Wense, Anne Lauscher

TL;DR

Multi3Hate introduces the first parallel multimodal and multilingual hate speech dataset annotated by a multicultural set of annotators, spanning 300 memes across five languages and five cultures. Through a three-stage pipeline—crawling, translation, and cross-cultural annotation—the authors demonstrate that cultural background significantly shapes hate speech judgments and that current vision-language models strongly align with US-centric annotations in zero-shot settings. The work also reveals that prompting language or adding country information has limited or negative impact on cross-cultural alignment, underscoring the risk of cultural bias in VLM-based moderation. These findings highlight the need for culturally aware evaluation and data-driven approaches to mitigate bias in global hate speech detection systems.

Abstract

Warning: this paper contains content that may be offensive or upsetting.

Hate speech moderation on global platforms poses unique challenges due to the multimodal and multilingual nature of content, along with varying cultural perceptions. How well do current vision-language models (VLMs) navigate these nuances? To investigate this, we create the first multimodal and multilingual parallel hate speech dataset, annotated by a multicultural set of annotators, called Multi3Hate. It contains 300 parallel meme samples across 5 languages: English, German, Spanish, Hindi, and Mandarin. We demonstrate that cultural background significantly affects multimodal hate speech annotation in our dataset. The average pairwise agreement among countries is just 74%, significantly lower than that of randomly selected annotator groups. Our qualitative analysis indicates that the lowest pairwise label agreement (only 67%, between the USA and India) can be attributed to cultural factors. We then conduct experiments with 5 large VLMs in a zero-shot setting, finding that these models align more closely with annotations from the US than with those from other cultures, even when the memes and prompts are presented in the dominant language of the other culture. Code and dataset are available at https://github.com/MinhDucBui/Multi3Hate.
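To make the agreement figures above concrete, the following is a minimal sketch of how a pairwise label agreement between countries could be computed. It assumes one binary label per meme per country (for example, a majority vote over that country's annotators) and defines agreement as the fraction of memes on which two countries assign the same label. The function names, the country codes, and the toy labels are all hypothetical; the paper's exact aggregation procedure is not described in this section, so this may differ from the authors' implementation.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of items on which two label lists agree (assumed definition)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def average_pairwise_agreement(labels_by_country):
    """Agreement for every unordered country pair, plus the mean over pairs."""
    pairs = combinations(sorted(labels_by_country), 2)
    scores = {
        (a, b): pairwise_agreement(labels_by_country[a], labels_by_country[b])
        for a, b in pairs
    }
    return scores, sum(scores.values()) / len(scores)


# Toy, made-up labels for four memes (1 = hate speech, 0 = not hate speech).
toy_labels = {
    "US": [1, 0, 1, 0],
    "DE": [1, 0, 0, 0],
    "IN": [0, 0, 1, 1],
}

scores, average = average_pairwise_agreement(toy_labels)
for pair, score in scores.items():
    print(pair, f"{score:.2f}")
print("average", f"{average:.2f}")
```

With real data, `toy_labels` would hold one entry per country and one label per meme, and the same function would yield the per-pair values and their average.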

Paper Structure

This paper contains 58 sections, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Our dataset creation process is divided into three stages: 1. Crawling Stage; 2. Translation Stage; and 3. Cross-Cultural Hate Speech Annotation Stage. The two examples illustrate the varying ways in which memes are annotated across different cultures.
  • Figure 2: Example of a parallel meme. The original English meme reads: "just in time <sep> for new year in cologne". Only in Germany is this meme perceived as hate speech.
  • Figure 3: We provide examples from each category with hate speech annotations, highlighting cultural variability in perceptions and challenges for annotators in identifying targeted groups and stereotypes.
  • Figure 4: (a) Pairwise label agreement for all countries, ranked by average agreement. (b) A comparison of the top two and bottom two country pairs' pairwise label agreement, along with the overall average across all countries, against randomly selected annotator groups. The results indicate that the lowest agreement pairs and the overall average differ significantly from random groups.
  • Figure 5: Distribution of disagreements between the USA and India. See Table \ref{tab:normalized_keywords} in the Appendix for detailed information on each category along with examples.
  • ...and 6 more figures