MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles

Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Marcos Zampieri

TL;DR

This study tackles multimodal hate speech detection in political contexts with fine-tuned transformer models: XLM-RoBERTa-large for hate speech detection (sub-task A) and a three-model ensemble (XLM-RoBERTa-base, BERTweet-large, BERT-base) for target detection in text-embedded images (sub-task B). It uses OCR to extract the embedded text, back-translation to mitigate class imbalance, and majority voting to combine the ensemble's predictions, achieving 0.8347 F1 on sub-task A and 0.6741 F1 on sub-task B (ranked 3rd on both). GPT-3.5 variants are also explored but generally underperform the fine-tuned transformers, highlighting the strength of fine-tuned Transformer encoders for multimodal hate speech tasks. The work offers practical insights into data augmentation, model fusion, and bias considerations for automated moderation of multimodal online discourse.

Abstract

The automatic identification of offensive language such as hate speech is important to keep discussions civil in online communities. Identifying hate speech in multimodal content is particularly challenging because offensiveness can be manifested in words, in images, or in a juxtaposition of the two. This paper presents the MasonPerplexity submission for the Shared Task on Multimodal Hate Speech Event Detection at CASE 2024, held at EACL 2024. The task is divided into two sub-tasks: sub-task A focuses on the identification of hate speech, and sub-task B focuses on the identification of targets in text-embedded images during political events. We use an XLM-RoBERTa-large model for sub-task A and an ensemble approach combining XLM-RoBERTa-base, BERTweet-large, and BERT-base for sub-task B. Our approach obtained an F1-score of 0.8347 in sub-task A and 0.6741 in sub-task B, ranking 3rd on both sub-tasks.
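The ensemble step for sub-task B can be illustrated with a minimal sketch of majority voting over three models' label predictions. This is not the authors' code: the function names `majority_vote` and `ensemble_predict` are hypothetical, the labels are taken from the target classes shown in Figure 2, and the tie-breaking rule (fall back to the first model's vote when all three disagree) is an assumption, since the summary does not say how ties were resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most frequent label among the votes for one item.

    Assumption: if several labels share the top count (for example,
    three models each predict a different class), the first model's
    vote is used. The source does not specify a tie-breaking rule.
    """
    counts = Counter(votes)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]
    if len(winners) > 1:
        return votes[0]
    return winners[0]


def ensemble_predict(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model prediction lists item by item.

    per_model[i][j] is the label predicted by model i for item j.
    """
    return [majority_vote(item_votes) for item_votes in zip(*per_model)]


# Illustrative predictions only; these are not results from the paper.
xlmr_base = ["Individual", "Organization", "Community"]
bertweet_large = ["Individual", "Community", "Organization"]
bert_base = ["Community", "Community", "Individual"]

combined = ensemble_predict([xlmr_base, bertweet_large, bert_base])
print(combined)
```

With three voters and three classes, a three-way split is possible, so any real implementation needs some explicit tie-breaking policy; ordering the models by validation performance and trusting the first one is one simple option.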

Paper Structure (10 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Training data example (Left: NO-HATE, Right: HATE)
  • Figure 2: Training data example (Left: Organization, Top-right: Individual, Bottom-right: Community)
  • Figure 3: Sample GPT-3.5 prompt.
  • Figure 4: Confusion matrix of sub-task A evaluation set.
  • Figure 5: Confusion matrix of sub-task B evaluation set.