Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor

Ahmed Sharshar; Hosam Elgendy; Saad El Dine Ahmed; Yasser Rohaim; Yuxia Wang

Abstract

Dark humor often relies on subtle cultural nuances and implicit cues that require contextual reasoning to interpret, posing safety challenges that current static benchmarks fail to capture. To address this, we introduce a novel multimodal, multilingual benchmark for detecting and understanding harmful and offensive humor. Our manually curated dataset comprises 3,000 texts and 6,000 images in English and Arabic, alongside 1,200 videos that span English, Arabic, and language-independent (universal) contexts. Unlike standard toxicity datasets, ours follows a strict annotation guideline that distinguishes Safe jokes from Harmful ones, with the latter further classified into Explicit (overt) and Implicit (covert) categories to probe deep reasoning. We systematically evaluate state-of-the-art (SOTA) open- and closed-source models across all modalities. Our findings reveal that closed-source models significantly outperform open-source ones and that both groups show a notable performance gap between English and Arabic, underscoring the critical need for culturally grounded, reasoning-aware safety alignment. Warning: this paper contains example data that may be offensive, harmful, or biased.

Paper Structure (50 sections, 3 figures, 9 tables)

This paper contains 50 sections, 3 figures, 9 tables.

Introduction
Related Work
Datasets
Understanding (Dark) Humor by LLMs/VLMs
Dataset
Textual Data
Image Data
Video Data
Methodology
Results and Analysis
Textual Modality Evaluation
Closed-Source Models
Open-Source Models
Arabic-Specific Models
Image Modality Evaluation
...and 35 more sections

Figures (3)

Figure 1: Representative examples of the image modality in English and Arabic. We illustrate the distinction between Implicit harmful (requiring reasoning), Explicit harmful (containing plain toxicity), and Safe content.
Figure 2: Sample video frames for the Implicit harmful category across languages.
Figure 3: Harmful accuracy breakdown by model, language, and explicitness. Different markers represent Implicit and Explicit harm.
