GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma

TL;DR

GOAT-Bench introduces a comprehensive, meme-focused safety benchmark for large multimodal models, collecting 6,626 memes across five tasks (hatefulness, misogyny, offensiveness, sarcasm, harmfulness) to probe implicit vs. explicit abuse. The study systematically evaluates 11 LMMs, employing a fixed prompting template, CoT prompts, few-shot in-context learning, and a SelfAlign supervised fine-tuning approach, revealing persistent safety gaps and model-specific strengths. Key findings show GPT-4V generally leads in safety-related detection, yet no model consistently masters all subtleties of meme-based abuse, with CoT and ICL yielding mixed results. The work demonstrates that current LMM safety is insufficient for robust real-world deployment and provides GOAT-Bench as a public resource to guide future improvements in safety alignment and multimodal reasoning.

Abstract

The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and image. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g., GPT-4o) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.

Paper Structure (25 sections, 8 figures, 8 tables)

This paper contains 25 sections, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Performance on GOAT-Bench of a broad range of representative LMMs, including CogVLM (wang2023cogvlm), InstructBLIP (Dai2023InstructBLIPTG), LLaVA-1.5 (liu2023visual), MiniGPT-4 (zhu2023minigpt), Qwen-VL (bai2023qwen), and GPT-4V(ision) (OpenAI2023GPT4TR). GPT-4V achieves the best overall performance from five different perspectives.
  • Figure 2: GOAT-Bench is a comprehensive dataset that tackles the five interwoven meme tasks.
  • Figure 3: Comparison of the overall macro-averaged F1 scores (%) of different LMMs with CoT prompts across the GOAT-Bench tasks.
  • Figure 4: A hateful meme wrongly predicted by GPT-4V, with its explanation.
  • Figure 5: A non-hateful meme wrongly predicted by GPT-4V, with its explanation.
  • ...and 3 more figures
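
The figure captions above report macro-averaged F1. As a rough illustration of that metric only (this is not the authors' evaluation code), the following minimal Python sketch computes it for binary labels. The function name `macro_f1` and the toy labels are hypothetical and chosen purely for illustration.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels): 1 = abusive, 0 = benign.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

Because each class contributes equally to the average, a model that misses most abusive memes scores poorly even when benign memes dominate the data.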