Read as You See: Guiding Unimodal LLMs for Low-Resource Explainable Harmful Meme Detection

Fengjun Pan, Xiaobao Wu, Tho Quan, Anh Tuan Luu

TL;DR

This work addresses harmful meme detection under low-resource constraints by converting multimodal memes into high-fidelity textual descriptions $D_h$ using a High-Fidelity Meme2Text pipeline that leverages lightweight LMMs, allowing unimodal LLMs to reason over text alone. It then applies Unimodal Guided CoT Prompting with human-crafted guidelines to produce transparent classifications and rationales, enabling adaptable, context-sensitive moderation. Across seven benchmark datasets, U-CoT+ achieves competitive zero-shot performance relative to resource-intensive baselines, often matching or surpassing GPT-4o-mini, while offering improved explainability and efficiency. The framework thus provides a scalable, adaptable approach to explainable harmful meme detection suitable for low-resource deployment and real-world moderation.
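The two-stage design described above (describe the meme as text first, then let a text-only LLM judge it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Describer`, `Reasoner`, `Verdict`, `parse_label`, and `detect` are hypothetical names, the model calls are replaced by plain callables, and the "Answer: harmful / harmless" output convention is assumed for parsing.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: the paper does not publish these signatures.
# `Describer` stands in for the Meme2Text stage (lightweight LMMs + LLM);
# `Reasoner` stands in for the text-only LLM under guided CoT prompting.
Describer = Callable[[bytes], str]
Reasoner = Callable[[str], str]


@dataclass
class Verdict:
    description: str  # textual meme description (D_h in the TL;DR)
    rationale: str    # step-by-step output of the text-only LLM
    harmful: bool     # label parsed from the rationale


def parse_label(rationale: str) -> bool:
    # Assumed convention: the last line reads "Answer: harmful" or "Answer: harmless".
    lines = rationale.strip().splitlines()
    if not lines:
        return False
    label = lines[-1].lower().split("answer:", 1)[-1].strip().rstrip(".")
    return label == "harmful"


def detect(image: bytes, describe: Describer, reason: Reasoner) -> Verdict:
    description = describe(image)    # stage 1: recognition (image -> text)
    rationale = reason(description)  # stage 2: harmfulness analysis on text only
    return Verdict(description, rationale, parse_label(rationale))


# Usage with stub callables in place of real models (hypothetical outputs).
verdict = detect(
    b"",  # placeholder image bytes
    describe=lambda img: "A cartoon dog sits in a burning room; caption: 'This is fine.'",
    reason=lambda text: "Step 1: ...\nStep 2: ...\nAnswer: harmless",
)
print(verdict.harmful)  # False
```

Keeping the two stages behind separate callables mirrors the decoupling the paper emphasizes: the describer can be swapped for a different lightweight LMM without touching the reasoning step.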

Abstract

Detecting harmful memes is crucial for safeguarding the integrity and harmony of online environments, yet existing detection methods are often resource-intensive, inflexible, and lack explainability, which limits their usefulness in assisting real-world web content moderation. We propose U-CoT+, a resource-efficient framework that prioritizes accessibility, flexibility, and transparency in harmful meme detection by fully harnessing the capabilities of lightweight unimodal large language models (LLMs). Instead of directly prompting or fine-tuning large multimodal models (LMMs) as black-box classifiers, we avoid reasoning immediately over complex visual inputs and instead decouple meme content recognition from meme harmfulness analysis through a high-fidelity meme-to-text pipeline. This pipeline combines lightweight LMMs and LLMs to convert multimodal memes into natural language descriptions that preserve critical visual information, enabling text-only LLMs to "see" memes by "reading". Grounded in these textual inputs, we further guide the unimodal LLMs' reasoning under zero-shot Chain-of-Thought (CoT) prompting with targeted, interpretable, context-aware, and easily obtained human-crafted guidelines. This yields accountable step-by-step rationales while enabling flexible and efficient adaptation to diverse sociocultural criteria of harmfulness. Extensive experiments on seven benchmark datasets show that U-CoT+ achieves performance comparable to resource-intensive baselines, highlighting its effectiveness and potential as a scalable, explainable, and low-resource solution for harmful meme detection.
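The guided CoT step can be illustrated by how a prompt might combine the meme description with human-crafted guidelines. The sketch below is hypothetical: `build_guided_cot_prompt`, the template wording, and the example guidelines are assumptions rather than the paper's prompts; the "Let's think step by step" trigger is the standard zero-shot CoT phrase, assumed here. It shows the property the abstract highlights: changing the guideline list changes the harmfulness criteria without any retraining.

```python
def build_guided_cot_prompt(description: str, guidelines: list[str]) -> str:
    # Hypothetical template; the paper's actual prompt wording is not given here.
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(guidelines, start=1))
    return (
        "Meme description:\n"
        f"{description}\n\n"
        "Guidelines for judging harmfulness:\n"
        f"{rules}\n\n"
        "Is this meme harmful under the guidelines above? "
        "Let's think step by step, then end with a final line "
        "'Answer: harmful' or 'Answer: harmless'."
    )


# Swapping guideline sets (hypothetical examples) changes the moderation
# criteria while the description and the model stay the same.
desc = "Text-only description of a meme (placeholder)."
strict = build_guided_cot_prompt(desc, ["Treat mockery of any protected group as harmful."])
lenient = build_guided_cot_prompt(desc, ["Treat satire of public figures as harmless unless it incites harm."])
print(strict != lenient)  # True
```

Because the guidelines are plain text, adapting to a new platform policy or cultural context amounts to editing a list of rules rather than collecting labeled data.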

Paper Structure

This paper contains 33 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Illustration of previous methods and our U-CoT+. Previous methods follow either fully supervised or low-resource settings, fine-tuning LMMs/PLMs with labeled data or prompting advanced LMMs (e.g., GPT-4) with or without few-shot examples or retrieval-augmented mechanisms. Their predictions do not necessarily include explicit reasoning. U-CoT+ employs a High-fidelity Meme2Text pipeline to convert the multimodal harmful meme detection task into a unimodal, text-only setting, and further enhances LLMs' reasoning through Unimodal Guided CoT Prompting. An example output given by Qwen2.5-14B under U-CoT+ is presented in "Step-by-step Reasoning". The meme is only for demonstration purposes.
  • Figure 2: Confusion matrices of Qwen2.5-14B based on meme descriptions sourced from different 7B LMMs.
  • Figure 3: Comparison of LLM rationales before and after applying context-specific guidelines (legend: incorrect vs. correct).
  • Figure 4: Examples of incorrectly classified memes and their corresponding error type.
  • Figure 5: Our proposed High-fidelity Meme2Text pipeline.
  • ...and 11 more figures