Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models

Cao Yuxuan, Wu Jiayang, Alistair Cheong Liang Chuen, Bryan Shan Guanrong, Theodore Lee Chong Jen, Sherman Chann Zhi Shen

TL;DR

This work addresses the challenge of moderating offensive memes in Singapore by developing a multilingual, multimodal pipeline that combines OCR, translation, and fine-tuned vision-language models (VLMs). A large Singapore-focused meme dataset labeled by GPT-4V, together with a list of Singapore-specific abbreviations and a multimodal Wikipedia resource, supports localized understanding. Using parameter-efficient fine-tuning (PEFT) techniques such as LoRA variants on LLaVA-NeXT-Mistral-7B and Qwen2-VL-7B-Instruct, the authors reach an AUROC of 0.8192 and 80.62% accuracy on held-out Singapore-context memes, and they release the code, data, and models openly. The study highlights the importance of localized data and robust OCR and translation components for effective moderation, and it outlines directions for future improvements and broader deployment.

Abstract

Traditional online content moderation systems struggle to classify modern multimodal means of communication such as memes, a highly nuanced and information-dense medium. The task is especially hard in a culturally diverse society like Singapore, where low-resource languages are used and extensive knowledge of local context is needed to interpret online content. We curate a large collection of 112K memes labeled by GPT-4V for fine-tuning a VLM to classify offensive memes in the Singapore context. We show the effectiveness of fine-tuned VLMs on our dataset, and propose a pipeline consisting of OCR, translation, and a 7-billion-parameter-class VLM. Our solutions reach 80.62% accuracy and 0.8192 AUROC on a held-out test set, and can greatly aid humans in moderating online content. The dataset, code, and model weights have been open-sourced at https://github.com/aliencaocao/vlm-for-memes-aisg.
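
The three-stage structure described in the abstract (OCR, then translation, then a VLM classifier) can be pictured with a minimal sketch. This is not the authors' implementation: the class name `MemeModerationPipeline`, the 0.5 threshold, and the three toy stand-in functions are all assumptions made for illustration, and a real system would plug an actual OCR engine, translation model, and fine-tuned VLM into the same three slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemeModerationPipeline:
    """Hypothetical wrapper chaining OCR -> translation -> VLM classifier."""

    ocr: Callable[[bytes], str]              # image bytes -> text found in the meme
    translate: Callable[[str], str]          # source-language text -> English text
    classify: Callable[[bytes, str], float]  # (image, text) -> offensiveness score in [0, 1]
    threshold: float = 0.5                   # assumed cut-off, not taken from the paper

    def score(self, image: bytes) -> float:
        raw_text = self.ocr(image)
        # Skip translation when OCR finds no text in the image.
        english = self.translate(raw_text) if raw_text.strip() else ""
        return self.classify(image, english)

    def is_offensive(self, image: bytes) -> bool:
        return self.score(image) >= self.threshold


# Toy stand-ins so the sketch runs without any models (all hypothetical).
def fake_ocr(image: bytes) -> str:
    # Pretend the image bytes are simply the embedded caption.
    return image.decode("utf-8")


def fake_translate(text: str) -> str:
    # Pretend that lower-casing is "translation to English".
    return text.lower()


def fake_classify(image: bytes, text: str) -> float:
    # Pretend the classifier flags any caption containing the word "insult".
    return 0.9 if "insult" in text else 0.1


pipeline = MemeModerationPipeline(fake_ocr, fake_translate, fake_classify)
```

Keeping each stage as an injected callable means the OCR or translation component can be swapped or evaluated in isolation, which matters when the quality of those components affects the final moderation decision.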

Paper Structure

This paper contains 41 sections, 1 equation, 2 figures, 15 tables.

Figures (2)

  • Figure 1: LLaVA network architecture, from liu2023llava.
  • Figure 2: Pipeline