Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes

Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru

TL;DR

This work addresses the challenge of detecting toxicity in multimodal memes by unifying external knowledge infusion with knowledge distillation. The KID-VLM framework pairs a compact student VL encoder with a frozen LVLM teacher, which supplies implicit context through captions. It also augments the student's representations with ConceptNet-derived subgraphs through graph-based reasoning. A Relational Graph Convolutional Network (R-GCN) processes the joint working graph, and a gated fusion mechanism integrates the graph-derived signals with the distilled multimodal features. Training minimizes the joint loss $L_{\text{total}} = \lambda_1 L_{\text{BCE}} + \lambda_2 L_{\text{KD}}$. Empirically, KID-VLM outperforms strong baselines on HatefulMemes and HarMeme, achieving higher F1 and AUC. It also remains efficient to train and deploy, thanks to distillation into a ~500M-parameter model and targeted multi-hop KG reasoning. This neurosymbolic approach advances scalable, context-aware toxicity detection for safer online environments.
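The joint objective can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. L_BCE is standard mean binary cross-entropy over predicted toxicity probabilities. L_KD is *assumed* here to be a mean squared error between a student embedding and the frozen teacher's embedding, because this summary does not define the distillation term. The function names and the λ defaults are hypothetical placeholders.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted toxicity probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def kd_loss(student_emb, teacher_emb):
    """ASSUMPTION: distillation term as the mean squared error between a student
    embedding and the frozen teacher's embedding (the paper's exact L_KD may differ)."""
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(student_emb)

def total_loss(probs, labels, student_emb, teacher_emb, lambda1=1.0, lambda2=0.5):
    """L_total = lambda1 * L_BCE + lambda2 * L_KD (lambda values are hypothetical)."""
    return lambda1 * bce_loss(probs, labels) + lambda2 * kd_loss(student_emb, teacher_emb)

# Usage: two memes (toxic, non-toxic) and a toy 2-d embedding pair.
loss = total_loss([0.9, 0.2], [1, 0], [0.1, 0.4], [0.0, 0.4])
```

When the student and teacher embeddings match, the KD term vanishes and the objective reduces to λ1·L_BCE. Raising λ2 pushes the student harder toward the teacher's representation.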

Abstract

Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) with knowledge infusion to improve toxicity detection in hateful memes. Our approach extracts sub-knowledge graphs from ConceptNet, a large-scale commonsense Knowledge Graph (KG), and infuses them into a compact VLM framework. The relational context between toxic phrases in captions and memes, as well as visual concepts in memes, enhances the model's reasoning capabilities. Experimental results on two hate speech benchmark datasets show superior performance over state-of-the-art baselines in AU-ROC, F1, and Recall, with improvements of 1.1%, 7%, and 35%, respectively. Given the contextual complexity of the toxicity detection task, our approach shows the value of learning from both explicit (i.e., KG) and implicit (i.e., LVLMs) contextual cues, incorporated through a hybrid neurosymbolic approach. This matters for real-world applications, where accurate and scalable recognition of toxic content is critical to creating safer online environments.
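The sub-KG extraction step can be illustrated with a bounded breadth-first search around seed concepts. This sketch makes two assumptions. First, ConceptNet has already been loaded as (head, relation, tail) triples. Second, seed concepts have already been matched from the meme text and caption. `extract_subgraph`, `TOY_TRIPLES`, and the hop limit are hypothetical, and the paper's actual extraction and pruning rules may differ. The relation names (IsA, CapableOf, RelatedTo) are ordinary ConceptNet relation types.

```python
from collections import deque

def extract_subgraph(triples, seeds, max_hops=2):
    """Collect every triple touched while expanding nodes within max_hops of the seeds.

    triples: iterable of (head, relation, tail) edges, traversed as undirected.
    seeds: concepts matched in the meme's text/caption (hypothetical input).
    """
    adjacency = {}
    for h, r, t in triples:
        adjacency.setdefault(h, []).append((h, r, t))
        adjacency.setdefault(t, []).append((h, r, t))

    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    kept = set()
    while queue:
        node = queue.popleft()
        if depth[node] >= max_hops:
            continue  # do not expand nodes at the hop limit
        for h, r, t in adjacency.get(node, []):
            kept.add((h, r, t))
            neighbor = t if h == node else h
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return kept

# Toy stand-in for a ConceptNet slice (hypothetical data).
TOY_TRIPLES = [
    ("dog", "IsA", "animal"),
    ("animal", "CapableOf", "breathe"),
    ("breathe", "RelatedTo", "air"),
    ("car", "IsA", "vehicle"),
]
two_hop = extract_subgraph(TOY_TRIPLES, ["dog"], max_hops=2)
```

Bounding the hop count keeps the working graph small, which is what makes targeted multi-hop reasoning cheap relative to reasoning over the full KG.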

Paper Structure

This paper contains 42 sections, 12 equations, 10 figures, and 11 tables.

Figures (10)

  • Figure 1: Given a meme, we aim to derive the answer by jointly reasoning over knowledge from the LVLM and the KG (green box), and then reasoning over toxicity (red box).
  • Figure 2: KID-VLM framework: the framework unifies knowledge distillation (KD) from an LVLM with knowledge infusion (KI) from external KGs such as ConceptNet. The input image and text are processed by the CLIP encoders to produce embeddings, which are combined using different fusion mechanisms. ① Knowledge Extraction from Teacher Model. ② Multimodal Learning Framework. ③ Knowledge Extraction from KG. ④ Joint Reasoning Space: reasoning over the implicit knowledge from the teacher model and the explicit knowledge from the KG for toxicity prediction.
  • Figure 3: t-SNE plots of the baseline (without KI/KD) vs. KID-VLM (with KI/KD), showing 3D projections of the dataset's learned representations. Colors denote ground-truth labels. KID-VLM's plot shows a much clearer separation between classes.
  • Figure 4: Examples from the Hateful Memes Dataset
  • Figure 5: Examples from the HarMeme Dataset. Labels are given in the format [Intensity, Target]; the Target label is not defined for not-harmful memes.
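Figure 2 and the TL;DR describe fusing graph-derived features with the distilled multimodal features. A common form of gated fusion is a sigmoid gate that interpolates between the two vectors dimension by dimension. The sketch below uses that form as an assumption. The function name `gated_fusion` and the per-dimension parameters `w_vl`, `w_kg`, and `bias` are hypothetical. KID-VLM's gate may be parametrized differently, for example as a full linear layer over the concatenated vectors.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gated_fusion(h_vl, h_kg, w_vl, w_kg, bias):
    """Per-dimension gated fusion of a multimodal feature h_vl and a graph feature h_kg.

    ASSUMPTION: g_i = sigmoid(w_vl[i]*h_vl[i] + w_kg[i]*h_kg[i] + bias[i]) and
    z_i = g_i*h_vl[i] + (1 - g_i)*h_kg[i]; the paper's gate may differ.
    """
    fused = []
    for a, b, wa, wb, c in zip(h_vl, h_kg, w_vl, w_kg, bias):
        g = sigmoid(wa * a + wb * b + c)  # gate value in (0, 1)
        fused.append(g * a + (1.0 - g) * b)  # convex mix of the two features
    return fused

# Usage: zero gate parameters give g = 0.5 in every dimension.
averaged = gated_fusion([1.0, 2.0], [3.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

With all gate parameters at zero, every gate equals 0.5 and the output is the plain average of the two inputs. A large positive bias drives the gate toward 1, so the output falls back to the multimodal feature alone. Learned gates let the model decide, per dimension, how much KG evidence to trust.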