Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

Liam Hebert; Gaurav Sahu; Yuxuan Guo; Nanda Kishore Sreenivas; Lukasz Golab; Robin Cohen

TL;DR

The paper tackles hate speech detection in online discussions by moving beyond single-comment text to a holistic, multi-modal analysis that includes images and the surrounding discussion graph. It introduces the Multi-Modal Discussion Transformer (mDT), which interleaves modality fusion with graph Transformer layers through bottleneck tokens and employs a hierarchical spatial encoding based on Cantor’s pairing to capture discussion structure. A new benchmark, HatefulDiscussions, comprises 8266 Reddit discussions with 18359 labelled comments across 850 communities, enabling evaluation of complete multi-modal discussion graphs. Empirically, mDT outperforms text-only and prior graph-based baselines, with strong gains in accuracy and F1, and ablations show the critical roles of images and contextual graph information. The work advances robust, context-aware hate speech detection and provides a foundation for future multi-modal, discourse-grounded models with public data and code release under permissive licenses.
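To make the Cantor-pairing idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes the pair being encoded is (hops up to the common ancestor, hops down to the target), which this summary does not spell out. The tree, the node names, and the helpers `path_to_root` and `hops_up_down` are hypothetical. The sketch shows two node pairs with the same shortest-path distance receiving different codes once hierarchy is taken into account.

```python
def cantor_pair(k1: int, k2: int) -> int:
    """Cantor pairing: maps a pair of non-negative ints to a unique non-negative int."""
    return (k1 + k2) * (k1 + k2 + 1) // 2 + k2


def path_to_root(node, parent):
    """Return [node, parent(node), ..., root] for a child -> parent mapping."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def hops_up_down(src, dst, parent):
    """Assumed encoding input: hops up from src to the lowest common
    ancestor, then hops down from that ancestor to dst."""
    dst_depth = {n: d for d, n in enumerate(path_to_root(dst, parent))}
    for up, n in enumerate(path_to_root(src, parent)):
        if n in dst_depth:
            return up, dst_depth[n]
    raise ValueError("nodes are not in the same tree")


# Hypothetical discussion tree (child -> parent): a post with two replies,
# one of which has its own reply.
parent = {"r1": "post", "r2": "post", "r1a": "r1"}

# Both pairs are two edges apart, so shortest-path distance cannot tell them apart.
grandchild_to_post = hops_up_down("r1a", "post", parent)  # (2, 0)
sibling_to_sibling = hops_up_down("r1", "r2", parent)     # (1, 1)

print(cantor_pair(*grandchild_to_post))  # 3
print(cantor_pair(*sibling_to_sibling))  # 4
```

Because the pairing is a bijection on non-negative integer pairs, each (up, down) combination gets its own integer, which could then serve as an index into a learned embedding or attention-bias table.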

Abstract

We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.

Paper Structure (20 sections, 5 equations, 5 figures, 8 tables)

Introduction
Related Work
Methodology
Multi-Modal Discussion Transformer (mDT)
Initial Pre-Fusion
Modality Fusion
Graph Transformer
HatefulDiscussions Dataset
Results
Experimental Setup
Text-only Methods vs. Discussion Transformers
Effect of Bottleneck Size
Effect of Constrained Graph Attention
Effect of Fusion Layers
Effect of Images
...and 5 more sections

Figures (5)

Figure 1: Multi-Modal Discussion Transformer
Figure 2: Example Discussion Structure. Each node in the discussion tree represents a comment. The shortest distance between (a, c) and (b, d) is equivalent, demonstrating a lack of expressiveness towards hierarchy.
Figure 3: Fine-grained distribution of BERT and mDT misclassifications. (Acronyms as in Table \ref{tab:dist_labels}.)
Figure 4: An image present in the discussion context of example 3 (Table \ref{tab:BERT_vs_graph_examples}), seen only by mDT, contextualizing comments as potentially hateful.
Figure 5: An image present in the discussion context of example 2 (Table \ref{tab:BERT_vs_graph_examples}), seen only by mDT, contextualizing comments as potentially hateful.
