Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights

Célia Nouri, Jean-Philippe Cointet, Chloé Clavel

TL;DR

The paper tackles abusive language detection (ALD) in social media by explicitly leveraging conversational context through graph representations of Reddit threads. It introduces an affordance-based graph construction and a Graph Attention Network that aggregates localized contextual cues, and it outperforms context-agnostic and flattened-context baselines on the Contextual Abuse Dataset (CAD). Key contributions include (i) a graph-based ALD framework tailored to Reddit, (ii) identification of an optimal three-hop context window, and (iii) a comparative analysis showing the advantages of structured conversational topology for disambiguation. The approach is described as scalable and efficient, and it offers interpretable insights into which contextual cues drive predictions, underscoring its practical potential for real-world moderation and for future exploration of richer, multimodal, and cross-platform ALD systems.

Abstract

Detecting abusive language in social media conversations poses significant challenges, as identifying abusiveness often depends on the conversational context, characterized by the content and topology of preceding comments. Traditional Abusive Language Detection (ALD) models often overlook this context, which can lead to unreliable performance metrics. Recent Natural Language Processing (NLP) methods that integrate conversational context often depend on limited and simplified representations, and report inconsistent results. In this paper, we propose a novel approach that utilizes graph neural networks (GNNs) to model social media conversations as graphs, where nodes represent comments and edges capture reply structures. We systematically investigate various graph representations and context windows to identify the optimal configuration for ALD. Our GNN model outperforms both context-agnostic baselines and linear context-aware methods, achieving significant improvements in F1 scores. These findings demonstrate the critical role of structured conversational context and establish GNNs as a robust framework for advancing context-aware abusive language detection.
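To make the graph view of a conversation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each comment is given as a hypothetical `(comment_id, parent_id)` pair, builds an undirected reply graph, and collects the comments within `k` hops of a target comment. The function names, the toy thread, and the choice of undirected reply edges only are illustrative assumptions.

```python
from collections import deque


def build_reply_graph(comments):
    """Build an undirected adjacency map from (comment_id, parent_id) pairs.

    parent_id is None for the root post. This input format is an assumption
    made for illustration only.
    """
    adjacency = {}
    for comment_id, parent_id in comments:
        adjacency.setdefault(comment_id, set())
        if parent_id is not None:
            adjacency.setdefault(parent_id, set())
            adjacency[comment_id].add(parent_id)
            adjacency[parent_id].add(comment_id)
    return adjacency


def k_hop_context(adjacency, target, k):
    """Return {node: hop_distance} for nodes within k hops of target (BFS)."""
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond the context window
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[target]  # the target itself is not part of its context
    return distance


# Toy thread (hypothetical): post -> c1 -> c2 -> c3 -> c4, and post -> c5
thread = [("post", None), ("c1", "post"), ("c2", "c1"),
          ("c3", "c2"), ("c4", "c3"), ("c5", "post")]
adjacency = build_reply_graph(thread)
context = k_hop_context(adjacency, "c4", 3)
print(sorted(context.items(), key=lambda item: item[1]))
```

With a three-hop window around `c4`, the context contains `c3`, `c2`, and `c1`; the root post and the sibling branch `c5` lie further away along reply edges and are left out in this simplified setting.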

Paper Structure

This paper contains 41 sections, 8 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Example conversation from the Contextual Abuse Dataset (CAD). The graph was generated with our affordance-based method. The target node is labeled abusive and colored in orange.
  • Figure 2: Overall model architecture. Nodes represent text embeddings. The yellow node denotes the target comment, while the orange node denotes the conversation context. "x NoL" stands for "times Number of Layers". For readability, we did not draw all the edges going from the post node to all other nodes.
  • Figure 3: Node Distribution per Graph After Affordance-Based Trimming.
  • Figure 4: Example conversation graph with learned attention weights from the third layer of the best-performing GAT model. For readability, self-loop edges are omitted; their attention weights are one minus the sum of incoming edge weights.
  • Figure 5: Diagram of Reddit conversation graphs constructed using different edge methods. Node labels $t_i$ indicate comment publication times, with $t_i < t_j$ if $i < j$. From left to right: directed graph, undirected graph, and directed graph with temporal edges.
  • ...and 1 more figures
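
The remark in the Figure 4 caption, that a self-loop weight equals one minus the sum of the incoming edge weights, follows if attention coefficients are softmax-normalized over each node's neighborhood including the node itself. The sketch below illustrates this under that assumption; the raw scores and neighbor names are hypothetical and are not taken from the paper.

```python
import math


def attention_weights(raw_scores):
    """Softmax-normalize raw attention scores over one node's neighborhood.

    raw_scores maps each neighbor (including the self-loop entry "self")
    to an unnormalized score. The values used below are made up.
    """
    max_score = max(raw_scores.values())  # subtract the max for stability
    exponentials = {name: math.exp(score - max_score)
                    for name, score in raw_scores.items()}
    total = sum(exponentials.values())
    return {name: value / total for name, value in exponentials.items()}


# Hypothetical scores for a target comment with two incoming edges
raw = {"self": 0.2, "parent": 1.5, "post": 0.7}
weights = attention_weights(raw)

incoming = sum(weight for name, weight in weights.items() if name != "self")
self_loop_from_incoming = 1.0 - incoming

# The omitted self-loop weight can be recovered from the drawn edges
assert abs(weights["self"] - self_loop_from_incoming) < 1e-12
```

Because the weights of a neighborhood sum to one, a figure can omit self-loops without losing information, as long as all incoming edges are shown.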