Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights
Célia Nouri, Jean-Philippe Cointet, Chloé Clavel
TL;DR
The paper tackles abusive language detection (ALD) in social media by explicitly leveraging conversational context through graph representations of Reddit threads. It introduces an affordance-based graph construction and a Graph Attention Network that aggregates localized contextual cues, and it outperforms context-agnostic and flattened-context baselines on the Contextual Abuse Dataset (CAD). Key contributions include (i) a graph-based ALD framework tailored to Reddit, (ii) identification of an optimal three-hop context window, and (iii) a comparative analysis showing the advantages of structured conversational topology for disambiguation. The approach is scalable and efficient, and it offers interpretable insights into which contextual cues drive predictions, underscoring its practical potential for real-world moderation and for future exploration of richer, multimodal and cross-platform ALD systems.
Abstract
Detecting abusive language in social media conversations poses significant challenges, as identifying abusiveness often depends on the conversational context, characterized by the content and topology of preceding comments. Traditional Abusive Language Detection (ALD) models often overlook this context, which can lead to unreliable performance metrics. Recent Natural Language Processing (NLP) methods that integrate conversational context often depend on limited and simplified representations, and report inconsistent results. In this paper, we propose a novel approach that uses graph neural networks (GNNs) to model social media conversations as graphs, where nodes represent comments and edges capture reply structures. We systematically investigate various graph representations and context windows to identify the optimal configuration for ALD. Our GNN model outperforms both context-agnostic baselines and linear context-aware methods, achieving significant improvements in F1 scores. These findings demonstrate the critical role of structured conversational context and establish GNNs as a robust framework for advancing context-aware abusive language detection.
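To make the graph formulation concrete, the following is a minimal sketch of how a thread could be turned into a graph whose nodes are comments and whose edges follow the reply structure, and how a hop-limited context window around a target comment could be extracted. It is illustrative only and does not reproduce the paper's graph construction or its GNN. The names `build_reply_graph`, `k_hop_context`, and `THREAD`, the `(comment_id, parent_id)` input format, and the choice to treat reply edges as undirected are all assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical toy thread: (comment_id, parent_id); the post "p" has no parent.
THREAD = [
    ("p", None),
    ("c1", "p"),
    ("c2", "c1"),
    ("c3", "c2"),
    ("c4", "c3"),
    ("c5", "p"),
]

def build_reply_graph(comments):
    """Build an adjacency map with one node per comment and one edge per reply.

    Assumption: edges are stored as undirected so that context can be
    gathered in both directions along the reply chain.
    """
    adj = defaultdict(set)
    for comment_id, parent_id in comments:
        adj[comment_id]  # make sure isolated nodes still appear in the graph
        if parent_id is not None:
            adj[comment_id].add(parent_id)
            adj[parent_id].add(comment_id)
    return adj

def k_hop_context(adj, target, k=3):
    """Return {node: hop_distance} for all nodes within k hops of target (BFS)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the context window
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

if __name__ == "__main__":
    graph = build_reply_graph(THREAD)
    # Context window of three hops around the comment "c4".
    print(k_hop_context(graph, "c4", k=3))
```

In this toy thread, a three-hop window around `c4` covers `c3`, `c2`, and `c1`, while the original post `p` (four hops away) and the sibling branch `c5` fall outside it. In a GNN setting, a subgraph of this kind would be the neighborhood over which comment representations are aggregated.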
