AggregHate: An Efficient Aggregative Approach for the Detection of Hatemongers on Social Platforms
Tom Marzea, Abraham Israeli, Oren Tsur
TL;DR
This paper tackles hate-monger detection on social platforms by moving from post-centric detection to a user-centric, multimodal framework that fuses utterance-level predictions with social-context signals. It presents a formal hierarchy of aggregation schemes (naive, relational, distributional, and multimodal) and benchmarks them against diffusion- and GNN-based baselines. Evaluations on Echo (Twitter), Gab, and Parler show that incorporating network context and distributional cues yields substantial improvements over text-only methods, and that the relative effectiveness of the relational and distributional components varies by platform. The work also contributes a Parler dataset annotated at the user level and discusses practical considerations for deployment, including ethical safeguards and the importance of human-in-the-loop interventions for high-stakes moderation.
Abstract
Automatic detection of online hate speech serves as a crucial step in the detoxification of online discourse. Moreover, accurate classification can promote a better understanding of the proliferation of hate as a social phenomenon. While most prior work focuses on the detection of hateful utterances, we argue that focusing on the user level is just as important, albeit challenging. In this paper we consider a multimodal aggregative approach for the detection of hate-mongers, taking into account the potentially hateful texts, user activity, and the user network. We evaluate our methods on three unique datasets, X (Twitter), Gab, and Parler, showing that processing a user's texts in her social context significantly improves the detection of hate-mongers compared to previously used text- and graph-based methods. Our method can then be used to improve the classification of coded messages, dog-whistling, and racial gas-lighting, as well as to inform intervention measures. Moreover, our approach is highly efficient even for very large datasets and networks.
