Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior
Antigoni-Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, Nicolas Kourtellis
TL;DR
The paper tackles the difficulty of creating ground-truth labels for abusive behavior on Twitter by introducing an iterative crowdsourcing framework that refines label definitions and uses boosted sampling to balance rare abuse signals. It progresses from data collection and exploratory rounds to a large-scale annotation, culminating in an 80k-tweet labeled corpus with four practical categories (Abusive, Hateful, Normal, Spam) and a robust annotation platform. Key findings include strong inter-annotator agreement under the final schema, the insight that Cyberbullying is rarely useful in this context, and the value of boosted sampling for capturing minority classes. The work provides a replicable methodology, open-source tooling, and a valuable resource for researchers building abuse-detection systems and conducting large-scale crowdsourced labeling on social media data.
Abstract
In recent years, offensive, abusive, and hateful language, sexism, racism, and other types of aggressive and cyberbullying behavior have been manifesting with increased frequency on many online social media platforms. Past scientific work has focused on studying these forms of behavior on popular platforms such as Facebook and Twitter. Building on such work, we present an 8-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes that cover different forms of abusive behavior at the same time. We propose an incremental and iterative methodology that utilizes the power of crowdsourcing to annotate a large-scale collection of tweets with a set of abuse-related labels. By applying our methodology, including statistical analysis for label merging or elimination, we identify a reduced but robust set of labels. Finally, we offer a first overview and findings of our collected and annotated dataset of 100 thousand tweets, which we make publicly available for further scientific exploration.
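The abstract mentions statistical analysis for label merging or elimination without giving details at this point. The following is only a minimal sketch of one way such an analysis could look, not the procedure used in the study. It assumes each tweet carries a list of labels, one per annotator; the function names, the thresholds `min_share` and `min_overlap`, and the tweet-level overlap heuristic are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def label_statistics(annotations):
    """Count label usage and label overlap at the tweet level.

    annotations: dict mapping a tweet id to the list of labels
    chosen by its annotators (assumed input format).
    """
    tweets_with = Counter()  # tweets on which a label appears at least once
    together = Counter()     # tweets on which two labels both appear
    for labels in annotations.values():
        distinct = sorted(set(labels))
        tweets_with.update(distinct)
        together.update(combinations(distinct, 2))
    return tweets_with, together


def suggest_changes(tweets_with, together, n_tweets,
                    min_share=0.05, min_overlap=0.5):
    """Suggest labels to eliminate and label pairs to merge.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # A label that appears on very few tweets is a candidate for elimination.
    eliminate = sorted(
        label for label, n in tweets_with.items() if n / n_tweets < min_share
    )
    dropped = set(eliminate)
    # Two labels that annotators often assign to the same tweet are
    # candidates for merging. Overlap is measured relative to the rarer label.
    merge = sorted(
        (a, b)
        for (a, b), n in together.items()
        if a not in dropped and b not in dropped
        and n / min(tweets_with[a], tweets_with[b]) >= min_overlap
    )
    return eliminate, merge
```

Calling `label_statistics` on a round of annotations and passing the result to `suggest_changes` yields two lists that a researcher could review before the next round. Using tweet-level overlap keeps the ratio between 0 and 1, which makes the merge threshold easy to interpret; a correlation-based measure would be an equally plausible alternative.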
