Indicators for characterising online hate speech and its automatic detection
Erica Forzinetti, Marco L. Della Vedova, Stefano Pasta, Milena Santerini
TL;DR
This study introduces the Spectrum of Online Hate Indicators to characterize hateful content and evaluates automatic detection against annotations by expert pedagogists on Italian Twitter. It analyzes four target-group datasets (Jews, Muslims, Roma, immigrants), annotating 3,600 tweets (900 per dataset) for seven indicators, and compares these annotations with the output of four Italian-language classifiers: an Evalita-2020 model, Dehatebert-mono-italian, POLIticBERT, and Neuraly. The results show that ML detectors reliably pick up high-intensity signals, such as incitement to hatred and violence and the potential for a violent response, but struggle with contextual interpretation. This underscores the need for better contextualization and more balanced data. The work demonstrates the value of interdisciplinary collaboration and careful data design for building practical, trustworthy online hate detection systems.
Abstract
We examined four case studies of hate speech on Italian-language Twitter from 2019 to 2020, with the aim of comparing the classification of 3,600 tweets by expert pedagogists with the automatic classification produced by machine learning algorithms. The pedagogists used a novel classification scheme based on seven indicators that characterize hate: the content is public; it affects a target group; it contains hate speech in explicit verbal form; it will not redeem; it has the intention to harm; it can provoke a violent response; and it incites hatred and violence. The case studies concern four target groups: Jews, Muslims, Roma, and immigrants. We find that not all types of hateful content are equally detectable by the machine learning algorithms we considered. In particular, the algorithms perform better at identifying tweets that incite hatred and violence and tweets that can provoke a violent response.
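The comparison described above can be illustrated with a minimal sketch. It assumes that each tweet carries the set of indicators marked by the annotators and a binary hateful/not-hateful output from a single classifier. For each indicator, the sketch reports the share of tweets carrying that indicator that the classifier flags. The indicator identifiers, the `AnnotatedTweet` type, the `detection_rate_by_indicator` function, and the example tweets are all hypothetical. They are not the authors' code or data, and this detection rate is only one plausible way to express "better at identifying", not necessarily the measure used in the study.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the seven indicators listed in the abstract.
INDICATORS = [
    "public",
    "target_group",
    "explicit_hate_speech",
    "will_not_redeem",
    "intention_to_harm",
    "possible_violent_response",
    "incites_hatred_violence",
]


@dataclass
class AnnotatedTweet:
    indicators: set[str]      # indicators the expert annotators marked as present
    classifier_hateful: bool  # binary output of one automatic detector (assumed)


def detection_rate_by_indicator(tweets: list[AnnotatedTweet]) -> dict[str, float]:
    """For each indicator, return the fraction of tweets carrying it that the
    classifier flags as hateful. Indicators never annotated are omitted."""
    rates: dict[str, float] = {}
    for name in INDICATORS:
        relevant = [t for t in tweets if name in t.indicators]
        if relevant:
            flagged = sum(t.classifier_hateful for t in relevant)
            rates[name] = flagged / len(relevant)
    return rates


# Toy, invented annotations used only to exercise the function.
EXAMPLE_TWEETS = [
    AnnotatedTweet({"public", "target_group", "incites_hatred_violence"}, True),
    AnnotatedTweet({"public", "target_group", "will_not_redeem"}, False),
    AnnotatedTweet(
        {"public", "target_group", "incites_hatred_violence", "possible_violent_response"},
        True,
    ),
    AnnotatedTweet({"public", "intention_to_harm"}, False),
]

if __name__ == "__main__":
    for indicator, rate in detection_rate_by_indicator(EXAMPLE_TWEETS).items():
        print(f"{indicator}: {rate:.2f}")
```

Running the same function once per classifier and per target-group dataset would give a table that can be read indicator by indicator. This makes it possible to see which kinds of annotated content a detector tends to miss.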
