Detecting Hate Speech in Social Media
Shervin Malmasi, Marcos Zampieri
TL;DR
This paper addresses the detection of hate speech in social media and its separation from general profanity, using a three-class Twitter dataset. The authors train a linear SVM on three feature groups (character n-grams, word n-grams, and word skip-grams) to establish a lexical baseline, evaluated with 10-fold cross-validation. The best single-feature model, character 4-grams, achieves 78% accuracy. The Hate class is the hardest to classify and is often confused with the Offensive class. The work highlights the difficulty of separating hate speech from profanity and proposes ensemble methods and feature analyses as directions for future research.
Abstract
In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed.
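To make the three feature groups concrete, here is a minimal sketch of how character n-grams, word n-grams, and word skip-grams could be extracted from a post. It is illustrative only and is not the authors' implementation: the function names (`char_ngrams`, `word_ngrams`, `skip_bigrams`) are hypothetical, tokenization is plain whitespace splitting, and a k-skip bigram is assumed here to mean a pair of words with at most k words between them.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous run of n characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def skip_bigrams(tokens, k):
    """Count word pairs with at most k skipped words between them.

    Assumption: k = 0 reduces to ordinary word bigrams, and larger k
    also includes all pairs with fewer than k skips.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        # The partner word may sit 1 to k + 1 positions to the right.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs[(left, tokens[j])] += 1
    return pairs


# Hypothetical example post, tokenized on whitespace.
tokens = "this is a short post".split()
print(char_ngrams("short post", 4))
print(word_ngrams(tokens, 2))
print(skip_bigrams(tokens, 1))
```

Each function returns a bag of counts that could be turned into a sparse feature vector for a linear classifier. Character n-grams span word boundaries and so can capture spelling variants, while skip-grams capture word pairs that are related but not adjacent.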
