Detecting Hate Speech in Social Media

Shervin Malmasi, Marcos Zampieri

TL;DR

This paper addresses the detection of hate speech in social media and its separation from general profanity, using a three-class Twitter dataset. The authors train a linear SVM on three feature groups (character n-grams, word n-grams, and word skip-grams) to establish a lexical baseline, evaluated with 10-fold cross-validation. The best single-feature model, character 4-grams, reaches 78% accuracy. The Hate class is the hardest to classify and is often confused with the Offensive class. The work highlights the difficulty of separating hate speech from profanity and proposes ensemble methods and feature analyses as directions for future research.

Abstract

In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed.
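
To make the three feature groups concrete, the following is a minimal sketch of how character n-grams, word n-grams, and word skip-grams could be extracted from a post. It is not the authors' implementation. The function names, lowercasing, whitespace tokenization, and the reading of a skip-gram as a word bigram with up to `k` skipped tokens are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text` (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Return word pairs with at most `k` tokens skipped between them.

    Assumed definition: k=0 yields ordinary bigrams, and larger k adds
    pairs that are further apart.
    """
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def featurize(text, char_n=4, word_n=2, skip_k=1):
    """Count features from all three groups for one post.

    Each feature is prefixed with its group name so that groups never
    collide. The default sizes are illustrative, not taken from the paper.
    """
    lowered = text.lower()
    tokens = lowered.split()  # naive whitespace tokenization (assumption)
    feats = Counter()
    feats.update(("char", g) for g in char_ngrams(lowered, char_n))
    feats.update(("word", g) for g in word_ngrams(tokens, word_n))
    feats.update(("skip", g) for g in skip_bigrams(tokens, skip_k))
    return feats


if __name__ == "__main__":
    for feature, count in featurize("You are great").items():
        print(feature, count)
```

Count vectors of this kind would then be passed to a linear classifier. Character n-grams span word boundaries and partial words, so they can match spelling variants that word-level features miss.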

Paper Structure

This paper contains 10 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Learning curve for a character 4-gram model, with standard deviation highlighted. Accuracy does not plateau with the maximal data size.
  • Figure 2: Confusion matrix of the character 4-gram model for our three classes. The heatmap represents the proportion of correctly classified examples in each class (this is normalized as the data distribution is imbalanced). The raw numbers are also reported within each cell. We note that the Hate class is the hardest to classify and is highly confused with the Offensive class.
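
The normalization mentioned in the Figure 2 caption can be sketched as dividing each row of the confusion matrix by its row total. This assumes rows hold the true classes. The counts below are invented for illustration and are not the paper's results. The third class name, `Other`, is a placeholder.

```python
def row_normalize(matrix):
    """Divide each row by its total.

    With rows as true classes (assumption), entry [i][j] becomes the share
    of class-i examples predicted as class j, and the diagonal is the
    per-class recall.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical labels and counts, for illustration only.
labels = ["Hate", "Offensive", "Other"]
counts = [
    [50, 40, 10],   # true Hate
    [20, 160, 20],  # true Offensive
    [5, 15, 80],    # true Other
]

proportions = row_normalize(counts)

if __name__ == "__main__":
    for label, row in zip(labels, proportions):
        print(label, [round(p, 2) for p in row])
```

Row-normalizing makes classes of different sizes comparable. In these made-up counts, the Offensive row has twice as many examples as the Hate row, but after normalization each row sums to 1 and the diagonals can be compared directly.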