EkoHate: Abusive Language and Hate Speech Detection for Code-switched Political Discussions on Nigerian Twitter

Comfort Eseohen Ilevbare; Jesujoba O. Alabi; David Ifeoluwa Adelani; Firdous Damilola Bakare; Oluwatoyin Bunmi Abiola; Oluwaseyi Adesina Adeyemo

TL;DR

This work introduces EkoHate, a code-switched abusive language and hate speech dataset derived from Lagos gubernatorial election discourse to study online political conversation in Nigeria. By annotating 3,398 tweets with binary and four-label schemes and evaluating a domain-specific Twitter-RoBERTa model, the authors show strong binary performance (95.1 F1) and lower four-label results (70.3 F1), reflecting the harder fine-grained task. They explore the language-specific effects of code-switching, report cross-corpus transfer results with OLID, HateUS2020, and FountaHate, and perform error analysis to highlight the difficulty of detecting hate content in a multilingual, code-switched setting. The findings indicate that EkoHate generalizes to political discussions in other contexts, such as the US, and the dataset, along with its code, provides a useful benchmark for future research on hate and offensive speech in African multilingual social media.

Abstract

Nigerians have a notable online presence and actively discuss political and topical matters. This was particularly evident throughout the 2023 general election, when Twitter was used for campaigning, fact-checking and verification, and both positive and negative discourse. However, little or no work has been done on detecting abusive language and hate speech in Nigeria. In this paper, we curated code-switched Twitter data directed at the three main candidates in the governorship election of Lagos State, the most populous and economically vibrant state in Nigeria, with the aim of detecting offensive speech in political discussions. We developed EkoHate, an abusive language and hate speech dataset of political discussions between the three candidates and their followers, annotated with both a binary (normal vs. offensive) and a fine-grained four-label scheme. We analysed our dataset and provided an empirical evaluation of state-of-the-art methods in both supervised and cross-lingual transfer learning settings. In the supervised setting, we achieve 95.1 F1 with the binary scheme and 70.3 F1 with the four-label scheme. Furthermore, we show that our dataset transfers well to three publicly available offensive speech datasets (OLID, HateUS2020, and FountaHate), generalizing to political discussions in other regions such as the US.
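To make the gap between the binary and four-label scores concrete, the following is a minimal sketch (Python standard library only) that scores the same predictions under both schemes. It assumes macro-averaged F1, which the abstract does not specify, and uses hypothetical fine-grained label names (`normal`, `offensive`, `abusive`, `hateful`) together with a hypothetical `FINE_TO_BINARY` mapping in which every non-normal label counts as offensive. The toy data is invented for illustration and is not drawn from EkoHate.

```python
from collections import Counter

# HYPOTHETICAL fine-grained label names and mapping: the abstract only states
# that a binary (normal vs. offensive) scheme and a four-label scheme exist.
FINE_TO_BINARY = {
    "normal": "normal",
    "offensive": "offensive",
    "abusive": "offensive",
    "hateful": "offensive",
}


def to_binary(labels):
    """Collapse fine-grained labels into the binary normal/offensive scheme."""
    return [FINE_TO_BINARY[label] for label in labels]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes = sorted(set(gold) | set(pred))
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_count = Counter(gold)
    pred_count = Counter(pred)
    scores = []
    for c in classes:
        # Counter returns 0 for missing keys, so absent classes score 0.
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / gold_count[c] if gold_count[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy predictions (invented, not EkoHate data) to contrast the two schemes.
gold = ["normal", "hateful", "abusive", "offensive", "normal", "hateful"]
pred = ["normal", "abusive", "abusive", "offensive", "normal", "offensive"]

print(f"four-label macro-F1: {macro_f1(gold, pred):.3f}")
print(f"binary macro-F1:     {macro_f1(to_binary(gold), to_binary(pred)):.3f}")
```

On this toy data, the four-label score is about 0.583 while the binary score is 1.000: confusing one offensive subclass with another (here, `hateful` predicted as `abusive` or `offensive`) is penalized only in the fine-grained scheme. This is one plausible reason fine-grained scores trail binary ones, not a claim about the paper's specific error patterns.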

Paper Structure (26 sections, 2 figures, 12 tables)

Introduction
EkoHate dataset
Lagos Gubernatorial Elections
Labelling Scheme
Annotators
Data collection and Annotation
EkoHate data statistics
Experiment Setup
Dataset
Models and Training
Results
EkoHate baseline
Effect of code-switching
Cross-corpus Transfer setting
Error Analysis
...and 11 more sections

Figures (2)

Figure 1: EkoHate: The distribution of the classes per candidate.
Figure 2: The label distribution according to languages.