
Deep Learning Approaches for Detecting Adversarial Cyberbullying and Hate Speech in Social Networks

Sylvia Worlali Azumah, Nelly Elsayed, Zag ElSayed, Murat Ozer, Amanda La Guardia

TL;DR

This work tackles the detection of cyberbullying and hate speech in adversarial social-media text by introducing a two-phase framework that first applies adversarial-correction preprocessing and then an LSTM-based classifier. The model, trained for a fixed 100 epochs, achieves an accuracy of 87.57% and an AUC-ROC of 91%, outperforming prior methods such as SVM-based approaches and other neural architectures. By leveraging the Davidson hate/offensive dataset and a structured preprocessing pipeline (noise removal, stop-word elimination, tokenization, normalization, and spell-check-based correction), the approach demonstrates robustness to input perturbations designed to evade detection. The inclusion of AUC-ROC as a key metric and the comparative analysis against established baselines underscore the method's practical significance for real-world safety on online platforms.
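The preprocessing pipeline named above can be sketched in plain Python. This is an illustrative outline only, not the authors' implementation: the paper uses a spell-check-based correction algorithm, whereas the `LEET_MAP` substitution table and the tiny stop-word set here are hypothetical stand-ins.

```python
import re

# Hypothetical character-substitution map; the paper's actual correction
# step is spell-check based, this lookup only illustrates the idea of
# undoing adversarial obfuscation such as "stup1d" or "h@te".
LEET_MAP = {"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"}

# Tiny illustrative stop-word set, not the full list a real pipeline would use.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of"}

def correct_adversarial(token: str) -> str:
    """Undo common character substitutions used to evade detection."""
    return "".join(LEET_MAP.get(ch, ch) for ch in token)

def preprocess(text: str) -> list[str]:
    # 1. Noise removal: strip URLs and @-mentions.
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # 2. Normalization: lowercase.
    text = text.lower()
    # 3. Tokenization on whitespace.
    tokens = text.split()
    # 4. Adversarial correction, then strip residual punctuation.
    tokens = [re.sub(r"[^\w]", "", correct_adversarial(t)) for t in tokens]
    # 5. Stop-word elimination.
    return [t for t in tokens if t and t not in STOP_WORDS]

print(preprocess("You are $0 stup1d @user http://x.co"))
# → ['you', 'so', 'stupid']
```

The cleaned token stream would then be vectorized and fed to the LSTM classifier described in the paper.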

Abstract

Cyberbullying is a significant concern intricately linked to technology, yet technology also provides the means to mitigate it. To address growing concerns regarding the adverse impact of cyberbullying on individuals' online experiences, various online platforms and researchers are actively adopting measures to enhance the safety of digital environments. While researchers persist in crafting detection models to counteract or minimize cyberbullying, malicious actors are deploying adversarial techniques to circumvent these detection methods. This paper focuses on detecting cyberbullying in adversarial attack content within social networking site text data, specifically emphasizing hate speech. Utilizing a deep learning-based approach with a correction algorithm, this paper yielded significant results. An LSTM model trained for a fixed 100 epochs demonstrated remarkable performance, achieving accuracy, precision, recall, F1-score, and AUC-ROC scores of 87.57%, 88.73%, 87.57%, 88.15%, and 91%, respectively. Additionally, the LSTM model's performance surpassed that of previous studies.
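The four threshold-based metrics reported above are all derived from a confusion matrix. As a minimal sketch of how they relate (the counts below are invented for illustration, and the paper's exact per-class averaging scheme is not stated here):

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the standard classification metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of predicted positives, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, chosen only to show the arithmetic.
m = metrics(tp=80, fp=10, fn=12, tn=98)
print({k: round(v, 4) for k, v in m.items()})
```

AUC-ROC, by contrast, is threshold-free: it summarizes ranking quality across all decision thresholds, which is why the paper reports it alongside the counts-based metrics.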

Paper Structure

This paper contains 18 sections, 2 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The proposed adversarial and cyberbullying detection methodology in online social networking sites (SNS).
  • Figure 2: Long Short-Term Memory (LSTM) block architecture for classification of cyberbullying text in online social networking sites (SNS) [azumah2021deep].
  • Figure 3: The proposed LSTM detection model training vs. validation accuracy.
  • Figure 4: The proposed LSTM detection model training vs. validation loss.
  • Figure 5: 1D Convolutional Neural Network (CNN) detection model training versus validation loss.
  • ...and 3 more figures