Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation

Shiza Ali, Jeremy Blackburn, Gianluca Stringhini

TL;DR

This work addresses the evolving nature of hate speech with an adaptive framework that updates hate lexicons using word embeddings and pairs them with a hybrid detector combining lexicon-based features with a BERT-style model. The approach identifies candidate new toxic words, tests the updated lexicons with traditional ML models, and implements a hybrid Lexicon–BERT system to capture contextual and obfuscated language. Evaluations across multiple datasets and a 76,378-post benchmark show improved detection performance and resilience to evolving euphemisms and spelling variants, with the hybrid model achieving high accuracy on diverse test sets. The findings have practical implications for safer online spaces. Limitations include language scope and data-access constraints, which point to future work on multilingual adaptation and real-time lexicon maintenance.
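To make the lexicon-update idea concrete, here is a minimal sketch of embedding-based candidate discovery. It assumes that a word whose vector lies close to a known lexicon entry is a candidate for review. The function names, the toy three-dimensional vectors, the placeholder tokens, and the 0.9 threshold are all illustrative assumptions, not the paper's actual procedure or settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_lexicon(lexicon, embeddings, threshold=0.9):
    """Return {word: best_similarity} for words near any lexicon seed.

    `threshold` is a hypothetical cut-off chosen for this toy example.
    """
    seeds = [embeddings[s] for s in lexicon if s in embeddings]
    candidates = {}
    for word, vec in embeddings.items():
        if word in lexicon:
            continue  # already known, nothing to propose
        best = max((cosine(vec, seed) for seed in seeds), default=0.0)
        if best >= threshold:
            candidates[word] = best
    return candidates


# Toy vectors standing in for embeddings trained on platform text (assumption).
TOY_EMBEDDINGS = {
    "slur_a": [0.90, 0.10, 0.00],
    "slur_a_variant": [0.88, 0.12, 0.02],
    "weather": [0.00, 0.20, 0.90],
}

candidates = expand_lexicon({"slur_a"}, TOY_EMBEDDINGS)
print(sorted(candidates))
```

In a setup like this, the proposed candidates would normally be reviewed before being added to the lexicon, since nearest neighbors in embedding space also include benign words that merely co-occur with slurs.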

Abstract

The proliferation of social media platforms has led to an increase in the spread of hate speech, particularly targeting vulnerable communities. Unfortunately, existing methods for automatically identifying and blocking toxic language rely on pre-constructed lexicons, making them reactive rather than adaptive. As such, these approaches become less effective over time, especially when new communities are targeted with slurs not included in the original datasets. To address this issue, we present an adaptive approach that uses word embeddings to update lexicons and develop a hybrid model that adjusts to emerging slurs and new linguistic patterns. This approach can effectively detect toxic language, including intentional spelling mistakes employed by aggressors to avoid detection. Our hybrid model, which combines BERT with lexicon-based techniques, achieves an accuracy of 95% on most state-of-the-art datasets. Our work has significant implications for creating safer online environments by improving the detection of toxic content and proactively updating the lexicon. Content Warning: This paper contains examples of hate speech that may be triggering.
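The combination of lexicon matching with a learned model, and the handling of intentional misspellings, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the character-substitution table, the `normalize` and `hybrid_score` helpers, the equal weighting, and the placeholder lexicon entry are all hypothetical, and `model_prob` stands in for the probability a BERT-style classifier would output.

```python
import re

# Hypothetical map from common character substitutions back to letters.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def normalize(token):
    """Lowercase, undo simple substitutions, strip symbols, collapse long runs."""
    token = token.lower().translate(LEET)
    token = re.sub(r"[^a-z]", "", token)
    return re.sub(r"(.)\1{2,}", r"\1", token)  # runs of 3+ letters become one


def lexicon_hit_rate(text, lexicon):
    """Fraction of normalized tokens that appear in the lexicon."""
    tokens = [t for t in (normalize(raw) for raw in text.split()) if t]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def hybrid_score(text, lexicon, model_prob, weight=0.5):
    """Blend a binary lexicon signal with a model probability (assumed scheme)."""
    lexicon_signal = 1.0 if lexicon_hit_rate(text, lexicon) > 0 else 0.0
    return weight * lexicon_signal + (1.0 - weight) * model_prob


LEXICON = {"badword"}  # placeholder entry, not a real lexicon term

print(normalize("b4dw0rd!!"))
print(hybrid_score("you b4dw0rd", LEXICON, model_prob=0.4))
```

A weighted sum is only one possible way to combine the two signals. The lexicon features could equally be fed into the classifier as additional inputs, and the text above does not say which design the paper uses.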

Paper Structure

This paper contains 22 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Architecture of our adaptive hate speech detection system.