Using LLMs to discover emerging coded antisemitic hate-speech in extremist social media

Dhanush Kikkisetti, Raza Ul Mustafa, Wendy Melillo, Roberto Corizzo, Zois Boukouvalas, Jeff Gill, Nathalie Japkowicz

TL;DR

The paper tackles the problem of rapidly evolving coded antisemitic hate speech on extremist social media and the inability of fixed glossaries to capture new terms. It introduces a two-phase pipeline that first surfaces emergent coded terms using standard concordance/collocation and TF-IDF-based methods, then assesses their antisemitic potential using fine-tuned BERT embeddings and cosine similarity to seed terms. Four pipeline variants (combinations of standard and advanced methods for term extraction and embedding) are evaluated against a gold standard, with the tfidf-posttrunc configuration achieving the best quantitative performance (accuracy ~80% and F1 ~0.72). The work demonstrates a practical, seed-term–driven approach that assists human moderators by flagging emerging coded terminology, with implications for extending to other hate domains and enabling lifelong learning in monitoring systems.

Abstract

The proliferation of online hate speech has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for their users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones so as to focus on emergent well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned large language model, and subsequent filtering out of the expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms clearly relating to Jewish topics are then removed to return only coded expressions of hatred.
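
The flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on single tokens rather than multi-word expressions, it omits the grammatical-coherence filter, and it replaces the fine-tuned language model with a plain dictionary of toy vectors. The function names (`tfidf_top_terms`, `flag_candidates`), the `embed` lookup, the `explicit_terms` set, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_top_terms(posts, k=2):
    """Return the k highest-scoring TF-IDF tokens of each post."""
    docs = [p.lower().split() for p in posts]
    n = len(docs)
    # Document frequency: number of posts that contain each token.
    df = Counter(tok for d in docs for tok in set(d))
    top = []
    for d in docs:
        tf = Counter(d)
        scores = {t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top.append(ranked[:k])
    return top


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def flag_candidates(posts, known_terms, embed, seed_vectors,
                    explicit_terms=frozenset(), threshold=0.8, k=2):
    """Flag emergent terms that sit close to seed terms in embedding space.

    embed: hypothetical dict mapping a term to its vector; it stands in
           for embeddings produced by a fine-tuned language model.
    explicit_terms: terms that openly name the target group; these are
           dropped so that only coded candidates remain.
    """
    # Corpus-wide frequency of each post's most representative terms.
    counts = Counter(t for terms in tfidf_top_terms(posts, k) for t in terms)
    flagged = {}
    for term, freq in counts.items():
        # Skip previously known terms, explicit terms, and terms we cannot embed.
        if term in known_terms or term in explicit_terms or term not in embed:
            continue
        sim = max(cosine(embed[term], s) for s in seed_vectors)
        if sim >= threshold:  # assumed cut-off, not taken from the paper
            flagged[term] = (freq, sim)
    return flagged
```

Calling `flag_candidates(posts, known_terms, embed, seed_vectors)` returns a dictionary mapping each surviving candidate to its corpus frequency and its highest similarity to any seed vector. In practice the threshold would need to be tuned against a labelled gold standard, and the output is meant as a shortlist for human review rather than a final classification.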

Paper Structure (24 sections, 4 figures, 3 tables, 2 algorithms)