Detecting Anti-Semitic Hate Speech using Transformer-based Large Language Models

Dengyi Liu, Minghao Wang, Andrew G. Catlin

TL;DR

The paper tackles anti-Semitic hate speech detection on social media by combining a threshold-based data labeling approach with a broad evaluation of transformer-based models, including BERT, DistilBERT, RoBERTa, and LLaMA-2, enhanced via LoRA for efficient fine-tuning. A dataset of roughly 10,000 Twitter posts is collected, with 3,000 posts annotated through a two-annotator process and a third reviewer for disputes, forming the basis for training traditional ML methods and modern transformers. Results show transformer models significantly outperform traditional classifiers, with BERT achieving leading performance and LoRA-enhanced RoBERTa and LLaMA-2 providing strong metrics and substantially reduced training times. The work highlights practical considerations for deploying such systems—resource demands, interpretability, and ethical implications—while suggesting directions to improve efficiency and transparency in sensitive moderation scenarios.

Abstract

Academic researchers and social media entities grappling with the identification of hate speech face significant challenges, primarily due to the vast scale of data and the dynamic nature of hate speech. Given the ethical and practical limitations of large predictive models like ChatGPT in directly addressing such sensitive issues, our research has explored alternative advanced transformer-based and generative AI technologies since 2019. Specifically, we developed a new data labeling technique and established a proof of concept targeting anti-Semitic hate speech, utilizing a variety of transformer models such as BERT (arXiv:1810.04805), DistilBERT (arXiv:1910.01108), RoBERTa (arXiv:1907.11692), and LLaMA-2 (arXiv:2307.09288), complemented by the LoRA fine-tuning approach (arXiv:2106.09685). This paper delineates and evaluates the comparative efficacy of these cutting-edge methods in tackling the intricacies of hate speech detection, highlighting the need for responsible and carefully managed AI applications within sensitive contexts.

Paper Structure

This paper contains 14 sections, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Workflow of the voting algorithm.
  • Figure 2: Architecture of LoRA fine-tuning applied to the LLaMA-2 model weights. Trainable low-rank matrices A and B are injected alongside the frozen weights, with A initialized from a normal distribution and B initialized to zero. This lets the model adapt efficiently to new tasks: the original parameters are preserved and only a small update is learned.
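
The LoRA mechanism described in the Figure 2 caption can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The class name `LoRALinear`, the helper `matvec`, the layer sizes, and the 0.02 standard deviation used to initialize A are all assumptions made for illustration. The `alpha / rank` scaling follows the LoRA paper (arXiv:2106.09685). The frozen weight `W` is never modified; only `A` and `B` would be trained.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W plus a low-rank update B @ A (hypothetical helper)."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weights, kept frozen
        # A has shape (rank, d_in) and is drawn from a normal distribution.
        # The standard deviation 0.02 is an assumed value for this sketch.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)]
                  for _ in range(rank)]
        # B has shape (d_out, rank) and starts at zero, so B @ A is zero
        # and the adapted layer initially matches the pretrained layer.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Compute W x + (alpha / rank) * B (A x)."""
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B, the only trainable parameters."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy dimensions, chosen only to keep the example small.
rng = random.Random(1)
d_out, d_in, rank = 6, 8, 2
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
layer = LoRALinear(W, rank=rank, alpha=4.0)
```

Because B starts at zero, `layer.forward(x)` equals `matvec(W, x)` before any training step. The adapter holds `rank * (d_in + d_out)` trainable values instead of the `d_in * d_out` entries of the full weight matrix; the savings become substantial when the rank is much smaller than the layer dimensions, as is typical for large language models.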