HateGPT: Unleashing GPT-3.5 Turbo to Combat Hate Speech on X

Aniket Deroy; Subhankar Maity

TL;DR

The paper evaluates GPT-3.5 Turbo as a zero-shot prompt-based classifier for English hate speech detection on X, framing the task as distinguishing Hate and Offensive from Non Hate-Offensive. Using a simple prompt and three temperature settings, the authors report Macro-F1 scores around 0.75–0.76 across three runs, indicating robust performance with low variance. The study highlights the viability of prompt engineering for hate-speech detection in English while noting the broader challenge of multilingual and code-mixed content, suggesting future improvements in multilingual handling and subjectivity considerations. Overall, the work demonstrates that large language models can perform competitive hate-speech classification without task-specific fine-tuning, with practical implications for real-time content moderation.

Abstract

The widespread use of social media platforms like Twitter and Facebook has enabled people of all ages to share their thoughts and experiences, leading to an immense accumulation of user-generated content. Alongside these benefits, the platforms face the challenge of managing hate speech and offensive content, which can undermine rational discourse and threaten democratic values. As a result, there is a growing need for automated methods to detect and mitigate such content, especially given the complexity of conversations that may require contextual analysis across multiple languages, including code-mixed languages like Hinglish, German-English, and Bangla. We participated in the English task, which requires classifying English tweets into two categories: Hate and Offensive, and Non Hate-Offensive. We prompt a state-of-the-art large language model, GPT-3.5 Turbo, to assign each tweet to one of these two categories, and we evaluate its performance across three distinct runs. The primary evaluation metric is the Macro-F1 score, the unweighted mean of the per-class F1 scores, where each F1 score balances precision and recall for its class. The scores obtained are 0.756 for run 1, 0.751 for run 2, and 0.754 for run 3, so the runs differ by at most 0.005, with run 1 scoring highest. These findings suggest that the prompted model performs consistently in terms of precision and recall and is robust across runs.
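To make the metric concrete, the following is a minimal sketch of how a Macro-F1 score can be computed for a two-class task like this one. It is not the authors' or the shared task's evaluation script; the label strings `HOF` and `NOT` and the function names are assumptions chosen for illustration.

```python
from typing import Sequence

# Hypothetical label strings for the two classes (assumed, not from the paper).
LABELS = ("HOF", "NOT")


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    gold = ["HOF", "HOF", "NOT", "NOT"]
    pred = ["HOF", "NOT", "NOT", "NOT"]
    print(round(macro_f1(gold, pred), 3))
```

Because every class contributes equally to the average regardless of how many tweets it contains, Macro-F1 penalizes a classifier that does well only on the majority class, which is why it is a common choice for imbalanced hate-speech data.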
