Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework
Mahmoud El-Bahnasawi
TL;DR
The paper addresses real-time hate speech detection under resource constraints with a three-layer pipeline that combines rule-based pre-filtering, a LoRA-tuned BERTweet encoder, and a pathway for continuous learning. The system reaches a macro F1 of $0.85$ on a unified dataset (~$530\text{K}$ samples) while training only $1.87\text{M}$ parameters. This is roughly $94\%$ of the performance of SafePhi, a large LLM-based moderator, obtained with a base model about $100\times$ smaller. The key contributions are (i) showing that LoRA on BERTweet recovers most of that performance with far fewer trainable parameters, (ii) a unified dataset construction that improves cross-dataset performance, and (iii) a production-oriented three-layer architecture with a plan for continuous adaptation. The approach is a practical, scalable option for moderation in latency- and cost-constrained environments, and the authors release models, code, and a demo to support reproducibility and deployment.
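The three-layer flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern list, the `review_threshold` value, the `review_queue` mechanism, and the `toy_model` stand-in for the LoRA-tuned BERTweet classifier are all hypothetical names and values chosen for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical placeholder pattern; the actual rules are not specified here.
BLOCK_PATTERNS = [re.compile(r"\bbadword\b", re.IGNORECASE)]


def rule_filter(text: str) -> Optional[str]:
    """Layer 1: return 'hate' on an explicit rule match, else None (defer)."""
    for pattern in BLOCK_PATTERNS:
        if pattern.search(text):
            return "hate"
    return None


@dataclass
class Pipeline:
    # Layer 2: any callable returning (label, confidence). In the paper this
    # role is played by the LoRA-tuned BERTweet model; here it is a stand-in.
    model: Callable[[str], Tuple[str, float]]
    # Assumed cutoff below which a prediction is queued for later review.
    review_threshold: float = 0.6
    # Layer 3 (assumed mechanism): low-confidence texts collected so they can
    # be labeled and fed into a later round of fine-tuning.
    review_queue: List[str] = field(default_factory=list)

    def predict(self, text: str) -> str:
        label = rule_filter(text)
        if label is not None:
            return label  # cheap path: the model is never called
        label, confidence = self.model(text)
        if confidence < self.review_threshold:
            self.review_queue.append(text)
        return label


def toy_model(text: str) -> Tuple[str, float]:
    """Toy classifier used only to make the sketch runnable."""
    if "hateful" in text.lower():
        return ("hate", 0.9)
    return ("not_hate", 0.55)


pipeline = Pipeline(model=toy_model)
print(pipeline.predict("this has BADWORD in it"))
print(pipeline.predict("hello world"))
print(len(pipeline.review_queue))
```

The point of the ordering is that inexpensive rules handle unambiguous cases before the encoder runs, while uncertain model outputs are set aside as candidate training data.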
Abstract
This paper addresses the challenge of building computationally efficient hate speech detection systems that remain competitive in performance while being practical for real-time deployment. We propose a three-layer framework that combines rule-based pre-filtering, a parameter-efficient LoRA-tuned BERTweet model, and continuous learning capabilities. Our approach achieves a macro F1 score of 0.85, which is 94% of the performance of state-of-the-art large language models such as SafePhi (Phi-4 based), while using a base model that is about 100x smaller (134M vs 14B parameters). Compared with traditional BERT-based approaches that have similar computational requirements, our method performs better as a result of strategic dataset unification and optimized fine-tuning. The system requires only 1.87M trainable parameters (1.37% of the parameters that full fine-tuning would update) and trains in approximately 2 hours on a single T4 GPU, making robust hate speech detection accessible in resource-constrained environments.
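The headline ratios can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in the abstract, so the results are approximate. The variable names are illustrative, and the denominator behind the reported 1.37% is not stated here, so two plausible choices are shown.

```python
base_params = 134e6       # BERTweet base model, as quoted
llm_params = 14e9         # Phi-4 based SafePhi, as quoted
trainable_params = 1.87e6 # LoRA trainable parameters, as quoted

# Size ratio between the LLM and the base encoder (quoted as "100x").
size_ratio = llm_params / base_params

# Trainable fraction, relative to the base model alone and relative to
# base plus adapter weights. Both land near the reported 1.37%; the small
# gap is expected because the inputs are rounded.
frac_of_base = trainable_params / base_params
frac_of_total = trainable_params / (base_params + trainable_params)

# Reference macro F1 implied by "0.85 is 94% of SafePhi". This is a
# back-of-envelope value derived from two rounded numbers, not a figure
# reported in the text.
implied_reference_f1 = 0.85 / 0.94

print(f"size ratio:        {size_ratio:.1f}x")
print(f"trainable / base:  {frac_of_base:.2%}")
print(f"trainable / total: {frac_of_total:.2%}")
print(f"implied reference macro F1: {implied_reference_f1:.3f}")
```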
