Probing Critical Learning Dynamics of PLMs for Hate Speech Detection
Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty
TL;DR
The paper investigates how critical learning dynamics of pretrained language models influence hate speech detection, examining pretraining seeds, intermediate checkpoints, data recency, finetuning layer choices, and classifier head complexity across seven English datasets. It finds that early pretraining checkpoints often yield peak downstream performance, newer pretraining data provides limited gains, and higher layers near the classifier head are typically most informative for finetuning, with notable exceptions for multilingual models like mBERT. The study challenges the assumption that domain-specific PLMs consistently outperform general-purpose models, showing that a general model with a sufficiently complex classification head can match or exceed domain-specific performance, and highlights the need for dynamic, regularly updated benchmarking datasets. Practical recommendations include reporting results over multiple seeds, leveraging early checkpoints to save compute, and prioritizing targeted finetuning of higher layers, while encouraging dynamic evaluation and broader language coverage in hate speech benchmarks.
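As an illustration of the multi-seed reporting recommendation, the following minimal sketch aggregates one metric over several finetuning runs. The function name `summarize_seed_runs`, the seed values, and the macro-F1 scores are hypothetical placeholders, not figures or code from the paper.

```python
from statistics import mean, stdev


def summarize_seed_runs(scores):
    """Return (mean, sample standard deviation) of a metric across seeds."""
    if len(scores) < 2:
        # A spread estimate needs at least two independent runs.
        raise ValueError("need scores from at least two seeds")
    return mean(scores), stdev(scores)


# Hypothetical macro-F1 per random seed (placeholder values, not paper results).
macro_f1_by_seed = {11: 0.70, 22: 0.74, 33: 0.72}

avg, spread = summarize_seed_runs(list(macro_f1_by_seed.values()))
print(f"macro-F1: {avg:.3f} +/- {spread:.3f} over {len(macro_f1_by_seed)} seeds")
```

Reporting the spread alongside the mean makes it visible when a difference between two models is smaller than the seed-to-seed variation.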
Abstract
Despite their widespread adoption, there is little research into how critical aspects of pretrained language models (PLMs) affect their performance in hate speech detection. Through five research questions, we derive findings and recommendations that lay the groundwork for empirically investigating different aspects of PLMs' use in hate speech detection. We compare different pretrained models and evaluate their seed robustness, finetuning settings, and the impact of pretraining data collection time. Our analysis reveals early peaks for downstream tasks during pretraining, the limited benefit of employing a more recent pretraining corpus, and the significance of specific layers during finetuning. We further call into question the use of domain-specific models and highlight the need for dynamic datasets for benchmarking hate speech detection.
