Towards Weakly-Supervised Hate Speech Classification Across Datasets
Yiping Jin, Leo Wanner, Vishakha Laxman Kadam, Alexander Shvets
TL;DR
The paper tackles the persistent problem of poor cross-dataset generalization in hate speech (HS) classification, which is caused by divergent taxonomies and annotation practices. It proposes extremely weak supervision, in which category names alone drive both representations and labeling, implemented via the X-Class framework to leverage unlabeled data and expand category representations. On the Waseem and SBIC datasets, the approach achieves competitive in-domain performance and demonstrates notable cross-dataset transfer when taxonomy definitions are aligned, including in scenarios where only unlabeled target-domain data is available. The results highlight the potential of weak supervision to enable cross-domain HS analysis and benchmarking without extensive manual labeling, while also exposing limitations tied to keyword quality and to the alignment of category definitions.
Abstract
As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. As a result, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of models trained on datasets labeled with different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that relies only on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the sources of poor generalizability of HS classification models.
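To make the idea of supervision from class names alone more concrete, the following minimal sketch may help. It is not the X-Class framework used in the paper, which is considerably more elaborate; it is a simplified bag-of-words stand-in written for illustration. The function names (`tokenize`, `cosine`, `classify`), the `rounds` parameter, and the neutral toy class names and documents are all assumptions introduced here. Each class starts out represented only by the tokens of its name, unlabeled documents are assigned to the most similar class, and the class representations are then expanded with the words of the documents assigned to them.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(docs, class_names, rounds=1):
    """Label docs using only class names (hypothetical toy procedure).

    Round 0 matches documents against the class names themselves.
    Each further round rebuilds every class representation from its
    name plus the documents currently assigned to it, then relabels.
    Documents with no overlap with any class get the label None.
    """
    reps = {name: Counter(tokenize(name)) for name in class_names}
    doc_vecs = [Counter(tokenize(d)) for d in docs]
    labels = [None] * len(docs)
    for _ in range(rounds + 1):
        # Assign each document to its most similar class, if any.
        labels = []
        for vec in doc_vecs:
            scores = {name: cosine(vec, rep) for name, rep in reps.items()}
            best = max(scores, key=scores.get)
            labels.append(best if scores[best] > 0 else None)
        # Expand class representations with the assigned documents.
        reps = {name: Counter(tokenize(name)) for name in class_names}
        for vec, label in zip(doc_vecs, labels):
            if label is not None:
                reps[label].update(vec)
    return labels


docs = [
    "the football match ended in a draw",
    "the match was played in heavy rain",
    "bake the bread in a hot oven",
    "the bread needs fresh yeast",
]
print(classify(docs, ["football", "bread"], rounds=0))
print(classify(docs, ["football", "bread"], rounds=1))
```

In this toy run, the second document never mentions a class name and stays unlabeled with `rounds=0`; after one expansion round it can be assigned through vocabulary it shares with an already labeled document. The same mechanism hints at why results depend on keyword quality: a poorly chosen class name seeds a poor representation.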
