Sexism Detection on a Data Diet

Rabiraj Bandyopadhyay; Dennis Assenmacher; Jose M. Alonso Moral; Claudia Wagner

TL;DR

This work investigates data-efficient sexism detection by applying influence-score-based pruning to training data. It evaluates three scores—Pointwise V-Information ($PVI$), Error L2-Norm ($EL2N$), and Variance of Gradients ($VoG$)—on a BERT-based classifier trained on a combined in-domain dataset (EDOS + Call Me Sexist But) and tested on out-of-domain data (Hatecheck, EXIST, Misogyny). The results show that up to 50% of data can be removed with little loss in performance, but pruning strategies risk exacerbating class imbalance and do not universally improve cross-domain performance; in particular, Hatecheck remains challenging due to identity-term biases. The findings suggest careful, balance-aware data sampling using influence scores and motivate future work on dynamic pruning strategies that adapt to dataset noise and domain shift, aiming to reduce labeling costs without sacrificing detection quality.
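To make the pruning idea concrete, the sketch below scores examples with EL2N, taken here as the L2 norm of the difference between the predicted class probabilities and the one-hot label, and then removes a fixed fraction of the training set. The probabilities, labels, function names, and the choice of dropping the highest-scoring examples are illustrative assumptions, not the paper's data or exact procedure.

```python
import math

def el2n(probs, label):
    """EL2N score: L2 norm of (predicted probabilities - one-hot label)."""
    return math.sqrt(sum((p - (1.0 if i == label else 0.0)) ** 2
                         for i, p in enumerate(probs)))

def prune_by_score(scores, prune_rate, drop="highest"):
    """Return the sorted indices kept after dropping a fraction of examples.

    `drop` picks which end of the ranking is removed; which end the paper
    removes is not stated in this summary, so it is left as a parameter.
    """
    n_drop = int(len(scores) * prune_rate)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=(drop == "highest"))
    return sorted(ranked[n_drop:])

def sexist_ratio(labels, kept):
    """Fraction of kept examples labelled sexist (label 1)."""
    return sum(labels[i] for i in kept) / len(kept)

# Hypothetical toy data: 6 non-sexist (0) and 2 sexist (1) examples,
# with made-up softmax outputs [P(non-sexist), P(sexist)].
probs = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.7, 0.3],
         [0.85, 0.15], [0.6, 0.4], [0.7, 0.3], [0.55, 0.45]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

scores = [el2n(p, y) for p, y in zip(probs, labels)]
kept = prune_by_score(scores, prune_rate=0.25, drop="highest")

print(sexist_ratio(labels, range(len(labels))))  # 0.25 before pruning
print(sexist_ratio(labels, kept))                # 0.0 after pruning
```

In this toy setup the minority class happens to carry the highest scores, so a 25% prune removes every sexist example. This is the kind of imbalance amplification the summary warns about.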

Abstract

The proliferation of online hate has increased alongside the rise in social media usage. In response, there has been significant progress in building automated tools that identify harmful text content using approaches grounded in Natural Language Processing and Deep Learning. Although training Deep Learning models is known to require a substantial amount of annotated data, a recent line of work suggests that models trained on specific subsets of the data retain performance comparable to a model trained on the full dataset. In this work, we show how influence scores can be leveraged to estimate the importance of a data point during training and to design a pruning strategy for the case of sexism detection. We evaluate models trained on data pruned with different pruning strategies on three out-of-domain datasets and find that, in accordance with other work, a large fraction of instances can be removed without a significant performance drop. However, we also discover that pruning strategies previously successful in Natural Language Inference tasks do not readily apply to the detection of harmful content and instead amplify the already prevalent class imbalance, leading in the worst case to a complete absence of the hateful class.
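One way to guard against the loss of the hateful class is to prune within each class separately, so that the label ratio is roughly preserved at every pruning rate. The sketch below illustrates that idea only; the function name `stratified_prune`, the toy scores, and the choice to drop the highest-scoring examples are assumptions, not the authors' method.

```python
from collections import defaultdict

def stratified_prune(scores, labels, prune_rate, drop_highest=True):
    """Drop the same fraction of examples from every class.

    Examples are ranked by influence score within their own class, so a
    class whose scores are systematically higher cannot be pruned away.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        n_drop = int(len(indices) * prune_rate)
        ranked = sorted(indices, key=lambda i: scores[i], reverse=drop_highest)
        kept.extend(ranked[n_drop:])
    return sorted(kept)

# Hypothetical influence scores; label 1 = sexist, 0 = non-sexist.
scores = [0.1, 0.3, 0.05, 0.4, 0.2, 0.6, 0.9, 0.8]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

kept = stratified_prune(scores, labels, prune_rate=0.5)
ratio = sum(labels[i] for i in kept) / len(kept)
print(kept, ratio)  # [0, 2, 4, 7] 0.25
```

Here half of the data is removed while the sexist share stays at 0.25, the same as in the unpruned toy set. The price is that pruning no longer follows the global score ranking.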

Paper Structure (17 sections, 6 equations, 6 figures, 10 tables)

Introduction
Related Work
Influence Scores
Pointwise V-Information
Error L2-Norm
Variance of Gradients
Methodology
Datasets
Model and Settings
Results
General Findings
Particular Findings from Influence Scores
Analysis of examples after fine-tuning
Conclusions and future work
Limitations
...and 2 more sections

Figures (6)

Figure 1: Macro F1-score on in-domain test data.
Figure 2: Macro F1-scores on the EXIST dataset [rodriguezsanchezetal].
Figure 3: Macro F1-scores on the Misogyny dataset [guest-etal-2021-expert].
Figure 4: Macro F1-scores on Hatecheck instances containing the identity term "woman" [rottger-etal-2021-hatecheck].
Figure 5: Sexist to non-sexist data ratio for each of the pruning rates and influence scores.
...and 1 more figures
