Detecting Sexism in German Online Newspaper Comments with Open-Source Text Embeddings (Team GDA, GermEval2024 Shared Task 1: GerMS-Detect, Subtasks 1 and 2, Closed Track)

Florian Bremm; Patrick Gustav Blaneck; Tobias Bornheim; Niklas Grieger; Stephan Bialonski

TL;DR

This work studies monolingual and multilingual open-source text embeddings for reliably detecting sexism and misogyny in German-language online comments from an Austrian newspaper, and observes that classifiers trained on text embeddings closely mimic the individual judgements of human annotators.

Abstract

Sexism in online media comments is a pervasive challenge that often manifests subtly, complicating moderation efforts because interpretations of what constitutes sexism can vary among individuals. We study monolingual and multilingual open-source text embeddings for reliably detecting sexism and misogyny in German-language online comments from an Austrian newspaper. We observed that classifiers trained on text embeddings closely mimic the individual judgements of human annotators. Our method showed robust performance in the GermEval 2024 GerMS-Detect Subtask 1 challenge, achieving an average macro F1 score of 0.597 (4th place, as reported on Codabench). It also accurately predicted the distribution of human annotations in GerMS-Detect Subtask 2, with an average Jensen-Shannon distance of 0.301 (2nd place). The computational efficiency of our approach suggests potential for scalable applications across various languages and linguistic contexts.
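The two evaluation metrics named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the shared task's official scorer: the function names `macro_f1`, `js_distance`, and `label_distribution` are hypothetical, the toy labels are invented for illustration, and the choice of base-2 logarithm (which bounds the distance to [0, 1]) is an assumption, since the abstract does not state the base.

```python
import math
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (higher is better)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never true and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def label_distribution(annotations, labels):
    """Relative frequency of each label among one comment's annotations."""
    counts = Counter(annotations)
    return [counts[label] / len(annotations) for label in labels]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence.

    Assumption: base-2 logarithm, so the result lies in [0, 1]
    (lower is better when comparing predicted and human distributions).
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))


# Toy usage with invented labels: four annotators, binary labels.
human = label_distribution([0, 0, 1, 1], labels=[0, 1])
predicted = label_distribution([0, 1, 1, 1], labels=[0, 1])
print(macro_f1([1, 0, 1, 0], [1, 0, 0, 0]))
print(js_distance(human, predicted))
```

Macro averaging weights every class equally regardless of how often it occurs, which matters when sexist comments are the minority class; the Jensen-Shannon distance compares whole label distributions, so it rewards a model for reproducing annotator disagreement rather than only the majority vote.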

Paper Structure (9 sections, 4 figures, 2 tables)

Introduction
Data and Tasks
  Data
  Tasks
Methods and Results
  Model Architecture
  Model Training and Evaluation
  Results
Conclusion

Figures (4)

Figure 1: Comments from the provided training dataset with annotations grouped by label (annotators shown in parentheses). The comment in Example 1 is written in an Austrian dialect and was annotated by all ten experts, receiving a variety of labels. Example 2 was annotated by only four experts and received the same label from all of them.
Figure 2: Distribution of the labels assigned by each annotator (A001 to A012). Note that there are no annotations from annotators A006 and A011.
Figure 3: Macro-F1 scores our models achieved when aggregating the predictions for Subtask 1 on the validation set (higher is better).
Figure 4: Jensen-Shannon distances our models achieved when aggregating the predictions for Subtask 2 on the validation set (lower is better).
