BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish

Kätriin Kukk, Danila Petrelli, Judit Casademont, Eric J. W. Orlowski, Michał Dzieliński, Maria Jacobson

TL;DR

This work addresses the challenge of detecting misogyny in Swedish, a low-resource language with strong cultural context, by creating BiaSWE, a small expert-annotated dataset and a transparent annotation process. The authors combine data from the Flashback forum with keyword-driven sampling and a four-task annotation workflow guided by a Swedish misogyny taxonomy, using Label Studio and interdisciplinary expert input. Key contributions include the BiaSWE dataset, the detailed guidelines for its annotation, and a methodological framework for interdisciplinary bias detection in Swedish. The study demonstrates substantial inter-annotator agreement and provides a replicable path toward culturally informed misogyny detection in under-resourced languages, while acknowledging limitations and outlining concrete directions for expansion and refinement.
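As an illustration of what an inter-annotator agreement figure measures, the sketch below computes Cohen's kappa for two annotators answering a yes/no question such as "Is this post misogynistic?". This summary does not state which agreement statistic the authors used, so the choice of Cohen's kappa, the function name, and the toy labels are all assumptions made for illustration, not the paper's procedure or data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: this metric is chosen for illustration only; the
    summary above does not say which statistic the authors report.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not taken from BiaSWE.
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (for example "not misogynistic") dominates the sample. With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.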

Abstract

In this study, we introduce the process for creating BiaSWE, an expert-annotated dataset tailored for misogyny detection in the Swedish language. To address the cultural and linguistic specificity of misogyny in Swedish, we collaborated with experts from the social sciences and humanities. Our interdisciplinary team developed a rigorous annotation process, incorporating both domain knowledge and language expertise, to capture the nuances of misogyny in a Swedish context. This methodology ensures that the dataset is not only culturally relevant but also aligned with broader efforts in bias detection for low-resource languages. The dataset, along with the annotation guidelines, is publicly available for further research.

Paper Structure

This paper contains 8 sections, 2 figures.

Figures (2)

  • Figure 1: Is this post misogynistic?
  • Figure 2: Which category of misogyny does this example belong to?