Divided by discipline? A systematic literature review on the quantification of online sexism and misogyny using a semi-automated approach

Aditi Dutta, Susan Banducci, Chico Q. Camargo

TL;DR

This systematic literature review addresses how online sexism and misogyny are quantified across social science and computer science, revealing a persistent disciplinary divide in definitions and methods. The authors introduce a semi-automated PRISMA-based pipeline that combines BERTopic topic modeling, KeyBERT keyword networks, and manual screening to map 2012–2022 literature, identifying five core themes and critical gaps, especially around intersectionality and non-Western contexts. The study highlights that most computational work emphasizes binary detection and text-based classification, while social science work emphasizes qualitative, contextual analyses, pointing to the need for integrated, interdisciplinary taxonomies and diverse datasets. By outlining methodological gaps and proposing a replicable semi-automated workflow, the paper contributes a practical framework for future research aimed at more nuanced, equitable, and globally representative detection and mitigation of online sexism and misogyny.
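To make the "keyword network" idea in the summary above more concrete, here is a minimal sketch of how such a network could be built once keywords have been extracted for each paper. It is not the authors' code: the paper reports using KeyBERT for extraction, whereas this sketch assumes the keywords are already available as plain lists and only illustrates the co-occurrence counting step. The function name `keyword_cooccurrence` and the example keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(doc_keywords):
    """Count how often each unordered keyword pair appears in the same document.

    doc_keywords: iterable of keyword lists, one list per document
    (assumed to come from an upstream extraction step such as KeyBERT).
    Returns a Counter mapping (keyword_a, keyword_b) -> shared-document count,
    with each pair stored in alphabetical order.
    """
    edges = Counter()
    for keywords in doc_keywords:
        # Normalise case and drop duplicates so a pair counts once per document.
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical keyword lists, invented purely for illustration.
docs = [
    ["misogyny", "hate speech", "deep learning"],
    ["sexism", "hate speech", "deep learning"],
    ["sexism", "misogyny", "intersectionality"],
]

network = keyword_cooccurrence(docs)
for (a, b), weight in network.most_common(3):
    print(f"{a} -- {b}: {weight}")
```

Edge weights in a network like this indicate which terms tend to appear together, which is one way to compare how different disciplines frame the same topic.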

Abstract

Several computational tools have been developed to detect and identify sexism, misogyny, and gender-based hate speech, particularly on online platforms. These tools draw on insights from both social science and computer science. Given the increasing concern over gender-based discrimination in digital spaces, the contested definitions and measurements of sexism, and the rise of interdisciplinary efforts to understand its online manifestations, a systematic literature review is essential for capturing the current state and trajectory of this evolving field. In this review, we make four key contributions: (1) we synthesize the literature into five core themes: definitions of sexism and misogyny, disciplinary divergences, automated detection methods, associated challenges, and design-based interventions; (2) we adopt an interdisciplinary lens, bridging theoretical and methodological divides across disciplines; (3) we highlight critical gaps, including the need for intersectional approaches, the under-representation of non-Western languages and perspectives, and the limited focus on proactive design strategies beyond text classification; and (4) we offer a methodological contribution by applying a rigorous semi-automated systematic review process guided by PRISMA, establishing a replicable standard for future work in this domain. Our findings reveal a clear disciplinary divide in how sexism and misogyny are conceptualized and measured. Through an evidence-based synthesis, we examine how existing studies have attempted to bridge this gap through interdisciplinary collaboration. Drawing on both social science theories and computational modeling practices, we assess the strengths and limitations of current methodologies. Finally, we outline key challenges and future directions for advancing research on the detection and mitigation of online sexism and misogyny.

Paper Structure (46 sections, 1 equation, 21 figures, 7 tables)

This paper contains 46 sections, 1 equation, 21 figures, 7 tables.

Figures (21)

  • Figure 4.1: PRISMA flowchart diagram for this research. Each step shows the number of studies included and eliminated at that point of the research.
  • Figure 4.2: Number of publications per year between 2012 and 2022. The blue bars show Computer Science research articles, while the yellow bars show Social Science research articles.
  • Figure 4.3: This figure shows a UMAP scatterplot in which each point represents one document. Each color represents a different computer science topic centering on sexism and misogyny between 2012 and 2022. Topic modeling usually assigns each document a set of keywords as its themes; documents with similar keyword sets are grouped into the same topic and share a color. Each topic is labeled with its name in the matching color. The grey points represent outliers (documents that were not assigned any topic). A highlighted topic name indicates greater relevance to our research objectives.
  • Figure 4.4: Similar to Figure 4.3, this figure shows a UMAP scatterplot in which each color represents a different social science topic centering on sexism and misogyny between 2012 and 2022. The highlighted topic indicates greater relevance to our research objectives.
  • Figure 4.5: Most frequent keywords gathered from the abstracts and titles of Computer Science studies in the topic of 'Hate Speech Detection using Deep Learning models'
  • ...and 16 more figures