Revisiting gender bias research in bibliometrics: Standardizing methodological variability using Scholarly Data Analysis (SoDA) Cards

HaeJin Lee, Shubhanshu Mishra, Apratim Mishra, Zhiwen You, Jinseok Kim, Jana Diesner

TL;DR

This paper investigates how methodological variability in author name disambiguation (AND) and gender identification affects conclusions about gender bias in scholarly metrics. It surveys 70 papers from 2009–2023 to map disambiguation and gender-identification practices and reveals substantial heterogeneity, including frequent use of no disambiguation and binary gender labeling, with notable challenges for Asian names. To address this, the authors introduce Scholarly Data Analysis (SoDA) Cards, a standardized reporting framework modeled after Model Cards and Datasheets, to document study specifications, corpus profiles, methods, analyses, and results, thereby improving transparency and comparability. The work combines a systematic literature review, annotation with reliability checks (Cohen's kappa values: $\kappa=0.81$, $0.85$, $0.71$ across facets), and a principled workflow to promote reproducibility and evidence-informed policymaking in gender and other demographic bias research. Overall, SoDA Cards aim to elevate methodological rigor and enable longitudinal tracking of analytical practices in bibliometrics, fostering more robust, comparable conclusions across studies.
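The reliability figures above are Cohen's kappa values, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement between two annotators and $p_e$ is the agreement expected by chance. A minimal sketch of that computation follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper's annotation data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of four papers' disambiguation method.
annotator_1 = ["none", "none", "algorithmic", "algorithmic"]
annotator_2 = ["none", "algorithmic", "algorithmic", "algorithmic"]
print(cohens_kappa(annotator_1, annotator_2))
```

In the toy data the annotators agree on three of four items ($p_o = 0.75$) while chance agreement is $p_e = 0.5$, so kappa is lower than the raw agreement rate.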

Abstract

Gender biases in scholarly metrics remain a persistent concern, despite numerous bibliometric studies exploring their presence and absence across productivity, impact, acknowledgment, and self-citations. However, methodological inconsistencies, particularly in author name disambiguation and gender identification, limit the reliability and comparability of these studies, potentially perpetuating misperceptions and hindering effective interventions. A review of 70 relevant publications over the past 12 years reveals a wide range of approaches, from name-based and manual searches to more algorithmic and gold-standard methods, with no clear consensus on best practices. This variability, compounded by challenges such as accurately disambiguating Asian names and managing unassigned gender labels, underscores the urgent need for standardized and robust methodologies. To address this critical gap, we propose the development and implementation of "Scholarly Data Analysis (SoDA) Cards." These cards will provide a structured framework for documenting and reporting key methodological choices in scholarly data analysis, including author name disambiguation and gender identification procedures. By promoting transparency and reproducibility, SoDA Cards will facilitate more accurate comparisons and aggregations of research findings, ultimately supporting evidence-informed policymaking and enabling the longitudinal tracking of analytical approaches in the study of gender and other social biases in academia.
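To make the idea of a structured reporting card concrete, here is a rough sketch of how such a card could be held as a record. Only the five section names (study specifications, corpus profiles, methods, analyses, results) come from the summary of the paper; the class name `SoDACard`, the dictionary-based layout, the `missing_sections` helper, and the example values are all hypothetical and do not reproduce the authors' actual template.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SoDACard:
    """Hypothetical container; one free-form dict per card section."""
    study_specification: dict = field(default_factory=dict)
    corpus_profile: dict = field(default_factory=dict)
    methods: dict = field(default_factory=dict)
    analyses: dict = field(default_factory=dict)
    results: dict = field(default_factory=dict)

    def missing_sections(self):
        # Sections left empty, i.e. choices the study has not yet documented.
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; the keys are assumptions, not the paper's fields.
card = SoDACard(
    methods={
        "author_name_disambiguation": "none",
        "gender_identification": "name-based",
    },
)
print(card.missing_sections())
```

Keeping each section as an explicit field makes undocumented choices visible: a study that applied no disambiguation still has to say so instead of omitting the entry.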

Paper Structure

This paper contains 49 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: A principled approach for conducting demographic bias analysis for scholarly data research
  • Figure 2: Difference between author and authorship
  • Figure 3: Distribution of selected papers
  • Figure 4: Author Name Disambiguation Methods Distribution Barchart
  • Figure 5: Distribution of Gender Identification Methods
  • ...and 2 more figures