Quantifying gendered citation imbalance in computer science conferences
Kazuki Nakajima, Yuya Sasaki, Sohei Tokuno, George Fletcher
TL;DR
This study investigates gendered citation imbalance in computer science conferences. The authors construct a dedicated OpenAlex–DBLP dataset with author gender and conference-rank metadata and develop a family of reference models that preserve key network properties. The three models (Random-draws, Homophilic-draws, and Preferential-draws) are used to assess how the number of citations, homophily in citations, and heterogeneity in citations received shape gendered citation patterns and ranking outcomes. The authors find that citation homophily is strongly associated with gender imbalance, that heterogeneity in citation counts plays a smaller role, and that the imbalance is most pronounced in top-ranked conferences, persists across subfields, and carries over to citation-based rankings such as PageRank. The framework links network structure to gendered citation dynamics and suggests fairness-oriented interventions that adjust network processes or rankings, with implications for research evaluation and conference prestige effects.
Abstract
The number of citations received by papers often exhibits imbalances in terms of author attributes such as country of affiliation and gender. While recent studies have quantified citation imbalance in terms of the authors' gender in journal papers, the computer science discipline, where researchers frequently present their work at conferences, may exhibit unique patterns in gendered citation imbalance. Additionally, understanding how network properties in citations influence citation imbalances remains challenging due to a lack of suitable reference models. In this paper, we develop a family of reference models for citation networks and investigate gender imbalance in citations between papers published in computer science conferences. By deploying these reference models, we found that homophily in citations is strongly associated with gendered citation imbalance in computer science, whereas heterogeneity in the number of citations received per paper has a relatively minor association with it. Furthermore, we found that the gendered citation imbalance is most pronounced in papers published in the highest-ranked conferences, is present across different subfields, and extends to citation-based rankings of papers. Our study provides a framework for investigating associations between network properties and citation imbalances, aiming to enhance our understanding of the structure and dynamics of citations between research publications.
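To make the idea of a reference model concrete, the following is a minimal Python sketch of a random-draws style comparison. It is an illustration under stated assumptions, not the authors' implementation: each citing paper keeps its number of references, the cited papers are redrawn uniformly at random from all other papers, and the observed share of citations to one gender category is compared with the average share under the randomized networks. The function names (`random_draws`, `share_to_group`, `relative_imbalance`), the labels `"W"` and `"M"`, and the toy data are hypothetical, and any further constraints the paper's models impose are not reproduced here.

```python
import random
from collections import Counter


def random_draws(edges, papers, rng):
    """Redraw each citing paper's references uniformly at random.

    Assumption: only the number of references per citing paper is preserved.
    Requires each reference count to be at most len(papers) - 1.
    """
    out_degree = Counter(citing for citing, _ in edges)
    rewired = []
    for citing, k in out_degree.items():
        candidates = [p for p in papers if p != citing]  # no self-citations
        for cited in rng.sample(candidates, k):          # no duplicate references
            rewired.append((citing, cited))
    return rewired


def share_to_group(edges, group, target):
    """Fraction of citations whose cited paper belongs to the target category."""
    hits = sum(1 for _, cited in edges if group[cited] == target)
    return hits / len(edges)


def relative_imbalance(edges, papers, group, target, n_runs=1000, seed=0):
    """(observed - expected) / expected share of citations to the target category."""
    rng = random.Random(seed)
    observed = share_to_group(edges, group, target)
    expected = sum(
        share_to_group(random_draws(edges, papers, rng), group, target)
        for _ in range(n_runs)
    ) / n_runs
    return (observed - expected) / expected


# Hypothetical toy data: six papers, each labelled with one category.
papers = ["p1", "p2", "p3", "p4", "p5", "p6"]
group = {"p1": "W", "p2": "W", "p3": "W", "p4": "M", "p5": "M", "p6": "M"}
edges = [("p4", "p5"), ("p4", "p6"), ("p5", "p6"),
         ("p5", "p4"), ("p1", "p4"), ("p2", "p1")]

# A negative value means the category receives fewer citations than
# the randomized baseline would predict.
print(relative_imbalance(edges, papers, group, "W"))
```

In this sketch, a homophily-aware or preferential variant would change only how `candidates` are weighted when references are redrawn, which is why separating the redraw step from the imbalance measure keeps the comparison between models straightforward.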
