Quantifying Indirect Gender Discrimination on Collaborative Platforms

Orsolya Vasarhelyi, Balazs Vedres

TL;DR

The paper investigates indirect gender discrimination on digital collaboration platforms by comparing GitHub and Behance. It introduces a femaleness metric derived from user behavior and uses Random Forests with SHAP to quantify how gender-typical actions influence attention, success, and survival, while controlling for activity and tenure. The findings show that indirect discrimination accounts for the majority (60–90%) of the total female disadvantage across both platforms, with direct discrimination playing a smaller role and sometimes acting differently for men and women. The work highlights the risk of AI and algorithmic management perpetuating covert gender biases and underscores the need for monitoring mechanisms to mitigate such effects in platform ecosystems.

Abstract

Digital collaborative platforms have become crucial venues of career advancement and individual success in many creative fields, from engineering to the arts. Indirect gender discrimination is a key component of gendered disadvantage on platforms. Such platforms carried the promise of opening avenues of advancement to groups that previously faced discrimination, such as women, as platforms lack managerial gatekeepers with conventional prejudice. We analyzed the extent of indirect gender discrimination on two diverse platforms, GitHub and Behance, focused on software development and on fine arts and design, respectively. We found that women's disadvantage in attention, success, and survival is largely due to indirect discrimination, which accounts for 60-90% of the total female disadvantage. Both men and women are penalized if they follow highly female-like behavior, while the impact of categorical gender varies by outcome and field. As platforms employ algorithmic tools and AI systems to manage users' activity and visibility and to recommend new projects for collaboration, stereotypes rooted in behavior can have long-lasting consequences.
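To make the 60-90% figure concrete, the following is a minimal sketch of how a total gender gap could be split into a direct and an indirect part under a simple linear model. The decomposition itself, the function name `indirect_share`, and every coefficient value are illustrative assumptions; they are not the authors' estimates or necessarily their exact procedure.

```python
def indirect_share(beta_gender, beta_femaleness, mean_fem_women, mean_fem_men):
    """Split a predicted female-male outcome gap into direct and indirect parts.

    Assumes a hypothetical linear model of the form
        outcome = beta_gender * is_female + beta_femaleness * femaleness + ...
    with all other covariates held equal between the two groups.
    """
    # Direct part: the penalty attached to the gender category itself.
    direct = beta_gender
    # Indirect part: the penalty for female-typical behavior, scaled by how
    # much more female-typical women's behavior is on average.
    indirect = beta_femaleness * (mean_fem_women - mean_fem_men)
    total = direct + indirect
    if total == 0:
        raise ValueError("total gap is zero; the share is undefined")
    return direct, indirect, indirect / total


# Made-up numbers, chosen only to illustrate the arithmetic.
direct, indirect, share = indirect_share(
    beta_gender=-0.05,
    beta_femaleness=-0.60,
    mean_fem_women=0.55,
    mean_fem_men=0.30,
)
print(f"direct={direct:.2f} indirect={indirect:.2f} indirect share={share:.0%}")
```

With these made-up inputs the indirect part is three times the direct part, so it makes up three quarters of the total gap. Under this sketch the share is only meaningful when both parts have the same sign.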

Paper Structure

This paper contains 5 sections, 9 figures, 4 tables.

Figures (9)

  • Figure 1: A, C: Beeswarm plots of Femaleness. Each dot represents one data point, where the X axis is determined by the SHAP (SHapley Additive exPlanations) value. The features are ordered by their relative importance on the Y axis. Color displays the original value of a feature: light colors indicate high values of the given feature, dark colors low values. B, D: Distribution of Femaleness. Graphs represent the probability density of femaleness for males (green) and females (orange) on GitHub (Panel B) and Behance (Panel D). Dashed lines indicate median femaleness by gender group.
  • Figure 2: Hypotheses regarding combinations of direct and indirect discrimination. Lines show the hypothetical marginal prediction of outcomes by gender category. The Y axis is the resulting prediction of an outcome, the X axis is femaleness, and color indicates gender.
  • Figure 3: Point estimates of outcomes, with 95% confidence intervals, for variables related to gender. Attention and success show coefficients from linear models predicting success (the log number of stars received, the log number of project appreciations), while survival shows odds ratios from logit models predicting survival over a one-year period following our data collection.
  • Figure 4: Marginal predictions of outcomes from model 2 in Fig. 3, fixing all other variables at their means. Vertical dashed lines indicate medians of femaleness, and shaded vertical bars show the interquartile range (IQR).
  • Figure 5: Gender inference accuracy. Precision, recall, and F-score of the GitHub (Vedres-Vasarhelyi, 2019) and Behance (Gender API) gender inference methods against the manually inferred baseline and a commonly used alternative method (the Gender Guesser Python package). Among GitHub users, our method and the default Python package yielded very similar results, optimized for high male precision. The relative strength of the method used is female recall, and its weakness is unknown recall. The commercial Gender API used to infer the gender of Behance users resulted in higher overall precision, recall, and F-score than the default Python package. It is important to note that this dataset officially did not include unknown-gendered users, although we found 45 accounts (11%) that belong to companies, whose gender therefore could not be inferred.
  • ...and 4 more figures
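
The Figure 5 caption reports per-class precision, recall, and F-score for inferred gender against a manually labeled baseline. As a minimal sketch of how such scores are computed, the helper below (a hypothetical `per_class_scores`, run on made-up labels rather than the paper's data) evaluates one class at a time:

```python
def per_class_scores(true_labels, predicted_labels, label):
    """Return (precision, recall, F1) for one class, e.g. "female"."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(t == label and p == label for t, p in pairs)
    false_pos = sum(t != label and p == label for t, p in pairs)
    false_neg = sum(t == label and p != label for t, p in pairs)

    # Define a score as 0.0 when its denominator is empty.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: manual baseline vs. an automatic inference method.
manual = ["male", "male", "female", "female", "unknown", "male"]
inferred = ["male", "male", "female", "male", "male", "male"]

scores = {
    label: per_class_scores(manual, inferred, label)
    for label in ("male", "female", "unknown")
}
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the method never predicts "unknown", so unknown recall is zero, which is the kind of weakness the caption describes for accounts whose gender cannot be inferred.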