An External Fairness Evaluation of LinkedIn Talent Search

Tina Behzad, Siddartha Devic, Vatsal Sharan, Aleksandra Korolova, David Kempe

TL;DR

This work presents an independent external audit of LinkedIn Talent Search for gender and race bias in ranking. It issues identical queries over five consecutive days, labels the returned candidates with a privacy-conscious demographic inference pipeline, and applies exposure-disparity metrics (deviation from group proportions and MinSkew) to assess representation across ranks and over time. The study finds under-representation of women and, to a lesser extent, of some racial groups in the top ranks, along with demographic disparities in temporal churn, suggesting that post-processing may reduce but not eliminate early-rank bias. The results underscore both the feasibility and the challenges of external audits on black-box platforms and argue for greater transparency, data access, and governance tools to enable rigorous, reproducible fairness assessments.

Abstract

We conduct an independent, third-party audit for bias of LinkedIn's Talent Search ranking system, focusing on potential ranking bias across two attributes: gender and race. To do so, we first construct a dataset of rankings produced by the system, collecting extensive Talent Search results across a diverse set of occupational queries. We then develop a robust labeling pipeline that infers the two demographic attributes of interest for the returned users. To evaluate potential biases in the collected dataset of real-world rankings, we utilize two exposure disparity metrics: deviation from group proportions and MinSkew. Our analysis reveals an under-representation of minority groups in early ranks across many queries. We further examine potential causes of this disparity, and discuss why they may be difficult or, in some cases, impossible to fully eliminate among the early ranks of queries. Beyond static metrics, we also investigate the concept of subgroup fairness over time, highlighting temporal disparities in exposure and retention, which are often more difficult to audit for in practice. In employer recruiting platforms such as LinkedIn Talent Search, the persistence of a particular candidate over multiple days in the ranking can directly impact the probability that the given candidate is selected for opportunities. Our analysis reveals demographic disparities in this temporal stability, with some groups experiencing greater volatility in their ranked positions than others. We contextualize all our findings alongside LinkedIn's published self-audits of its Talent Search system and reflect on the methodological constraints of a black-box external evaluation, including limited observability and noisy demographic inference.
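The two exposure-disparity metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes that deviation from group proportions is the top-$k$ share of a group minus its share in the full returned candidate pool, and that MinSkew@$k$ is the minimum over groups of the natural log of the ratio of those two shares, with the pool share used as the reference distribution. The function names, the sign convention, and the toy ranking are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter


def group_proportions(labels):
    """Return {group: share} for a non-empty list of group labels."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def deviation_at_k(ranking, k):
    """Top-k share minus full-pool share per group (negative = under-represented)."""
    if not ranking or k < 1:
        raise ValueError("ranking must be non-empty and k must be >= 1")
    pool = group_proportions(ranking)
    top = group_proportions(ranking[:k])
    return {group: top.get(group, 0.0) - share for group, share in pool.items()}


def min_skew_at_k(ranking, k):
    """Minimum over groups of ln(top-k share / pool share); -inf if a group is absent."""
    if not ranking or k < 1:
        raise ValueError("ranking must be non-empty and k must be >= 1")
    pool = group_proportions(ranking)
    top = group_proportions(ranking[:k])
    skews = []
    for group, share in pool.items():
        top_share = top.get(group, 0.0)
        skews.append(math.log(top_share / share) if top_share > 0 else float("-inf"))
    return min(skews)


# Hypothetical toy ranking of inferred labels, best rank first (not real data).
ranking = ["M", "M", "M", "F", "M", "F", "F", "M", "F", "F"]
dev = deviation_at_k(ranking, 4)
ms = min_skew_at_k(ranking, 4)
print(dev)  # per-group deviation in the top 4
print(ms)   # MinSkew@4
```

In this toy ranking each group makes up half of the pool, but only one of the top four candidates is labeled "F", so the deviation for "F" is negative and MinSkew@4 is below zero. A value of zero for both metrics would mean the top $k$ mirrors the pool exactly.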

Paper Structure

This paper contains 34 sections, 18 equations, 21 figures, 9 tables.

Figures (21)

  • Figure 1: A schematic overview of our pipeline: we issue identical queries to LinkedIn Talent Search over five consecutive days (Section \ref{sec:data_retrieval}), ingest the results into our database, and then enrich these records with demographic inferences using external APIs and datasets (Section \ref{sec:data_labeling}). Finally, we carry out exposure-disparity analysis (Section \ref{sec:exposure_disparity_analysis}) and temporal-disparity analysis (Section \ref{sec:temporal_analysis}).
  • Figure 2: Candidate search filters available in LinkedIn Recruiter. Source: linkedin_search_filters.
  • Figure 3: A snapshot of the results for a sample query where the top-ranked candidate is missing. Candidate cards are blurred to preserve privacy.
  • Figure 4: Deviation between the observed top-$k$ gender proportions and the overall candidate pool proportions for the set of queries for which we scraped the full list of returned candidates ($Q_3$). Each row corresponds to a query, with gender-wise deviations shown across rank positions up to $k = 300$. Gray areas indicate ranks beyond the total number of returned candidates for that query (i.e., the candidate pool was smaller than 300). Red values indicate under-representation relative to the overall group proportion, while blue values indicate over-representation.
  • Figure 5: Deviation between the observed top-$k$ gender proportions and the overall candidate pool proportions for the set $Q'_3$ of queries with fewer than 1% missing members. Gender-wise deviations are shown across rank positions up to $k = 200$. Gray areas indicate ranks beyond the total number of returned candidates for that query.
  • ...and 16 more figures