An External Fairness Evaluation of LinkedIn Talent Search
Tina Behzad, Siddartha Devic, Vatsal Sharan, Aleksandra Korolova, David Kempe
TL;DR
This work performs an independent external audit of LinkedIn Talent Search to detect gender and race bias in ranking. It issues identical queries over five days, applies a privacy-conscious demographic inference pipeline, and uses exposure-disparity metrics such as Deviation from Group Proportion and MinSkew to assess representation across ranks and over time. The study finds under-representation of women and, to a lesser extent, racial minority groups in the top ranks, and reveals disparities in temporal churn, suggesting that post-processing may reduce but not eliminate early-rank bias. The results underscore both the feasibility and the challenges of external audits on black-box platforms and argue for greater transparency, data access, and governance tools to enable rigorous, reproducible fairness assessments.
Abstract
We conduct an independent, third-party audit of LinkedIn's Talent Search ranking system for bias, focusing on two attributes: gender and race. To do so, we first construct a dataset of rankings produced by the system, collecting extensive Talent Search results across a diverse set of occupational queries. We then develop a robust labeling pipeline that infers the two demographic attributes of interest for the returned users. To evaluate potential biases in the collected dataset of real-world rankings, we use two exposure disparity metrics: deviation from group proportions and MinSkew. Our analysis reveals an under-representation of minority groups in early ranks across many queries. We further examine potential causes of this disparity and discuss why they may be difficult or, in some cases, impossible to fully eliminate among the early ranks of queries. Beyond static metrics, we also investigate subgroup fairness over time, highlighting temporal disparities in exposure and retention, which are often more difficult to audit in practice. On employer recruiting platforms such as LinkedIn Talent Search, a candidate's persistence in the ranking over multiple days can directly affect the probability that the candidate is selected for opportunities. Our analysis reveals demographic disparities in this temporal stability, with some groups experiencing greater volatility in their ranked positions than others. We contextualize all our findings alongside LinkedIn's published self-audits of its Talent Search system and reflect on the methodological constraints of a black-box external evaluation, including limited observability and noisy demographic inference.
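To make the two exposure disparity metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a group's baseline proportion is its share of the full returned list, that deviation from group proportion is the group's share in the top k minus that baseline, and that skew is the natural log of the ratio between the two shares, with MinSkew taking the minimum over groups. The function names, the `eps` smoothing term, and the toy labels are all hypothetical.

```python
import math
from collections import Counter


def group_shares(labels):
    """Return each group's share of the given list of group labels."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def deviation_from_group_proportion(ranking, k):
    """Top-k share minus full-list share, per group (assumed definition)."""
    baseline = group_shares(ranking)
    top = group_shares(ranking[:k])
    return {g: top.get(g, 0.0) - baseline[g] for g in baseline}


def min_skew(ranking, k, eps=1e-9):
    """Return (group, skew) for the most under-represented group in the top k.

    Skew is assumed to be ln(top-k share / full-list share); eps is a
    hypothetical smoothing term that avoids log(0) when a group is absent
    from the top k.
    """
    baseline = group_shares(ranking)
    top = group_shares(ranking[:k])
    skews = {
        g: math.log((top.get(g, 0.0) + eps) / (baseline[g] + eps))
        for g in baseline
    }
    worst = min(skews, key=skews.get)
    return worst, skews[worst]


# Toy ranking of inferred group labels, best rank first (illustrative only).
example_ranking = ["M", "M", "M", "F", "M", "F", "F", "M", "F", "F"]
example_deviation = deviation_from_group_proportion(example_ranking, k=4)
example_worst_group, example_min_skew = min_skew(example_ranking, k=4)
```

In this toy list both groups make up half of the results, but only one of the top four entries is labeled "F", so that group has a negative deviation and a negative skew. A MinSkew of zero would indicate that no group is under-represented in the top k relative to the baseline.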
