Humble AI in the real-world: the case of algorithmic hiring
Rahul Nair, Inge Vejsbjerg, Elizabeth Daly, Christos Varytimidis, Bran Knowles
TL;DR
The paper addresses bias and misranking in algorithmic hiring by operationalizing Humble AI through rank sets that quantify uncertainty in candidate rankings. It formalizes a screening framework in which perturbed candidate scores $\tilde z_i$ yield a rank-probability matrix $P=[p_{ij}]$, where $p_{ij}$ is the probability that candidate $i$ lands at rank $j$. From $P$ it derives a per-candidate expected rank $E_i=\sum_j p_{ij}\, j$ and entropy $H_i=-\sum_j p_{ij}\log p_{ij}$, both estimated via Monte Carlo and evaluated on real platform data and on synthetic noisy cases. A user-centered prototype UX surfaces these uncertainties to recruiters, letting them explore high-uncertainty candidates and adjust exploration parameters. The work demonstrates that accounting for uncertainty can improve the robustness of rankings under noise, provides methods for computing rank sets in black-box settings, and presents initial qualitative feedback from recruiters. It also acknowledges significant validation challenges and the need for iterative design and training before deployment in practice.
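The Monte Carlo estimate of $P$, $E_i$, and $H_i$ described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the i.i.d. Gaussian perturbation, the `noise_sd` value, and the example scores are all assumptions made for illustration, and the paper's actual noise model may differ. The sketch only treats the scorer as a black box that outputs one number per candidate.

```python
import math
import random


def rank_probabilities(scores, noise_sd=0.05, n_samples=2000, seed=0):
    """Estimate P[i][j] = Pr(candidate i lands at rank j + 1) by Monte Carlo.

    Assumption (not from the paper): each score is perturbed with
    independent Gaussian noise of standard deviation noise_sd.
    """
    rng = random.Random(seed)
    n = len(scores)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        perturbed = [s + rng.gauss(0.0, noise_sd) for s in scores]
        # Rank 1 goes to the highest perturbed score.
        order = sorted(range(n), key=lambda i: perturbed[i], reverse=True)
        for rank_index, candidate in enumerate(order):
            counts[candidate][rank_index] += 1
    return [[c / n_samples for c in row] for row in counts]


def expected_rank(p_row):
    """E_i = sum_j p_ij * j, with ranks numbered from 1."""
    return sum(p * (j + 1) for j, p in enumerate(p_row))


def rank_entropy(p_row):
    """H_i = -sum_j p_ij * log(p_ij), using the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in p_row if p > 0)


# Hypothetical scores: one clear leader, two near-ties, one clear trailer.
scores = [0.90, 0.52, 0.50, 0.10]
P = rank_probabilities(scores)
for i, row in enumerate(P):
    print(f"candidate {i}: E={expected_rank(row):.2f}, H={rank_entropy(row):.3f}")
```

In this toy setup the two near-tied candidates swap places in a sizeable share of samples, so their rank entropy is well above that of the clear leader. High-entropy candidates of this kind are the ones an uncertainty-aware interface would flag for closer review.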
Abstract
Humble AI (Knowles et al., 2023) argues for cautiousness in AI development and deployments through scepticism (accounting for limitations of statistical learning), curiosity (accounting for unexpected outcomes), and commitment (accounting for multifaceted values beyond performance). We present a real-world case study for humble AI in the domain of algorithmic hiring. Specifically, we evaluate virtual screening algorithms in a widely used hiring platform that matches candidates to job openings. There are several challenges in misrecognition and stereotyping in such contexts that are difficult to assess through standard fairness and trust frameworks; e.g., someone with a non-traditional background is less likely to rank highly. We demonstrate technical feasibility of how humble AI principles can be translated to practice through uncertainty quantification of ranks, entropy estimates, and a user experience that highlights algorithmic unknowns. We describe preliminary discussions with focus groups made up of recruiters. Future user studies seek to evaluate whether the higher cognitive load of a humble AI system fosters a climate of trust in its outcomes.
