Evaluating AI Recruitment Sourcing Tools by Human Preference
Vladimir Slaykovskiy, Maksim Zvegintsev, Yury Sakhonchyk, Hrachik Ajamian
TL;DR
The paper compares AI-powered recruitment sourcing tools with LinkedIn Recruiter on candidate relevance. It uses a benchmark built from anonymized query data, human expert judgments, and an automated LLM-based evaluator (LLM-judge), and aggregates the results through Elo-based rankings and win-rate analyses. The key finding is that AI-native tools outperform LinkedIn Recruiter, with Pearch.ai achieving the highest performance across evaluations. The approach demonstrates the viability and practical value of automated benchmarking for talent acquisition, and the authors release public data and code to enable reproducibility and ongoing progress in AI-assisted sourcing.
Abstract
This study introduces a benchmarking methodology for evaluating the performance of AI-driven recruitment sourcing tools. We built a dataset and used it to compare the search results returned by leading AI-based solutions, LinkedIn Recruiter, and our proprietary system, Pearch.ai. Human experts assessed the relevance of the returned candidates, and an Elo rating system was applied to quantify each tool's performance relative to the others. Our findings indicate that AI-driven recruitment sourcing tools consistently outperform LinkedIn Recruiter in candidate relevance, with Pearch.ai achieving the highest performance scores. Furthermore, we found strong alignment between AI-based evaluations and human judgments, highlighting the potential for advanced AI technologies to substantially enhance talent acquisition effectiveness. Code and supporting data are publicly available at https://github.com/vslaykovsky/ai-sourcing-benchmark
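
To make the Elo step concrete, the following is a minimal sketch of how pairwise relevance judgments could be turned into ratings. It uses the standard Elo update and is not taken from the released code: the K-factor of 32, the initial rating of 1000, the sequential processing order, and the tool names `tool_a` and `tool_b` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Compute Elo ratings from pairwise judgments, processed in order.

    comparisons: iterable of (tool_a, tool_b, outcome), where outcome is
    1.0 if tool_a's results were preferred, 0.0 if tool_b's were, 0.5 for a tie.
    k and initial are illustrative defaults (assumptions), not the paper's values.
    """
    ratings = defaultdict(lambda: initial)
    for tool_a, tool_b, outcome in comparisons:
        exp_a = expected_score(ratings[tool_a], ratings[tool_b])
        delta = k * (outcome - exp_a)
        # Zero-sum update: what one tool gains, the other loses.
        ratings[tool_a] += delta
        ratings[tool_b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical judgments: tool_a preferred twice, one tie.
    judgments = [
        ("tool_a", "tool_b", 1.0),
        ("tool_a", "tool_b", 1.0),
        ("tool_a", "tool_b", 0.5),
    ]
    for tool, rating in sorted(elo_ratings(judgments).items()):
        print(f"{tool}: {rating:.1f}")
```

Because sequential Elo depends on the order in which comparisons are processed, a common precaution is to average ratings over several random shuffles of the judgments; whether the authors do this is not stated in the abstract.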
