Learning to Retrieve for Job Matching
Jianqiang Shen, Yuchin Juan, Shaobo Zhang, Ping Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wenjing Zhang
TL;DR
The paper tackles scalable and explainable job matching at LinkedIn by formulating a learning-to-retrieve framework for both promoted and organic channels. It introduces a graph-based auto-targeting approach to learn seeker-job links from confirmed hires for targeted candidate delivery in promoted listings, alongside an Embedding-Based Retrieval (EBR) system with curriculum-guided negative sampling to personalize organic retrieval. A GPU-based exhaustive search pipeline integrates both term matching and embedding scores to achieve low latency at web scale, outperforming inverted-index baselines and delivering meaningful business gains (e.g., improved budget utilization and engagement metrics). The work demonstrates practical, explainable, and scalable retrieval improvements, with clear pathways for future enhancements leveraging split architectures, LLMs, and multilingual capabilities.
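The TL;DR mentions an exhaustive search pipeline that combines term matching with embedding scores. As a rough illustration only, the pure-Python sketch below scans every job instead of consulting an inverted index, treats term matching as a hard filter (every query term must appear among a job's standardized terms), and ranks the surviving jobs by embedding dot product. The `Job` record, term strings such as `title:swe`, the filter-then-rank combination, and the function name `exhaustive_retrieve` are assumptions for illustration; on a GPU the same scan would typically become a batched matrix-vector product with a mask, which this CPU stand-in does not attempt to reproduce.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    terms: frozenset[str]         # hypothetical standardized entities, e.g. "title:swe"
    embedding: tuple[float, ...]  # hypothetical learned job embedding


def dot(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum(x * y for x, y in zip(a, b))


def exhaustive_retrieve(
    query_terms: set[str], query_emb: tuple[float, ...], jobs: list[Job], k: int
) -> list[str]:
    """Score every job (no inverted index). Jobs missing any query term are
    filtered out; the rest are ranked by embedding dot product."""
    required = frozenset(query_terms)
    candidates = (
        (dot(query_emb, job.embedding), job.job_id)
        for job in jobs
        if required <= job.terms  # term-matching filter (assumed hard AND)
    )
    # Keep only the k highest-scoring (score, job_id) pairs.
    return [job_id for _, job_id in heapq.nlargest(k, candidates)]


if __name__ == "__main__":
    demo_jobs = [
        Job("j1", frozenset({"title:swe", "loc:sf"}), (0.9, 0.1)),
        Job("j2", frozenset({"title:swe", "loc:ny"}), (0.8, 0.6)),
        Job("j3", frozenset({"title:pm", "loc:sf"}), (1.0, 1.0)),
    ]
    print(exhaustive_retrieve({"title:swe"}, (1.0, 0.0), demo_jobs, k=2))
    # prints ['j1', 'j2']
```

In this sketch, a job tagged only `title:pm` is excluded for a query requiring `title:swe` no matter how high its embedding score would be, which is one simple way term matching and KNN scores can coexist in a single scan.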
Abstract
Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. For promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and we use the learned links for retrieval. Our learned model is easy to explain, debug, and adjust. For organic jobs, by contrast, the focus is on optimizing seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of efficiently supporting both KNN and term matching.
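The abstract says confirmed hires are used to build a graph that links seekers to jobs, and that the learned links drive retrieval while staying easy to explain, debug, and adjust. The sketch below illustrates that idea with a plain co-occurrence count and a minimum-support threshold; that learning rule, the attribute strings, and the helper names `learn_links` and `retrieve` are assumptions for illustration, not the paper's model.

```python
from collections import Counter, defaultdict


def learn_links(hires, min_support):
    """Learn seeker-attribute -> job-attribute links from confirmed hires.

    hires: iterable of (seeker_attrs, job_attrs) pairs, one per confirmed hire.
    A link is kept only if the pair co-occurs in at least min_support hires
    (an assumed, deliberately simple learning rule)."""
    counts = Counter()
    for seeker_attrs, job_attrs in hires:
        for s in set(seeker_attrs):
            for j in set(job_attrs):
                counts[(s, j)] += 1
    links = defaultdict(set)
    for (s, j), n in counts.items():
        if n >= min_support:
            links[s].add(j)
    return dict(links)


def retrieve(seeker_attrs, links, jobs):
    """jobs: {job_id: set of job_attrs}. Returns {job_id: reasons}, where
    reasons lists the learned (seeker_attr, job_attr) links that matched,
    so every retrieved job can be explained."""
    matched = {}
    for job_id, job_attrs in jobs.items():
        reasons = sorted(
            (s, j)
            for s in seeker_attrs
            for j in links.get(s, ())
            if j in job_attrs
        )
        if reasons:
            matched[job_id] = reasons
    return matched


if __name__ == "__main__":
    # Hypothetical confirmed-hire records.
    hires = [
        ({"skill:python", "title:data analyst"}, {"title:data scientist"}),
        ({"skill:python", "title:swe"}, {"title:data scientist"}),
        ({"title:swe"}, {"title:backend engineer"}),
        ({"title:swe"}, {"title:backend engineer"}),
    ]
    links = learn_links(hires, min_support=2)
    jobs = {"j1": {"title:data scientist"}, "j2": {"title:designer"}}
    print(retrieve({"skill:python"}, links, jobs))
```

Returning the matched links alongside each job mirrors the abstract's point that the learned model is easy to explain and debug: each retrieval can be traced to a specific learned edge, and an edge can be removed by hand to adjust behavior.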
