Learning to Retrieve for Job Matching

Jianqiang Shen, Yuchin Juan, Shaobo Zhang, Ping Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wenjing Zhang

TL;DR

The paper tackles scalable and explainable job matching at LinkedIn by formulating a learning-to-retrieve framework for both promoted and organic channels. It introduces a graph-based auto-targeting approach to learn seeker-job links from confirmed hires for targeted candidate delivery in promoted listings, alongside an Embedding-Based Retrieval (EBR) system with curriculum-guided negative sampling to personalize organic retrieval. A GPU-based exhaustive search pipeline integrates both term matching and embedding scores to achieve low latency at web scale, outperforming inverted-index baselines and delivering meaningful business gains (e.g., improved budget utilization and engagement metrics). The work demonstrates practical, explainable, and scalable retrieval improvements, with clear pathways for future enhancements leveraging split architectures, LLMs, and multilingual capabilities.
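The GPU pipeline mentioned above scores every job exhaustively instead of going through an approximate index, and it combines term matching with an embedding score. Below is a minimal CPU sketch of that combination. It assumes that term matching acts as a hard filter and that the embedding score is a plain dot product between seeker and job vectors. `Job`, `exhaustive_retrieve`, and the term strings are hypothetical. On a GPU, the per-job loop would become one batched matrix product over all job embeddings.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    terms: frozenset  # hypothetical standardized entities (title, location, ...)
    embedding: tuple  # hypothetical job-tower output vector


def score(query_emb, job_emb):
    # Dot product between the seeker (query) embedding and a job embedding
    return sum(q * j for q, j in zip(query_emb, job_emb))


def exhaustive_retrieve(query_emb, required_terms, jobs, k):
    """Visit every job (brute force, no ANN index), keep only jobs that
    contain all required terms, and return the ids of the top-k jobs
    ranked by embedding score."""
    candidates = (
        (score(query_emb, job.embedding), job.job_id)
        for job in jobs
        if required_terms <= job.terms  # term-matching filter
    )
    return [job_id for _, job_id in heapq.nlargest(k, candidates)]
```

Because the search is exhaustive, recall does not depend on how an approximate index was tuned. The cost is that every query touches every job, which is why dedicated GPU hardware matters at web scale.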

Abstract

Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and use the learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.
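For promoted jobs, the abstract describes learning seeker-to-job links from confirmed hires and retrieving through those links. This can be illustrated with the seeker segments $P$ and job segments $Q$ from Figure 3. The sketch below is not the paper's learning method. It swaps in a simple empirical estimate: the weight of a link is the number of hires from P into Q divided by all hires from P. It keeps only links whose weight reaches a hypothetical `min_weight`. Each retrieved job is returned together with the link that produced it, which shows why this style of retrieval is easy to explain and debug. The segment names and helper functions are hypothetical.

```python
from collections import Counter, defaultdict


def learn_links(hires, min_weight):
    """hires: iterable of (seeker_segment, job_segment) pairs from confirmed hires.
    Returns {seeker_segment: {job_segment: weight}} for links with weight >= min_weight.
    Stand-in weight: empirical P(job_segment | seeker_segment)."""
    pair_counts = Counter(hires)
    seeker_totals = Counter()
    for (p, _q), count in pair_counts.items():
        seeker_totals[p] += count
    links = defaultdict(dict)
    for (p, q), count in pair_counts.items():
        weight = count / seeker_totals[p]
        if weight >= min_weight:
            links[p][q] = weight
    return dict(links)


def retrieve(seeker_segment, links, jobs_by_segment):
    """Return (job_id, job_segment, weight) triples, so every result can be
    traced back to the learned link that retrieved it."""
    results = []
    for q, weight in links.get(seeker_segment, {}).items():
        for job_id in jobs_by_segment.get(q, []):
            results.append((job_id, q, weight))
    # Strongest links first; job id breaks ties deterministically
    results.sort(key=lambda r: (-r[2], r[0]))
    return results
```

Adjusting targeting in this setup means editing or re-thresholding individual links. That is the kind of direct control the abstract refers to when it calls the model easy to adjust.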

Paper Structure (14 sections, 8 equations, 9 figures, 1 table, 1 algorithm)

Figures (9)

  • Figure 1: We computed the percentage of seekers who engage in at least two sessions within a specified duration per month.
  • Figure 2: LinkedIn job matching system has a flow dedicated to promoted jobs, and a flow dedicated to organic content.
  • Figure 3: We map each seeker $S$ and each job $J$ to a segment $P$ or $Q$, and learn links between seeker and job segments.
  • Figure 4: We transform a graph to 3 layers for online serving by replacing each seeker and job segment link with a node.
  • Figure 5: The two-tower model architecture used in our EBR.
  • ...and 4 more figures
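Figure 4's caption suggests how the learned graph is served online. Each link between a seeker segment and a job segment becomes its own node, which gives three layers: seeker segments, link nodes, and job segments. One reading of this, sketched below as an assumption rather than the paper's implementation, is that link-node IDs then act as terms in an ordinary inverted index. Each job is indexed under the link nodes its segment reaches. A seeker's query is the OR of the link nodes reached from that seeker's segments. The `link:` ID format and all function names are hypothetical.

```python
from collections import defaultdict


def link_node_id(p, q):
    # Hypothetical naming: one node per learned (seeker segment, job segment) link
    return f"link:{p}->{q}"


def build_layers(links):
    """links: iterable of (seeker_segment, job_segment) pairs.
    Returns (seeker segment -> link nodes, job segment -> link nodes)."""
    seeker_side = defaultdict(set)
    job_side = defaultdict(set)
    for p, q in links:
        node = link_node_id(p, q)
        seeker_side[p].add(node)
        job_side[q].add(node)
    return seeker_side, job_side


def build_inverted_index(job_segments, job_side):
    """job_segments: {job_id: job_segment}. Index each job under its link-node terms."""
    index = defaultdict(set)
    for job_id, q in job_segments.items():
        for node in job_side.get(q, ()):
            index[node].add(job_id)
    return index


def match(seeker_segments, seeker_side, index):
    """Query = OR over the link-node terms reached from the seeker's segments."""
    hits = set()
    for p in seeker_segments:
        for node in seeker_side.get(p, ()):
            hits |= index.get(node, set())
    return hits
```

Under this reading, the learned graph can reuse existing term-matching infrastructure unchanged. Only the vocabulary of terms, now link-node IDs, comes from learning.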

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4