
Semantic Search At LinkedIn

Fedor Borisyuk, Sriram Vasudevan, Muchen Wu, Guoyao Li, Benjamin Le, Shaobo Zhang, Qianqi Kay Shen, Yuchin Juan, Kayhan Behdin, Liming Dong, Kaixu Yang, Shusen Jing, Ravi Pothamsetty, Rajat Arora, Sophie Yanying Sheng, Vitaly Abdrashitov, Yang Zhao, Lin Su, Xiaoqing Wang, Chujie Zheng, Sarang Metkar, Rupesh Gupta, Igor Lapchuk, David N. Racca, Madhumitha Mohan, Yanbo Li, Haojun Li, Saloni Gandhi, Xueying Lu, Chetan Bhole, Ali Hooshmand, Xin Yang, Raghavan Muthuregunathan, Jiajun Zhang, Mathew Teoh, Adam Coler, Abhinav Gupta, Xiaojing Ma, Sundara Raman Ramachandran, Morteza Ramezani, Yubo Wang, Lijuan Zhang, Richard Li, Jian Sheng, Chanh Nguyen, Yen-Chi Chen, Chuanrui Zhu, Claire Zhang, Jiahao Xu, Deepti Kulkarni, Qing Lan, Arvind Subramaniam, Ata Fatahibaarzi, Steven Shimizu, Yanning Chen, Zhipeng Wang, Ran He, Zhengze Zhou, Qingquan Song, Yun Dai, Caleb Johnson, Ping Liu, Shaghayegh Gharghabi, Gokulraj Mohanasundaram, Juan Bottaro, Santhosh Sachindran, Qi Guo, Yunxiang Ren, Chengming Jiang, Di Mo, Luke Simon, Jianqiang Shen, Jingwei Wu, Wenjing Zhang

TL;DR

This work presents LinkedIn's production-scale semantic search framework, which combines an LLM relevance judge, embedding-based retrieval, and a compact small language model (SLM) trained via multi-teacher distillation to optimize both relevance and engagement. It introduces a prefill-oriented inference architecture with context compression, model pruning, and MixLM-based text-embedding interactions that achieves over 75x higher ranking throughput under a fixed latency constraint while maintaining near-teacher-level NDCG. Through multi-task supervision, loss masking, calibration, and feature engineering, the system delivers measurable gains in Job Search and People Search quality and user engagement, making LLM-based ranking practical to deploy at scale. The paper also details infrastructure and training optimizations, including data-centric retrieval training, GPU RAR, context summarization, pruning, and caching, and outlines a roadmap toward deeper personalization and further cost reductions.
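NDCG is the metric used here to compare the distilled SLM with its teacher. As a refresher, the following is a minimal sketch of NDCG@k with linear gains (one common convention; the paper's exact gain function and cutoff are not stated in this summary). It scores the ranking induced by a model's scores against graded relevance labels, which could come, for example, from an LLM relevance judge.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k gains, taken in ranked order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[float], k: int) -> float:
    """NDCG@k of the ranking induced by `scores`, judged against graded `labels`."""
    # Rank documents by model score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_gains = [labels[i] for i in order]
    # The ideal ranking sorts documents by their true labels.
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A "near-teacher-level" student would then be one whose NDCG on held-out queries is close to the teacher's NDCG on the same queries and labels.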

Abstract

Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major advances in inference efficiency. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, combining an LLM relevance judge, embedding-based retrieval, and a compact small language model (SLM) trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture co-designed with model pruning, context compression, and text-embedding hybrid interactions boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG, enabling one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches and delivering significant gains in quality and user engagement.
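To make "multi-teacher distillation to jointly optimize relevance and engagement" concrete, here is a minimal sketch under stated assumptions: each task (hypothetically "relevance" and "apply") gets a soft target from its own teacher (for example, the LLM relevance judge for relevance and an engagement model for apply), and the student is trained with a weighted sum of per-task binary cross-entropy losses. The task names, weights, and loss choice are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_bce(student_logit: float, teacher_prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of the student's probability against a soft teacher target."""
    p = min(max(sigmoid(student_logit), eps), 1.0 - eps)
    return -(teacher_prob * math.log(p) + (1.0 - teacher_prob) * math.log(1.0 - p))


def multi_teacher_loss(
    student_logits: Dict[str, float],
    teacher_probs: Dict[str, float],
    task_weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task distillation losses (hypothetical task weighting)."""
    return sum(
        weight * soft_bce(student_logits[task], teacher_probs[task])
        for task, weight in task_weights.items()
    )


# Hypothetical example: one query-document pair, two tasks, two teachers.
loss = multi_teacher_loss(
    student_logits={"relevance": 1.2, "apply": -0.4},
    teacher_probs={"relevance": 0.9, "apply": 0.3},
    task_weights={"relevance": 1.0, "apply": 0.5},
)
```

Soft targets let the student learn the teachers' graded confidence rather than only hard 0/1 labels; each per-task term is smallest when the student's probability equals its teacher's.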

Paper Structure

This paper contains 40 sections, 9 equations, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: An overview of the multi-stage training framework for SLM ranking in semantic search.
  • Figure 2: Relevance Training of SLM Ranker.
  • Figure 3: Negative example selection. Documents are treated as negatives for an action only when that action occurs on another document for the same query.
  • Figure 4: Distribution of predicted probability scores for the multi-task engagement People Search SLM when trained with loss masking versus an unmasked loss (baseline).
  • Figure 5: Retrieval and ranking system architecture.
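The captions of Figures 3 and 4 describe two label-handling rules for the multi-task engagement SLM. The sketch below is one reading of those captions, not the paper's code. For each query and each action, a document is a positive if the action occurred on it, a negative only if the action occurred on another document for the same query (Figure 3), and otherwise unlabeled. We assume that loss masking (Figure 4) means unlabeled entries are simply dropped from the loss. The action names ("click", "apply", "save") are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set


def build_labels(
    doc_actions: List[Set[str]], tasks: List[str]
) -> List[Dict[str, Optional[int]]]:
    """Labels for one query's documents: 1 = positive, 0 = negative, None = masked."""
    # Actions observed on at least one document returned for this query.
    seen: Set[str] = set().union(*doc_actions)
    labels = []
    for actions in doc_actions:
        row: Dict[str, Optional[int]] = {}
        for task in tasks:
            if task in actions:
                row[task] = 1
            elif task in seen:
                # The action occurred on another document for the same query.
                row[task] = 0
            else:
                row[task] = None  # no evidence either way: mask out of the loss
        labels.append(row)
    return labels


def masked_bce(
    probs: List[Dict[str, float]],
    labels: List[Dict[str, Optional[int]]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over labeled (non-masked) entries only."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for task, y in y_row.items():
            if y is None:
                continue  # masked entries contribute nothing to the loss
            p = min(max(p_row[task], eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0
```

Without the mask, an action that never occurred for a query would push every document's score for that action toward zero, even though the data says nothing about it. This is one plausible source of the score-distribution differences that Figure 4 compares.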