Introducing Semantic Capability in LinkedIn's Content Search Engine

Xin Yang; Rachel Zheng; Madhumitha Mohan; Sonali Bhadra; Pansul Bhatt; Lingyu Zhang; Rupesh Gupta

TL;DR

The paper addresses the challenge of increasingly long, natural-language search queries by introducing semantic capability into LinkedIn's content search engine. It presents a two-layer architecture: a retrieval layer that combines token-based retrieval with embedding-based retrieval (EBR), followed by a two-stage ranking layer (L1/L2). EBR uses a two-tower embedding model based on multilingual-e5, with precomputed post embeddings, a query embedding computed in real time, and approximate nearest neighbor search. Ranking is guided by a weighted score that combines on-topicness and long-dwell ($score = \alpha \cdot \text{on-topicness} + (1-\alpha) \cdot \text{long-dwell}$), with $\alpha$ tuned online. The approach improves both on-topic rate and long-dwell by over 10% and has a positive impact on sitewide sessions. Future work aims to refine the quality metrics and to integrate an LLM into ranking for deeper language understanding and relevance.
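The weighted score in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the `ScoredPost` type, the `blended_score` and `rank` helpers, and the assumption that both signals are probability-like values in [0, 1] are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredPost:
    """Hypothetical container for the two per-post signals."""
    post_id: str
    on_topicness: float  # assumed to lie in [0, 1]
    long_dwell: float    # assumed to lie in [0, 1]


def blended_score(post: ScoredPost, alpha: float) -> float:
    """score = alpha * on-topicness + (1 - alpha) * long-dwell."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * post.on_topicness + (1.0 - alpha) * post.long_dwell


def rank(posts: list[ScoredPost], alpha: float) -> list[ScoredPost]:
    """Order posts by blended score, highest first."""
    return sorted(posts, key=lambda p: blended_score(p, alpha), reverse=True)


if __name__ == "__main__":
    candidates = [
        ScoredPost("a", on_topicness=0.9, long_dwell=0.2),
        ScoredPost("b", on_topicness=0.4, long_dwell=0.8),
    ]
    # A larger alpha favors on-topic posts; a smaller alpha favors engagement.
    for alpha in (0.5, 0.9):
        print(alpha, [p.post_id for p in rank(candidates, alpha)])
```

In this toy example, moving $\alpha$ from 0.5 to 0.9 flips the order of the two posts, which is why the weight is a natural parameter to tune through online experiments.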

Abstract

In the past, most queries issued to a search engine were short and simple, and a keyword-based search engine could answer them quite well. However, members are now developing the habit of issuing long and complex natural language queries. Answering such queries requires a search engine to evolve to have semantic capability. In this paper we present the design of LinkedIn's new content search engine with semantic capability, and its impact on metrics.

Paper Structure (9 sections, 1 equation, 4 figures)

Introduction
Objectives
High-level design
Retrieval layer
Multi-stage ranking layer
Efficient serving
Outcome
What's next?
Acknowledgments

Figures (4)

Figure 1: High-level design of the content search engine consisting of a retrieval layer and a multi-stage ranking layer.
Figure 2: Architecture of the two-tower model used in EBR.
Figure 3: Approximate nearest neighbor search in EBR using precomputed post embeddings (green) and real-time computed query embedding (pink).
Figure 4: Architecture of the models used in L1 and L2 ranking stages.
