Large Language Model as Universal Retriever in Industrial-Scale Recommender System

Junguang Jiang, Yanwen Huang, Bin Liu, Xiaoyu Kong, Xinhang Li, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng

TL;DR

This work introduces the Universal Retrieval Model (URM), an LLM-based generative retrieval framework that unifies multiple retrieval objectives into a single input-output system. It enhances expressiveness through multi-query representations, improves learnability and transferability with a $W = UV^\top$ matrix decomposition, and reduces inference cost via probabilistic sampling and efficient ANN-based neighbor exploration. The approach demonstrates strong offline performance on public and industrial datasets and delivers online gains, including a 3.01% revenue increase, while maintaining tens-of-milliseconds latency. URM’s ability to adapt to various objectives and even unseen tasks underscores the potential of LLMs as universal retrievers in large-scale recommender systems. Practical deployment considerations and ablations indicate a robust, scalable pathway for industry adoption and future extension to broader objective spaces.

Abstract

In real-world recommender systems, different retrieval objectives are typically addressed using task-specific datasets with carefully designed model architectures. We demonstrate that Large Language Models (LLMs) can function as universal retrievers, capable of handling multiple objectives within a generative retrieval framework. To model complex user-item relationships within generative retrieval, we propose a multi-query representation. To address the challenge of extremely large candidate sets in industrial recommender systems, we introduce matrix decomposition to boost model learnability, discriminability, and transferability, and we incorporate probabilistic sampling to reduce computation costs. Finally, our Universal Retrieval Model (URM) can adaptively generate a set from tens of millions of candidates based on an arbitrary given objective while keeping the latency within tens of milliseconds. Applied to industrial-scale data, URM outperforms expert models elaborately designed for different retrieval objectives in offline experiments and significantly improves the core metric of an online advertising platform by $3\%$.
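A minimal sketch of how a multi-query representation and a factorized output matrix could fit together, assuming NumPy and toy sizes. The shapes, the names `score_items` and `top_k`, and the choice to take the maximum score over the query tokens are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def score_items(query_outputs, U, V):
    """Score every candidate item for one (user, objective) input.

    query_outputs: (M, d) backbone outputs at the M query-token positions.
    U: (N, r) per-item factors; V: (d, r) shared projection.
    The full matrix W = U @ V.T has shape (N, d) but is never materialized.
    Aggregating with a max over the M queries is an assumption of this sketch.
    """
    projected = query_outputs @ V   # (M, r): project queries into the low-rank space
    scores = projected @ U.T        # (M, N): identical to query_outputs @ W.T
    return scores.max(axis=0)       # (N,): best-matching query per item

def top_k(scores, k):
    """Return the indices of the k highest scores, best first."""
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]

rng = np.random.default_rng(0)
N, d, r, M = 1000, 64, 8, 4         # toy sizes; real candidate sets are far larger
U = rng.normal(size=(N, r))
V = rng.normal(size=(d, r))
Q = rng.normal(size=(M, d))

scores = score_items(Q, U, V)
full = (Q @ (U @ V.T).T).max(axis=0)  # reference result using the explicit W
retrieved = top_k(scores, 10)

params_full = N * d                 # parameters in an unfactorized W
params_factored = N * r + d * r     # parameters in U and V
```

With these toy sizes the factorized form stores `N * r + d * r` values instead of `N * d`, which is the kind of saving that matters when `N` is in the tens of millions. Exhaustive scoring as done here would be replaced by approximate nearest-neighbor search at serving time.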

Paper Structure

This paper contains 33 sections, 1 theorem, 9 equations, 13 figures, 15 tables, 1 algorithm.

Key Result

Theorem A.1

Assume that the representations of two items $v_1$ and $v_2$ are very similar, i.e., $\| W_{v_1}-W_{v_2}\| \leq \epsilon$. Assume also that bound constraints are applied to the representation $F(u,o)$, with scaling adjustments applied to $F(u, o)$ only if its norm exceeds $B$. Then we have
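The conclusion of the theorem is not shown here, and the following is not a reconstruction of it. As an illustration of what the two stated assumptions imply on their own, suppose (as assumptions of this sketch) that the score of item $v$ is the inner product $\langle F(u,o), W_v \rangle$, that $\|\cdot\|$ is the Euclidean norm, and that the scaling leaves $\|F(u,o)\| \leq B$. The Cauchy-Schwarz inequality then bounds the score gap between the two items:

```latex
\begin{aligned}
\bigl| \langle F(u,o), W_{v_1} \rangle - \langle F(u,o), W_{v_2} \rangle \bigr|
  &= \bigl| \langle F(u,o),\, W_{v_1} - W_{v_2} \rangle \bigr| \\
  &\leq \| F(u,o) \| \, \| W_{v_1} - W_{v_2} \| \\
  &\leq B \epsilon .
\end{aligned}
```

Under these assumptions, two items with nearly identical representations can differ in score by at most $B\epsilon$, regardless of the user or objective.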

Figures (13)

  • Figure 1: URM architecture. The input sequence consists of the user description $u$, the retrieval objective $o$, and several fixed query tokens. Item IDs in the user description are mapped to item embeddings by a distributed hashtable, and other tokens are mapped to token embeddings. The item embeddings or token embeddings are summed with position embeddings and fed into the LLM backbone (for LLMs using RoPE (Su et al., 2023), position embeddings are not explicitly added). The outputs corresponding to the query tokens are then mapped to the item candidate space through $W$. To optimize for retrieval objectives in recommender systems, the parameters of the LLM backbone are fully fine-tuned.
  • Figure 2: Online Serving System.
  • Figure 3: The effect of query token number $M$.
  • Figure 4: Performance on unseen queries.
  • Figure 5: The effectiveness of multi-task learning.
  • ...and 8 more figures
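The input side described in the Figure 1 caption can be sketched as follows, assuming NumPy. The toy vocabulary, the query-token names `[Q1]` and `[Q2]`, the in-memory dict standing in for the distributed hashtable, and the functions `item_embedding` and `embed_sequence` are all hypothetical; the LLM backbone itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy embedding width

# Hypothetical text vocabulary, including two fixed query tokens.
vocab = {"[Q1]": 0, "[Q2]": 1, "user": 2, "clicked": 3,
         "recommend": 4, "similar": 5, "items": 6}
token_table = rng.normal(size=(len(vocab), D))

# A plain dict stands in for the distributed hashtable of item embeddings.
item_table = {}

def item_embedding(item_id):
    """Look up an item embedding, creating a random one on first access."""
    if item_id not in item_table:
        item_table[item_id] = rng.normal(size=D)
    return item_table[item_id]

def embed_sequence(tokens, position_table=None):
    """Map a mixed sequence to input embeddings.

    tokens: list where an int is an item ID and a str is a text token.
    position_table: optional (max_len, D) array; pass None for backbones
    that use RoPE, where no position embedding is added at the input.
    """
    rows = [item_embedding(t) if isinstance(t, int) else token_table[vocab[t]]
            for t in tokens]
    x = np.stack(rows)
    if position_table is not None:
        x = x + position_table[: len(tokens)]
    return x

# User description, then retrieval objective, then the fixed query tokens.
sequence = ["user", "clicked", 1001, 1002,
            "recommend", "similar", "items", "[Q1]", "[Q2]"]
pos = rng.normal(size=(32, D))
x = embed_sequence(sequence, pos)   # learned-position variant
x_rope = embed_sequence(sequence)   # RoPE variant: no additive positions
```

The backbone outputs at the last two positions (the query tokens) are the vectors that would then be mapped to the item candidate space through $W$.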

Theorems & Definitions (2)

  • Theorem A.1
  • proof