PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations

Ruining He, Lukasz Heldt, Lichan Hong, Raghunandan Keshavan, Shifan Mao, Nikhil Mehta, Zhengyang Su, Alicia Tsai, Yueqi Wang, Shao-Chuan Wang, Xinyang Yi, Lexi Baugher, Baykal Cakici, Ed Chi, Cristos Goodrow, Ningren Han, He Ma, Romer Rosales, Abby Van Soest, Devansh Tandon, Su-Lin Wu, Weilong Yang, Yilin Zheng

TL;DR

PLUM targets the scalability limits of industrial recommender systems by replacing large embedding tables with Semantic ID (SID) token inputs processed by an adapted LLM. The approach combines SIDv2 item tokenization, a large-scale continued pre-training (CPT) stage, and a supervised fine-tuning (SFT) objective for generative retrieval, enabling efficient retrieval at YouTube scale without maintaining exhaustive item embeddings. Empirical results on billions of interactions show that PLUM improves retrieval effectiveness and sample efficiency relative to production embedding-based baselines, and scaling analyses show favorable performance trends as Mixture-of-Experts (MoE) model size grows under Iso-FLOPS budgets. The work also offers practical deployment insights and outlines future directions for extending LLM-based generative retrieval to ranking and search within industrial systems.

Abstract

Large Language Models (LLMs) introduce a new paradigm of modeling and computation for information tasks. Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs for industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is directly trained to generate Semantic IDs of recommended items based on user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial improvements for retrieval compared to a heavily optimized production model built with large embedding tables. We also present a scaling study of the model's retrieval performance, lessons learned about CPT, a few enhancements to Semantic IDs, and an overview of the training and inference methods that enable launching this framework to billions of users on YouTube.

Paper Structure

This paper contains 49 sections, 3 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Illustration of our Semantic ID model. It takes two multi-modal video embeddings, encodes them, and compresses the result into a quantized ID using a residual quantizer. This ID is trained to both reconstruct the original inputs and semantically cluster co-occurring videos using a contrastive loss.
  • Figure 2: Illustration of Generative Retrieval for next video recommendation. The input prompt is a sequence of interleaved SID tokens, text and custom tokens for numerical features.
  • Figure 3: 8th-day Recall@10 and training loss vs. retrieval SFT training step.
  • Figure 4: Training and evaluation loss variation as we scale up training Iso-FLOPS.
  • Figure 5: Training and evaluation Recall@10 variation as we scale up training Iso-FLOPS.
  • ...and 1 more figure
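
The residual quantizer mentioned in the Figure 1 caption can be illustrated with a minimal sketch. This is not the paper's implementation: the codebooks below are tiny hand-written toy values rather than learned ones, the encoder, reconstruction loss, and contrastive loss are omitted, and the function names (`nearest`, `residual_quantize`, `reconstruct`) are hypothetical. It only shows the general idea that each level picks the closest code for whatever the previous levels left unexplained, so an item becomes a short sequence of discrete tokens.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def residual_quantize(vec, codebooks):
    """Quantize vec into one token per level.

    Each level encodes the residual left over by the previous levels.
    Returns the token list and the final, unexplained residual.
    """
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


def reconstruct(tokens, codebooks):
    """Sum the selected code vectors to approximate the original input."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (assumed values for illustration only): a coarse level
# followed by a finer level that corrects the coarse level's error.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, 0.0], [0.0, -0.25]],
]

embedding = [0.9, 0.3]  # stand-in for an encoded video embedding
tokens, residual = residual_quantize(embedding, codebooks)
approx = reconstruct(tokens, codebooks)
print(tokens, approx)
```

In this toy setup the token sequence plays the role of a Semantic ID: items whose embeddings are close share leading tokens, and later tokens refine the match.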