A Unified Language Model for Large Scale Search, Recommendation, and Reasoning

Marco De Nadai, Edoardo D'Amico, Max Lefarov, Alexandre Tamborrino, Divita Vohra, Mark VanMiddlesworth, Shawn Lin, Jacqueline Wood, Jan Stypka, Eliza Klyce, Keshi Dai, Timothy Christopher Heath, Martin D. Gould, Yves Raimond, Sandeep Ghael, Tony Jebara, Andreas Damianou, Vladan Radosavljevic, Paul N. Bennett, Mounia Lalmas, Praveen Chandar

Abstract

LLMs are increasingly applied to recommendation, retrieval, and reasoning, yet deploying a single end-to-end model that can jointly support these behaviors over large, heterogeneous catalogs remains challenging. Such systems must generate unambiguous references to real items, handle multiple entity types, and operate under strict latency and reliability constraints, requirements that are difficult to satisfy with text-only generation. While tool-augmented recommender systems address parts of this problem, they introduce orchestration complexity and limit end-to-end optimization. We view this setting as an instance of a broader research problem: how to adapt LLMs to reason jointly over multi-domain entities, users, and language in a fully self-contained manner. To this end, we introduce NEO, a framework that adapts a pre-trained decoder-only LLM into a tool-free, catalog-grounded generator. NEO represents items as semantic IDs (SIDs) and trains a single model to interleave natural language and typed item identifiers within a shared sequence. Text prompts control the task, target entity type, and output format (IDs, text, or mixed), while constrained decoding guarantees catalog-valid item generation without restricting free-form text. We refer to this instruction-conditioned controllability as language-steerability. We treat SIDs as a distinct modality and study design choices for integrating discrete entity representations into LLMs via staged alignment and instruction tuning. We evaluate NEO at scale on a real-world catalog of over 10M items across multiple media types and discovery tasks, including recommendation, search, and user understanding. In offline experiments, NEO consistently outperforms strong task-specific baselines and exhibits cross-task transfer, demonstrating a practical path toward consolidating large-scale discovery capabilities into a single language-steerable generative model.
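
The abstract states that constrained decoding guarantees catalog-valid item generation but does not describe the mechanism. One common way to obtain such a guarantee is to mask the decoder's choices with a prefix trie built over the catalog's SID tuples. The sketch below illustrates that idea under stated assumptions: the trie, the function names (`build_trie`, `allowed_next`, `constrained_greedy`), the toy catalog, and the toy scoring function are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

SID = Tuple[int, ...]  # assumption: an item is a short tuple of discrete tokens


def build_trie(catalog: Iterable[SID]) -> dict:
    """Build a nested-dict prefix trie over all SID tuples in the catalog."""
    root: dict = {}
    for sid in catalog:
        node = root
        for tok in sid:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: dict, prefix: Sequence[int]) -> List[int]:
    """Return the tokens that extend `prefix` toward at least one catalog item."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:  # prefix is not part of any catalog item
            return []
    return sorted(node)


def constrained_greedy(
    trie: dict,
    score_fn: Callable[[Sequence[int]], Dict[int, float]],
    length: int,
) -> SID:
    """Greedy decoding in which only trie-valid tokens may be selected."""
    prefix: List[int] = []
    for _ in range(length):
        candidates = allowed_next(trie, prefix)
        if not candidates:
            raise ValueError("no catalog item continues this prefix")
        scores = score_fn(prefix)  # stand-in for the model's next-token scores
        prefix.append(max(candidates, key=lambda t: scores.get(t, float("-inf"))))
    return tuple(prefix)


# Hypothetical toy catalog of three items, each a 2-token SID.
catalog = [(3, 7), (3, 9), (5, 1)]
trie = build_trie(catalog)


def toy_scores(prefix: Sequence[int]) -> Dict[int, float]:
    # Token 8 scores highest but belongs to no catalog item, so it is never chosen.
    return {1: 0.05, 3: 0.1, 5: 0.2, 7: 0.9, 8: 5.0, 9: 0.3}


print(constrained_greedy(trie, toy_scores, length=2))  # -> (5, 1)
```

In this sketch the mask applies only while an item identifier is being emitted; free-form text tokens would be decoded without it, which is consistent with the abstract's claim that the constraint does not restrict text generation.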

Paper Structure (53 sections, 8 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: NEO adapts pre-trained LLMs to speak multi-item-type catalogs (e.g., audiobooks and podcasts) through SIDs (e.g., tuples two tokens long), and to output both text and SIDs.
  • Figure 2: We illustrate our approach as the first three stages in a more general four-stage pipeline. In the general case, Stage 1 learns semantic representations, such as semantic IDs (SIDs), that capture entities, their structure, and their relationships. Stage 2 performs domain grounding via one or both of (i) adaptation over user-behavior sequences and (ii) domain alignment between SID/text pairs, to learn a shared latent space between entities and natural language. This enables the model to use text and SIDs interchangeably and to learn sequential, domain-relevant concepts. Stage 3 teaches the model to answer questions, follow commands, and generate grounded outputs through multi-task instruction-following. We prioritize diversity of tasks over depth within any single task. In Stage 4, we may leverage task-specific datasets and objectives related to the targeted use case (e.g., topic discovery and business constraints) via task-specific fine-tuning, reinforcement learning, and related techniques. We indicate typical choices for which parameters are kept frozen (ice) or trainable (fire) for each stage.
  • Figure 3: Recsplanation result for an episode request. This randomly selected result shows that the model can interleave recommendations (SIDs) and explanation (text) fluently.
  • Figure 4: Qualitative example of user understanding for an episode, from the SID history of a user.
  • Figure 5: Prompt template used for the LLM-as-a-judge evaluation of the user analysis task.
  • ...and 5 more figures