Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation

Reza Shirkavand, Xiaokai Wei, Chen Wang, Zheng Hui, Heng Huang, Michelle Gong

TL;DR

The paper addresses the challenge of unifying collaborative filtering signals with language-based reasoning in recommender systems without degrading language understanding. It introduces IDIOMoE, a dual-expert Mixture-of-Experts model that treats item IDs as a native dialect: item tokens are routed to an item expert and text tokens to a text expert, all within a shared Transformer backbone. Through extensive ablations and an FFN key-value memory analysis, the authors show that expert specialization and fixed token-type routing reduce semantic–collaborative interference, yielding strong performance on public Amazon catalogs and a large industry dataset while preserving linguistic capabilities. The work argues that keeping the two modalities disentangled is a practical route to modular LLM-based recommenders that remain able to handle natural-language queries and explanations.

Abstract

While collaborative filtering delivers predictive accuracy and efficiency, and Large Language Models (LLMs) enable expressive and generalizable reasoning, modern recommendation systems must bring these strengths together. Growing user expectations, such as natural-language queries and transparent explanations, further highlight the need for a unified approach. However, doing so is nontrivial. Collaborative signals are often token-efficient but semantically opaque, while LLMs are semantically rich but struggle to model implicit user preferences when trained only on textual inputs. This paper introduces Item-ID + Oral-language Mixture-of-Experts Language Model (IDIOMoE), which treats item interaction histories as a native dialect within the language space, enabling collaborative signals to be understood in the same way as natural language. By splitting the Feed Forward Network of each block of a pretrained LLM into a separate text expert and an item expert with token-type gating, our method avoids destructive interference between text and catalog modalities. IDIOMoE demonstrates strong recommendation performance across both public and proprietary datasets, while preserving the text understanding of the pretrained model.
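The token-type gating described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the sizes, the ReLU feed-forward form, the random weights, and the convention that item-ID tokens occupy vocabulary indices at or above `TEXT_VOCAB` are all assumptions made for illustration. It shows only the routing idea, in which each position's hidden state passes through either the text expert or the item expert depending on its token type.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
TEXT_VOCAB = 100        # size of the original text vocabulary (assumed)
D_MODEL, D_FF = 8, 16   # hidden width and FFN inner width (assumed)

rng = np.random.default_rng(0)

def make_ffn(d_model, d_ff):
    """Create weights for a plain two-layer feed-forward network."""
    return {
        "w_in": rng.normal(0.0, 0.02, (d_model, d_ff)),
        "w_out": rng.normal(0.0, 0.02, (d_ff, d_model)),
    }

def ffn(params, x):
    """ReLU feed-forward block (an assumption; real LLMs often use gated variants)."""
    return np.maximum(x @ params["w_in"], 0.0) @ params["w_out"]

text_expert = make_ffn(D_MODEL, D_FF)
item_expert = make_ffn(D_MODEL, D_FF)

def is_item_token(token_ids):
    """Assumed convention: item-ID tokens are appended after the text vocabulary."""
    return token_ids >= TEXT_VOCAB

def routed_ffn(hidden, token_ids):
    """Fixed routing by token type: no learned gate, each token uses one expert."""
    item_mask = is_item_token(token_ids)
    out = np.empty_like(hidden)
    out[~item_mask] = ffn(text_expert, hidden[~item_mask])
    out[item_mask] = ffn(item_expert, hidden[item_mask])
    return out

# A mixed sequence: three text tokens and two item-ID tokens (103 and 119).
token_ids = np.array([5, 17, 103, 42, 119])
hidden = rng.normal(size=(len(token_ids), D_MODEL))
output = routed_ffn(hidden, token_ids)
```

Because the routing depends only on token type, it is deterministic and adds no gating parameters; in this sketch the attention and normalization layers that surround the FFN are omitted and would be shared by both token types.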

Paper Structure

This paper contains 50 sections, 4 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Four designs for recommendation with Transformers/LLMs. (a) ID-only Transformer: trained from scratch on item-ID sequences, with no pretrained LLM involved. (b) Text-derived bias: a pretrained LLM operating on item IDs, with an external text encoder providing side features that bias item scores. (c) Explicit text tokens: a pretrained LLM that directly consumes item-ID tokens and, optionally, text tokens in the same sequence. (d) Explicit text tokens + extra capacity: like (c), but adds item-specific parameters to better handle IDs. IDIOMoE is a special case of (d).
  • Figure 2: Overview of our proposed IDIOMoE. We extend the LLM tokenizer with new "item-id" tokens and introduce a dedicated item embedding layer. The Normalization and Attention layers are shared across all token types, while tokens are routed to distinct FFN layers depending on their type.
  • Figure 3: Language understanding retention.
  • Figure 4: Non-MoE capacity controls on Amazon-Beauty and Industrial datasets. All variants are matched to IDIOMoE in parameter count. Results are shown as relative improvements over Item-LLM.
  • Figure 5: Impact of varying item expert capacity.
  • ...and 5 more figures