How Well Does Generative Recommendation Generalize?

Yijie Ding, Zitian Guo, Jiacheng Li, Letian Peng, Shuai Shao, Wei Shao, Xiaoqiang Luo, Luke Simon, Jingbo Shang, Julian McAuley, Yupeng Hou

Abstract

A widely held hypothesis for why generative recommendation (GR) models outperform conventional item ID-based models is that they generalize better. However, there are few systematic ways to verify this hypothesis beyond a superficial comparison of overall performance. To address this gap, we categorize each data instance based on the specific capability required for a correct prediction: either memorization (reusing item transition patterns observed during training) or generalization (composing known patterns to predict unseen item transitions). Extensive experiments show that GR models perform better on instances that require generalization, whereas item ID-based models perform better when memorization is more important. To explain this divergence, we shift the analysis from the item level to the token level and show that what appears to be item-level generalization often reduces to token-level memorization for GR models. Finally, we show that the two paradigms are complementary. We propose a simple memorization-aware indicator that adaptively combines them on a per-instance basis, leading to improved overall recommendation performance.
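
The memorization vs. generalization split described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a "transition" is an adjacent pair (previous item, next item) and that a test instance counts as memorization when the pair (last history item, target) was observed in training. The paper's actual definitions, including its generalization sub-categories, may differ. The names `collect_transitions` and `categorize` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Transition = Tuple[str, str]


def collect_transitions(sequences: Iterable[List[str]]) -> Set[Transition]:
    """Collect every adjacent (previous item, next item) pair in the training sequences."""
    seen: Set[Transition] = set()
    for seq in sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            seen.add((prev_item, next_item))
    return seen


def categorize(history: List[str], target: str, seen: Set[Transition]) -> str:
    """Label one test instance under the adjacent-pair assumption.

    Assumption (not from the paper): only the last history item is
    compared against the target.
    """
    if not history:
        return "generalization"
    if (history[-1], target) in seen:
        return "memorization"
    return "generalization"


# Toy training sequences (illustrative data, not from the paper).
train = [["a", "b", "c"], ["b", "d"]]
seen = collect_transitions(train)

label_seen = categorize(["a", "b"], "c", seen)    # transition b -> c appears in training
label_unseen = categorize(["a", "c"], "d", seen)  # transition c -> d never appears
```

Per-category metrics could then be computed by grouping test instances by label and evaluating each model on each group separately.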

Paper Structure

This paper contains 23 sections, 10 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Illustrated definitions for memorization vs. generalization. We define memorization and different sub-categories of generalization based on (1) the transition patterns observed in training data, and (2) the patterns required to infer.
  • Figure 2: Illustration of multi-hop generalization.
  • Figure 3: Illustration of how item-level generalization can be reduced to token-level memorization for GR models.
  • Figure 3: Experiment configurations and token memorization ratio across different semantic ID (SID) configurations.
  • Figure 4: Token memorization ratios for each item-level generalization category. The X-axis represents the prefix length for token memorization.
  • ...and 4 more figures