
Cold-Starts in Generative Recommendation: A Reproducibility Study

Zhen Zhang, Jujia Zhao, Xinyu Ma, Xin Xin, Maarten de Rijke, Zhaochun Ren

Abstract

Cold-start recommendation remains a central challenge in dynamic, open-world platforms, requiring models to recommend for newly registered users (user cold-start) and to recommend newly introduced items to existing users (item cold-start) under sparse or missing interaction signals. Recent generative recommenders built on pre-trained language models (PLMs) are often expected to mitigate cold-start by using item semantic information (e.g., titles and descriptions) and test-time conditioning on limited user context. However, cold-start is rarely treated as a primary evaluation setting in existing studies, and reported gains are difficult to interpret because key design choices, such as model scale, identifier design, and training strategy, are frequently changed together. In this work, we present a systematic reproducibility study of generative recommendation under a unified suite of cold-start protocols.

Figures (3)

  • Figure 1: Generative recommendation pipeline.
  • Figure 2: Comparison of recall performance under warm-start and cold-start conditions across model scales. The plot shows Recall@10 for both item and user cold-start settings as model size increases, using a representative generative recommender (TIGER) built on different variants of Flan-T5.
  • Figure 3: Performance comparison across different identifier designs. This figure illustrates the Recall@10 performance under warm-start, item cold-start, and user cold-start conditions for various identifier types: Atomic IDs, Textual Titles, and Semantic Codes (RQ-VAE, Balanced k-means, and OPQ).
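
The semantic-code identifiers compared in Figure 3 (RQ-VAE, balanced k-means, OPQ) all map a dense item embedding to a short sequence of discrete tokens, which a generative recommender can then decode autoregressively. A minimal sketch of the shared residual-quantization idea, assuming pre-trained item embeddings and pre-fit per-level codebooks (function and variable names here are illustrative, not taken from the paper):

```python
import numpy as np

def residual_quantize(embeddings, codebooks):
    """Assign each item a multi-level semantic code by greedily
    matching residuals against fixed per-level codebooks.

    embeddings: (n_items, dim) array of item embeddings.
    codebooks:  list of (n_codes, dim) arrays, one per code level.
    Returns:    (n_items, n_levels) array of integer code indices.
    """
    codes = []
    residual = embeddings.astype(float).copy()
    for cb in codebooks:
        # squared distance from each residual to every centroid at this level
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)          # nearest centroid per item
        codes.append(idx)
        residual = residual - cb[idx]       # quantize what remains at the next level
    return np.stack(codes, axis=1)
```

Each level refines the previous one, so semantically similar items share code prefixes; this prefix sharing is what lets a generative model place a cold-start item near related warm items without any interaction data.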