Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation?

Genki Kusano; Kenya Abe; Kunihiro Takeoka

TL;DR

This study rigorously compares training-free cold-start recommendation (TFCSR) methods based on large language models (LLMs) and text embedding models (TEMs) under identical conditions. Through controlled experiments on three public datasets, TEMs consistently outperform LLM rerankers in both narrow and broad cold-start settings, with TEMs trained via LLM supervision (e.g., Qwen embeddings) achieving the strongest results. The findings challenge the assumption that LLMs are always optimal for training-free scenarios and highlight TEM-based approaches as more scalable and reliable for TFCSR. The work also provides insights into error patterns, the impact of user history size, and cross-domain transfer, outlining directions for integrating structured data and synthetic training signals.

Abstract

Recommender systems usually rely on large-scale interaction data to learn from users' past behaviors and make accurate predictions. However, real-world applications often face situations where no training data is available, such as when launching a new service or serving entirely new users. In such cases, conventional approaches cannot be applied. This study focuses on training-free recommendation, where no task-specific training is performed, and in particular on training-free cold-start recommendation (TFCSR), the more challenging case in which the target user has no interactions. Large language models (LLMs) have recently been explored as a promising solution, and numerous methods have been proposed. As text embedding models (TEMs) become more capable, they are increasingly recognized as applicable to training-free recommendation, but no prior work has directly compared LLMs and TEMs under identical conditions. We present the first controlled experiments that systematically evaluate these two approaches in the same setting. The results show that TEMs outperform LLM rerankers, and this trend holds not only in cold-start settings but also in warm-start settings with rich interactions. These findings indicate that, contrary to a commonly held belief, direct LLM ranking is not the only viable option, and that TEM-based approaches provide a stronger and more scalable basis for training-free recommendation.
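To illustrate why a TEM-based recommender needs no task-specific training, here is a minimal sketch that ranks candidate items by cosine similarity between a user vector and item-text vectors. It is not the authors' pipeline. `toy_embed` is a hypothetical bag-of-words stand-in for a real TEM (such as the Qwen embedding models mentioned above). Representing a cold-start user by a short profile text, and averaging in past-item embeddings for warm users, are illustrative assumptions, as are all function names.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]  # sparse vector: token -> weight


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a text embedding model (TEM).

    Returns an L2-normalized bag-of-words vector. A real TEM would
    return a dense semantic vector instead.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def normalize(vec: Vector) -> Vector:
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    # Both inputs are L2-normalized, so the dot product is the cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def user_vector(profile_text: str, history_texts: Sequence[str]) -> Vector:
    """Assumed user representation.

    Cold start (empty history): only the profile text is embedded.
    Warm start: the profile embedding is averaged with past-item embeddings.
    """
    vecs = [toy_embed(profile_text)] + [toy_embed(t) for t in history_texts]
    mean: Vector = {}
    for vec in vecs:
        for tok, w in vec.items():
            mean[tok] = mean.get(tok, 0.0) + w / len(vecs)
    return normalize(mean)


def rank_items(profile_text: str, history_texts: Sequence[str],
               items: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Score every candidate item by cosine similarity to the user vector."""
    user = user_vector(profile_text, history_texts)
    # Item vectors depend only on item text, so in practice they can be
    # computed once and cached for all users; no model is trained here.
    item_vecs = {item_id: toy_embed(text) for item_id, text in items.items()}
    scored = [(item_id, cosine(user, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    catalog = {
        "i1": "space opera science fiction novel with starships",
        "i2": "italian cookbook with pasta recipes",
        "i3": "hard science fiction about mars colonization",
    }
    profile = "I love science fiction and space travel"
    # Cold start: no history, so the ranking follows the profile (i1, i3, i2).
    print(rank_items(profile, [], catalog))
    # Warm start: one cooking-related past item moves i2 to the top.
    print(rank_items(profile, ["pasta recipes from italy"], catalog))
```

A design point the sketch reflects: catalog embeddings are computed independently of any user, so serving a new user costs one embedding plus a similarity search. An LLM reranker, by contrast, typically issues a generation call per user over the candidate list, which is one plausible source of the scalability gap the abstract mentions.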

Figures (5)