Zero-Shot Recommender Systems

Hao Ding, Yifei Ma, Anoop Deoras, Yuyang Wang, Hao Wang

TL;DR

The paper tackles the cold-start challenge in recommender systems by enabling zero-shot recommendations for unseen users and unseen items. It introduces ZESRec, a hierarchical Bayesian framework that grounds items in universal embeddings derived from their natural-language descriptions (via BERT) and models users with a sequential embedding network, allowing zero-shot generalization from a source domain to a disjoint target domain. The model is trained on source-domain data, and inference uses MAP-based approximations to generate user and item embeddings for the target domain without fine-tuning on target data. Experiments on Amazon and MIND dataset pairs show that ZESRec outperforms naive zero-shot baselines and rivals the oracle baseline in certain settings, demonstrating practical potential for data-scarce startups and new domains.

Abstract

The performance of recommender systems (RS) relies heavily on the amount of training data available. This poses a chicken-and-egg problem for early-stage products, whose amount of data, in turn, relies on the performance of their RS. On the other hand, zero-shot learning promises some degree of generalization from an old dataset to an entirely new dataset. In this paper, we explore the possibility of zero-shot learning in RS. We develop an algorithm, dubbed ZEro-Shot Recommenders (ZESRec), that is trained on an old dataset and generalizes to a new one where there are neither overlapping users nor overlapping items, a setting that contrasts with typical cross-domain RS, which has either overlapping users or overlapping items. Unlike the categorical item indices (i.e., item IDs) used in previous methods, ZESRec uses items' natural-language descriptions (or description embeddings) as their continuous indices, and therefore naturally generalizes to any unseen items. In terms of users, ZESRec builds upon recent advances in sequential RS to represent users by their interactions with items, thereby generalizing to unseen users as well. We study three pairs of real-world RS datasets and demonstrate that ZESRec can successfully enable recommendations in such a zero-shot setting, opening up new opportunities for resolving the chicken-and-egg problem for data-scarce startups or early-stage products.
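
The idea of indexing items by their descriptions and users by their interaction histories can be illustrated with a minimal sketch. It is not the ZESRec implementation. The hashed bag-of-words encoder `embed_text` is a toy stand-in for a pretrained text encoder such as BERT, and the recency-weighted average in `embed_user` is a toy stand-in for a sequential user model. The names `embed_text`, `embed_user`, `recommend`, and `DIM` are hypothetical and chosen only for this illustration.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size, chosen only for this sketch


def embed_text(text, dim=DIM):
    """Toy stand-in for a pretrained text encoder: a hashed
    bag-of-words vector, L2-normalized when it is non-zero."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        idx = digest[0] % dim                       # bucket for this token
        sign = 1.0 if digest[1] % 2 == 0 else -1.0  # signed hashing
        vec[idx] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_user(history, dim=DIM):
    """Toy stand-in for a sequential user model: a recency-weighted
    mean of the embeddings of the items the user interacted with."""
    user = [0.0] * dim
    if not history:
        return user
    # The most recent item gets weight 1, the one before 0.5, and so on.
    weights = [0.5 ** (len(history) - 1 - i) for i in range(len(history))]
    total = sum(weights)
    for weight, description in zip(weights, history):
        for j, x in enumerate(embed_text(description, dim)):
            user[j] += weight * x / total
    return user


def recommend(history, catalog, k=2):
    """Rank catalog items not yet in the history by the dot product
    between the user embedding and each item embedding."""
    user = embed_user(history)
    scored = [
        (sum(u * v for u, v in zip(user, embed_text(desc))), desc)
        for desc in catalog
        if desc not in history
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [desc for _, desc in scored[:k]]


if __name__ == "__main__":
    history = ["chocolate cookies", "black tea bags"]
    catalog = ["caffeine-free herbal tea", "laundry detergent", "green tea bags"]
    print(recommend(history, catalog, k=2))
```

Because neither function looks up an item ID or a user ID, the same code applies unchanged to items and users that were never seen before, which is the property the abstract relies on.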

Paper Structure

This paper contains 15 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Graphical model for ZESRec. The item side (left) and the user side (right) share the same $\lambda_v$ and ${\bf v}$'s. The plates indicate replication.
  • Figure 2: Incremental training results for baselines trained on target-domain data, compared with ZESRec using no target-domain data, on MIND-NCAA (left two) and Amazon Prime Pantry (right two). To prevent clutter, we only show results for TCN-based and HRNN-based models, since HRNN is an advanced version of GRU4Rec. Results show that even without using target-domain data, ZESRec can still outperform models trained directly on target-domain data for a substantial amount of time.
  • Figure 3: Case Study 1. The purchase history of a user in the source domain (top) and the purchase history of an unseen user in the target domain, where all items are unseen during training (bottom). We select two users with similar universal embeddings according to Sec. \ref{sec:exp_setup}. This case study demonstrates ZESRec can learn the user behavioral pattern that 'users who bought sugary snacks and tea tend to buy caffeine-free herbal tea later'.
  • Figure 4: Case Study 2. The purchase history of a user in the source domain (top) and the purchase history of an unseen user in the target domain, where all items are unseen during training (bottom). We select two users with similar universal embeddings according to Sec. \ref{sec:exp_setup}. This case study demonstrates ZESRec can learn the user behavioral pattern that 'if users bought snacks or drinks that they like, they may later purchase similar snacks or drinks with different flavors'.