NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation

Jiaqi Zhang; Yu Cheng; Yongxin Ni; Yunzhu Pan; Zheng Yuan; Junchen Fu; Youhua Li; Jie Wang; Fajie Yuan


TL;DR

NineRec introduces a large-scale TransRec benchmark suite that advances transferable recommendation by learning directly from raw multimodal item content. The dataset combines a large source domain (Bili_2M) with nine diverse target domains, each providing textual descriptions and cover images, which enables end-to-end multimodal learning. Extensive experiments comparing MoRec and TransRec baselines show that end-to-end training with modality encoders generally outperforms two-stage approaches and that cross-domain pre-training improves downstream transfer, albeit at substantial computational cost. NineRec thereby offers a public platform for evaluating transferability, benchmarking architectures, and fostering cross-pollination between recommender systems, NLP, and computer vision. The work also addresses privacy and copyright considerations and points to future improvements through larger-scale pre-training and better-optimized user encoder (UE) and modality encoder (ME) designs.
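The end-to-end (E2E) versus two-stage (TS) distinction comes down to which parameters receive gradients. The toy sketch below is illustrative only and is not taken from the NineRec code: a single scalar `w` stands in for the modality encoder (ME) and a scalar `u` for the user encoder (UE), and the names `ToyTransRec`, `train`, and `mse` are hypothetical. In TS the ME is frozen, which is equivalent to caching `w * x` once as a pre-extracted item feature and fitting only the UE; in E2E the gradients also flow into the ME. The toy does not reproduce the paper's finding that E2E performs better; it only shows the mechanics.

```python
from dataclasses import dataclass


@dataclass
class ToyTransRec:
    """Toy stand-in: predicted score for raw item feature x is u * (w * x)."""
    w: float = 0.5  # plays the role of the modality encoder (ME) weights
    u: float = 0.5  # plays the role of the user encoder (UE) weights

    def score(self, x: float) -> float:
        return self.u * (self.w * x)


def train(model, data, lr=0.05, steps=200, end_to_end=True):
    """Full-batch gradient descent on mean squared error.

    end_to_end=True  -> E2E: update both the UE (u) and the ME (w).
    end_to_end=False -> TS:  w stays frozen, which is equivalent to caching
                             w * x once as a pre-extracted item feature.
    """
    for _ in range(steps):
        grad_u = grad_w = 0.0
        for x, target in data:
            err = model.score(x) - target
            grad_u += 2 * err * model.w * x  # d(err^2)/du
            grad_w += 2 * err * model.u * x  # d(err^2)/dw
        n = len(data)
        model.u -= lr * grad_u / n
        if end_to_end:
            model.w -= lr * grad_w / n
    return model


def mse(model, data):
    return sum((model.score(x) - t) ** 2 for x, t in data) / len(data)


data = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical (raw feature, target) pairs
ts = train(ToyTransRec(), data, end_to_end=False)
e2e = train(ToyTransRec(), data, end_to_end=True)
print(ts.w, e2e.w, mse(ts, data), mse(e2e, data))
```

In a real TransRec model `w` would be the weights of an encoder such as BERT or a Swin Transformer, which is one reason E2E training is much more expensive: every step runs the encoder forward and backward on raw text or images, whereas TS encodes each item only once.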

Abstract

Large foundation models, through upstream pre-training and downstream fine-tuning, have achieved immense success in the broad AI community due to improved model performance and significant reductions in repetitive engineering. By contrast, transferable one-for-all models in the recommender system field, referred to as TransRec, have made limited progress. The development of TransRec has encountered multiple challenges, among which the lack of large-scale, high-quality transfer learning recommendation datasets and benchmark suites is one of the biggest obstacles. To this end, we introduce NineRec, a TransRec dataset suite that comprises a large-scale source domain recommendation dataset and nine diverse target domain recommendation datasets. Each item in NineRec is accompanied by a descriptive text and a high-resolution cover image. Leveraging NineRec, we enable the implementation of TransRec models that learn from raw multimodal features instead of relying solely on pre-extracted off-the-shelf features. Finally, we present robust TransRec benchmark results with several classical network architectures, providing valuable insights into the field. To facilitate further research, we will release our code, datasets, benchmarks, and leaderboards at https://github.com/westlake-repl/NineRec.
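Benchmark numbers for sequential recommenders of this kind are commonly reported as HR@K and NDCG@K under a leave-one-out protocol, where each user's last interaction is held out and ranked against the full item catalogue. The sketch below assumes that protocol; the exact metrics and setup used for NineRec are specified in the paper's Evaluation section, and the function names here are hypothetical.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of the held-out item among all candidates.

    Higher score is better; ties are broken in the target's favour here,
    which is one of several possible conventions.
    """
    t = scores[target]
    return 1 + sum(1 for s in scores if s > t)


def evaluate(all_scores, targets, k=10):
    """Average HR@k and NDCG@k over users, one held-out target per user."""
    hr = ndcg = 0.0
    for scores, target in zip(all_scores, targets):
        r = rank_of_target(scores, target)
        if r <= k:
            hr += 1.0
            ndcg += 1.0 / math.log2(r + 1)  # single relevant item, so IDCG = 1
    n = len(targets)
    return hr / n, ndcg / n


# Two hypothetical users scored over a catalogue of three items.
all_scores = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.4]]
targets = [1, 2]  # held-out next item per user (ranked 1st and 2nd)
print(evaluate(all_scores, targets, k=2))
```

Because there is exactly one relevant item per user, HR@K reduces to a hit indicator and NDCG@K to a position discount of that single hit.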


Paper Structure (18 sections, 11 figures, 17 tables)


Introduction
Introduction
NineRec Dataset Suite
Dataset Summary
Dataset Construction & Analysis
Copyrights and Privacy
Comparison to Existing Datasets
Related work of TransRec
Baselines Overview
TransRec Benchmark
Evaluation
Experimental Setting
Benchmarking User Encoders
Benchmarking Item Encoders
End-to-End (E2E) vs. Two-Stage (TS) Benchmark
...and 3 more sections

Figures (11)

Figure 1: Image examples of NineRec vs. Amazon. (a) Compared to Amazon, the images in NineRec are more abstract and semantically richer; (b) NineRec supports cross-platform recommendation; Bili, TN, KU, DY, and QB are different recommender systems; (c) user intent in Amazon is largely influenced by item price, which cannot be captured by learning only appearance or visual features.
Figure 2: Dataset details. Top: item popularity distribution; Middle: user interaction length distribution; Bottom: timestamps of user-item interactions. See more in Appendix Figure 2.
Figure 3: TransRec architectures (S2S & S2O). BERT and Swin-B (Swin Transformer) are used as the ME. DTL denotes the DNN layers used for dimension transformation. The UE can be a stack of DNN, RNN, CNN, or MHSA layers. $\tilde{Z}_{v=1},\dots,\tilde{Z}_{v=n}$ are the vectors generated by the UE, and $e_{v=1},\dots,e_{v=n+1}$ are the vectors generated by the ME.
Figure 4: Benchmark results (y-axis:%) of item ME (with SASRec as UE). The details of ResNet50, Swin-T, Swin-S and Swin-B are provided in Appendix Table 4. All hyper-parameters are kept the same for NoPT, HasPT, and TFS. TFS means TransRec is not pre-trained on the source dataset, and its ME is not pre-trained on ImageNet. The dashed yellow line only shows ResNet50.
Figure 5: Image examples of NineRec vs. Amazon vs. GEST. Images in GEST mainly depict food and restaurants. Images in Amazon mainly depict single products and carry little semantic content.
...and 6 more figures
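The data flow in Figure 3 can be sketched in plain Python under simplifying assumptions: the ME outputs are given as fixed vectors (in the paper they come from BERT or Swin-B), the DTL is modeled as a single bias-free linear layer, and the UE is replaced by a hypothetical causal mean-pooling encoder rather than the DNN, RNN, CNN, or MHSA stacks the caption lists. We also assume that S2S means scoring the next item $e_{v+1}$ from every $\tilde{Z}_v$ and that S2O means scoring only $e_{n+1}$ from $\tilde{Z}_n$. All function names are illustrative.

```python
def linear(vec, weight):
    """Bias-free dense layer; `weight` is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def dtl(item_embs, weight):
    """Hypothetical DTL: project every ME output vector to the UE dimension."""
    return [linear(e, weight) for e in item_embs]


def causal_mean_ue(seq):
    """Stand-in UE: Z~_v is the mean of e_1..e_v, so it only sees past items.

    A real UE (e.g. an MHSA stack) would replace this function.
    Assumes `seq` is non-empty.
    """
    outputs, running = [], [0.0] * len(seq[0])
    for v, e in enumerate(seq, start=1):
        running = [r + x for r, x in zip(running, e)]
        outputs.append([r / v for r in running])
    return outputs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def s2s_scores(z, e):
    """Assumed S2S: score target e_{v+1} with Z~_v at every position v = 1..n."""
    return [dot(z[v], e[v + 1]) for v in range(len(z))]


def s2o_score(z, e):
    """Assumed S2O: score only the final target e_{n+1} with Z~_n."""
    return dot(z[-1], e[-1])


# Toy example: n + 1 = 4 items with 4-dim "ME" outputs projected to 2 dims.
me_out = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W_dtl = [[1, 0, 1, 0], [0, 1, 0, 1]]
e = dtl(me_out, W_dtl)      # e_1 .. e_{n+1}
z = causal_mean_ue(e[:-1])  # Z~_1 .. Z~_n, computed from e_1 .. e_n
print(s2s_scores(z, e), s2o_score(z, e))
```

Under these assumptions, S2S yields $n$ training signals per sequence while S2O yields one; in practice these scores would feed a softmax or sampled-negative loss, which is omitted here.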
