D2K: Turning Historical Data into Retrievable Knowledge for Recommender Systems

Jiarui Qin, Weiwen Liu, Ruiming Tang, Weinan Zhang, Yong Yu

TL;DR

The paper tackles the challenge of preserving knowledge from ever-growing historical user data in recommender systems by proposing D2K, a data-centric framework that stores ternary user-item-context knowledge in a retrievable knowledge base. A Transformer-based encoder converts ternary features into fixed knowledge vectors, while a personalized adaptation unit maps global knowledge to target samples, enabling direct, model-agnostic knowledge injection into various RS backbones. Empirical results on two large-scale datasets show D2K, particularly the D2K-adp-share variant, consistently improves AUC over strong baselines and remains robust under knowledge updates and reduced knowledge-base size. The work demonstrates significant advantages in scalability, explicitness, and compatibility, underscoring the practical value of turning logs into retrievable knowledge for recommender systems.
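The following minimal Python sketch makes the idea of a retrievable ternary knowledge base concrete. The class TernaryKnowledgeBase, the function build_from_logs and the toy logs are hypothetical. In particular, the learned Transformer-based knowledge encoder is replaced by a plain count-and-click-through-rate summary. The sketch therefore shows only the store-and-retrieve pattern keyed on (user, item, context), not how D2K actually computes its knowledge vectors.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch, not the D2K implementation.
# A ternary key: (user feature value, item feature value, context feature value).
Key = Tuple[str, str, str]


class TernaryKnowledgeBase:
    """Hypothetical key-value store holding one fixed knowledge vector
    per (user, item, context) combination seen in historical logs."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._table: Dict[Key, List[float]] = {}

    def store(self, key: Key, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError("knowledge vector has the wrong dimension")
        self._table[key] = list(vector)

    def retrieve(self, key: Key) -> List[float]:
        # Unseen combinations fall back to a zero vector (an assumption).
        return list(self._table.get(key, [0.0] * self.dim))

    def __len__(self) -> int:
        return len(self._table)


def build_from_logs(logs: Iterable[Tuple[str, str, str, int]]) -> TernaryKnowledgeBase:
    """Stand-in for the learned encoder (an assumption): summarise each
    ternary combination as [number of impressions, empirical click rate]."""
    clicks: Dict[Key, int] = defaultdict(int)
    counts: Dict[Key, int] = defaultdict(int)
    for user, item, context, label in logs:
        key = (user, item, context)
        counts[key] += 1
        clicks[key] += label
    kb = TernaryKnowledgeBase(dim=2)
    for key, n in counts.items():
        kb.store(key, [float(n), clicks[key] / n])
    return kb


# Toy historical logs: (user, item, context, clicked?).
old_logs = [
    ("u1", "i1", "morning", 1),
    ("u1", "i1", "morning", 0),
    ("u1", "i1", "evening", 1),
    ("u2", "i1", "morning", 0),
]
kb = build_from_logs(old_logs)
print(len(kb))                               # 3
print(kb.retrieve(("u1", "i1", "morning")))  # [2.0, 0.5]
print(kb.retrieve(("u3", "i9", "night")))    # [0.0, 0.0]
```

The same user-item pair (u1, i1) retrieves different knowledge in the morning and evening contexts, which a unary user-side or item-side table could not distinguish. Falling back to a zero vector for unseen combinations is likewise an assumption of this sketch.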

Abstract

A vast amount of user behavior data is constantly accumulating on today's large recommendation platforms, recording users' various interests and tastes. Preserving knowledge from the old data while new data continually arrives is a vital problem for recommender systems. Existing approaches generally seek to save the knowledge implicitly in the model parameters. However, such a parameter-centric approach lacks scalability and flexibility: the capacity is hard to scale, and the knowledge is hard to utilize flexibly. Hence, in this work, we propose a framework that turns massive user behavior data into retrievable knowledge (D2K). It is a data-centric approach that is model-agnostic and easy to scale up. Rather than storing only unary knowledge such as user-side or item-side information, D2K proposes to store ternary knowledge for recommendation, which is determined by the complete set of recommendation factors: user, item, and context. The knowledge retrieved by target samples can be directly used to enhance the performance of any recommendation algorithm. Specifically, we introduce a Transformer-based knowledge encoder to transform the old data into knowledge with the user-item-context cross features. A personalized knowledge adaptation unit is devised to effectively exploit the information from the knowledge base by adapting the retrieved knowledge to the target samples. Extensive experiments on two public datasets show that D2K significantly outperforms existing baselines and is compatible with a wide range of mainstream recommendation algorithms.
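The personalized adaptation and model-agnostic injection described in the abstract can be illustrated with a small sketch. It is not the D2K adaptation unit. The sigmoid gating form, the helper names adapt_knowledge and inject, and the hand-set weights are all assumptions chosen for illustration; a real unit would learn its parameters jointly with the backbone. The sketch shows how one retrieved knowledge vector can be re-weighted differently for two target samples before being appended to each sample's input features.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, not the D2K adaptation unit.
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product: one output per row of w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adapt_knowledge(sample: Sequence[float], knowledge: Sequence[float],
                    w_gate: Matrix) -> List[float]:
    """Assumed gating form: the target sample's own features produce one
    gate in (0, 1) per knowledge dimension, so the same retrieved (global)
    knowledge is re-weighted differently for different samples."""
    gates = [sigmoid(z) for z in matvec(w_gate, sample)]
    return [g * k for g, k in zip(gates, knowledge)]


def inject(sample: Sequence[float], adapted: Sequence[float]) -> List[float]:
    """Assumed injection: append the adapted knowledge to the sample's input
    features, so any backbone that accepts a dense vector can consume it."""
    return list(sample) + list(adapted)


# Toy, untrained gate weights (an assumption): row 0 ignores the sample,
# row 1 opens for the first feature and closes for the second.
w_gate = [[0.0, 0.0], [10.0, -10.0]]
knowledge = [2.0, 0.5]  # the same retrieved vector for both samples
x_a = [1.0, 0.0]
x_b = [0.0, 1.0]
print(inject(x_a, adapt_knowledge(x_a, knowledge, w_gate)))
print(inject(x_b, adapt_knowledge(x_b, knowledge, w_gate)))
```

With these toy weights, sample x_a keeps almost all of the second knowledge dimension while x_b suppresses it, even though both retrieved the identical vector. Concatenation is one simple way to keep injection independent of the backbone architecture; other fusion schemes are equally possible.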

Paper Structure

This paper contains 34 sections, 13 equations, 7 figures, 9 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Parameter-centric knowledge vs. data-centric knowledge in preserving information. (a) Workflow comparison. (b) Conceptual illustration of information capacity.
  • Figure 2: Comparison between unary knowledge and ternary knowledge. "klg" stands for knowledge.
  • Figure 3: The framework of D2K.
  • Figure 4: The structure of the knowledge encoder of D2K.
  • Figure 5: The structure of the personalized knowledge adaptation unit.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Direct Knowledge