Preference-Consistent Knowledge Distillation for Recommender System

Zhangchi Zhu, Wei Zhang

TL;DR

This paper proposes PCKD, which consists of two regularization terms for projectors, along with a hybrid method that combines them. By focusing on items with high preference scores, PCKD significantly mitigates preference inconsistency and improves the performance of feature-based knowledge distillation.

Abstract

Feature-based knowledge distillation has been applied to compress modern recommendation models, usually with projectors that align the dimensions of the (small) student recommendation model with those of the teacher. However, existing studies have focused only on making the projected features (i.e., student features after projectors) similar to teacher features, without investigating whether user preferences are actually transferred to the student features (i.e., student features before projectors) in this manner. In this paper, we find that, because no restrictions are placed on projectors, the transfer of user preferences is likely to be disrupted. We refer to this phenomenon as preference inconsistency; it wastes much of the power of feature-based knowledge distillation. To mitigate preference inconsistency, we propose PCKD, which consists of two regularization terms for projectors, and a hybrid method that combines the two terms. By focusing on items with high preference scores, PCKD significantly mitigates preference inconsistency and improves the performance of feature-based knowledge distillation. Extensive experiments on three public datasets and three backbones demonstrate the effectiveness of PCKD. The code is available at https://github.com/woriazzc/KDs.
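
To make the idea of preference inconsistency concrete, the following is a minimal numerical sketch and not the authors' code. It assumes that a user's preference for an item is scored by the dot product of their features, and that the projector is a single linear map applied to both user and item features; the feature values, the matrix `W`, and the helper `prefers_item1` are all made up for illustration.

```python
import numpy as np

def prefers_item1(user, item1, item2):
    """Return True if the dot-product score ranks item1 above item2 (assumed scoring rule)."""
    return float(user @ item1) > float(user @ item2)

# Toy student features before the projector (dimension 2); values are hypothetical.
user_s = np.array([1.0, 0.0])
item1_s = np.array([0.2, 1.0])
item2_s = np.array([0.5, 0.0])

# Hypothetical unconstrained linear projector: student dimension 2 -> teacher dimension 3.
W = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Projected features, i.e. student features after the projector.
user_p, item1_p, item2_p = W @ user_s, W @ item1_s, W @ item2_s

student_pref = prefers_item1(user_s, item1_s, item2_s)      # student scores: 0.2 vs 0.5
projected_pref = prefers_item1(user_p, item1_p, item2_p)    # projected scores: 1.2 vs 0.5

# The projected features rank item 1 first, while the student features rank item 2 first.
inconsistent = student_pref != projected_pref
print(student_pref, projected_pref, inconsistent)
```

Because `W` is unconstrained, the projected scores depend on `W.T @ W` rather than on the raw student features, so matching the projected features to the teacher does not by itself fix the ordering that the student derives.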

Paper Structure (23 sections, 19 equations, 12 figures, 8 tables)

This paper contains 23 sections, 19 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Illustration of feature-based knowledge distillation and preference inconsistency. Although the projected features derive the correct user preference (i.e., item 1 is preferred over item 2), the student does not benefit from it and instead derives the opposite user preference (i.e., item 2 is preferred over item 1).
  • Figure 2: The process of PCKD. It consists of two regularization terms: pair-wise PCKD and list-wise PCKD. To compute them, we first conduct rank-aware sampling, followed by pair-wise or list-wise losses computed using the sampled items.
  • Figure 3: The dynamics of preference inconsistency as training proceeds. Experiments are conducted on CiteULike (left) and Gowalla (right).
  • Figure 4: Group-wise preference inconsistency of DE.
  • Figure 5: The dynamics of preference inconsistency as training proceeds. Experiments are conducted on CiteULike with BPRMF (left) and LightGCN (right) as the backbones.
  • ...and 7 more figures
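
The caption of Figure 2 describes rank-aware sampling followed by a pair-wise or list-wise loss on the sampled items. The sketch below is one hedged reading of the pair-wise case only, not the formulation in the paper: the exponential rank weighting, the `temperature` parameter, the logistic (BPR-style) pair loss, and both function names are assumptions introduced here for illustration.

```python
import numpy as np

def rank_aware_sample(reference_scores, num_samples, temperature=10.0, rng=None):
    """Sample distinct item indices, favoring items ranked high by reference_scores.

    Assumption: sampling probability decays exponentially with rank.
    Returns the sampled indices ordered from best to worst reference rank.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(-reference_scores)       # order[r] = item at rank r
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))        # ranks[i] = rank of item i
    probs = np.exp(-ranks / temperature)
    probs /= probs.sum()
    sampled = rng.choice(len(reference_scores), size=num_samples, replace=False, p=probs)
    return sampled[np.argsort(ranks[sampled])]

def pairwise_loss(scores, sampled):
    """Logistic pair loss: penalize scores that disagree with the order of `sampled`."""
    s = scores[sampled]
    diff = s[:, None] - s[None, :]              # diff[a, b] = s[a] - s[b]
    upper = np.triu_indices(len(s), k=1)        # pairs where a is ranked ahead of b
    return float(np.mean(np.log1p(np.exp(-diff[upper]))))  # -log(sigmoid(diff))

# Hypothetical scores for four items.
reference = np.array([0.9, 0.1, 0.5, 0.7])
sampled = rank_aware_sample(reference, 3, rng=np.random.default_rng(0))
agree = pairwise_loss(reference, sampled)       # scores that follow the reference order
disagree = pairwise_loss(-reference, sampled)   # scores that reverse the reference order
print(sampled, agree, disagree)
```

Scores that follow the reference ordering give a loss below `log(2)`, and scores that reverse it give a loss above `log(2)`, so minimizing this term pushes the regularized scores toward the reference preference order on the sampled, mostly high-ranked items.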

Theorems & Definitions (2)

  • Definition 4.1: Preference
  • Definition 4.2: Preference Inconsistency