Dual Correction Strategy for Ranking Distillation in Top-N Recommender System

Youngjune Lee, Kee-Eung Kim

TL;DR

This work addresses two weaknesses of prior ranking-distillation methods for top-$N$ recommender systems with implicit feedback: inefficient training and a user-side-only view of the rankings. It introduces the Dual Correction Strategy for Distillation (DCD), which uses teacher–student ranking discrepancies to select dynamically which knowledge to distill, and which extends distillation to both user-side and item-side rankings through the correction losses $\mathcal{L}_{UCD}$ and $\mathcal{L}_{ICD}$, added to the standard distillation loss $\mathcal{L}_{RKD}$. Empirical results on CiteULike and Foursquare with BPR and NeuMF show that DCD consistently surpasses both the state-of-the-art Relaxed Ranking Distillation (RRD) and the non-distilled student, with the most pronounced gains on top-5 metrics. These results indicate that discrepancy-aware, dual-side ranking corrections yield more effective supervision, reduce teacher–student disparity, and offer practical benefits for real-time delivery in sparse implicit-feedback settings.

Abstract

Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for the practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves performance. However, the method still has two limitations: 1) it does not fully utilize the prediction errors of the student model, which makes training less efficient, and 2) it distills only the user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model's and the student model's predictions to decide which knowledge to distill. By doing so, DCD provides learning guidance tailored to "correcting" what the student model has failed to predict accurately. This process is applied to transfer ranking information from the user side as well as the item side, to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
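
The discrepancy-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ranks_from_scores` and `select_correction_items`, the use of a plain rank difference as the discrepancy, and the top-`k` selection rule are all assumptions made for illustration, since the abstract does not specify how the discrepancy is measured or how items are sampled.

```python
import numpy as np

def ranks_from_scores(scores):
    """Return 0-based ranks for a 1-D score array (0 = highest score)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # indices sorted by descending score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def select_correction_items(teacher_scores, student_scores, k):
    """Pick the k items the student ranks too low and the k it ranks too high.

    Assumption (hypothetical rule): discrepancy is student rank minus
    teacher rank for the same user.
    """
    t_rank = ranks_from_scores(teacher_scores)
    s_rank = ranks_from_scores(student_scores)
    discrepancy = s_rank - t_rank  # positive: student places the item lower than the teacher
    under_ranked = np.argsort(-discrepancy, kind="stable")[:k]
    over_ranked = np.argsort(discrepancy, kind="stable")[:k]
    return under_ranked, over_ranked, discrepancy

# Toy scores for one user over five items (made-up numbers).
teacher = [0.9, 0.8, 0.1, 0.3, 0.5]
student = [0.2, 0.7, 0.9, 0.4, 0.1]
under, over, disc = select_correction_items(teacher, student, k=2)
print(under, over, disc)
```

Under the same assumptions, an item-side variant would apply the function to one item's scores across users (a column of the score matrix) rather than one user's scores across items (a row).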

Paper Structure

This paper contains 14 sections, 9 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Effects of DCD. (a) The average discrepancy from Teacher, (b) H@5 with varying $\lambda_{UCD}$ and $\lambda_{ICD}$.