Curriculum-scheduled Knowledge Distillation from Multiple Pre-trained Teachers for Multi-domain Sequential Recommendation
Wenqi Sun, Ruobing Xie, Junjie Zhang, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
TL;DR
CKD-MDSR addresses the challenge of using heterogeneous pre-trained recommendation models (PRMs) in multi-domain sequential recommendation by distilling knowledge from multiple PRMs into a lightweight student. It introduces curriculum-scheduled sampling to learn progressively from sequences of increasing difficulty, in-batch negative sampling to distill cross-teacher signals, and a consistency-aware integration mechanism to weigh and fuse knowledge from UniSRec, Recformer, and UniM$^2$Rec. The approach improves performance across five real-world datasets, keeps inference cost comparable to standard sequential recommendation (SR) models, and generalizes to diverse student architectures (FM, DeepFM, LightGCN). The work shows the practical potential of using PRMs as a knowledge source without online overhead, supporting cross-domain recommendation in production systems.
Abstract
Pre-trained recommendation models (PRMs) have received increasing interest recently. However, their intrinsically heterogeneous model structures, huge model sizes, and computation costs hinder their adoption in practical recommender systems. Hence, it is essential to explore how to use different pre-trained recommendation models efficiently in real-world systems. In this paper, we propose a novel curriculum-scheduled knowledge distillation from multiple pre-trained teachers for multi-domain sequential recommendation, called CKD-MDSR, which takes full advantage of different PRMs as multiple teacher models to boost a small student recommendation model, integrating the knowledge across multiple domains from PRMs. Specifically, CKD-MDSR first adopts curriculum-scheduled user behavior sequence sampling and distills informative knowledge jointly from representative PRMs such as UniSRec and Recformer. Then, the knowledge from these PRMs is selectively integrated into the student model according to their confidence and consistency. Finally, we verify the proposed method on multi-domain sequential recommendation and further demonstrate its universality with multiple types of student models, including feature-interaction and graph-based recommendation models. Extensive experiments on five real-world datasets demonstrate the effectiveness and efficiency of CKD-MDSR, which can be viewed as an efficient shortcut for using PRMs in real-world systems.
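The pipeline described above (curriculum-scheduled sampling, then confidence- and consistency-aware fusion of teacher knowledge, then distillation into the student) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration. In particular, the linear pacing function, the use of a caller-supplied difficulty score, confidence defined as the probability a teacher assigns to the ground-truth item, consistency defined as one minus the mean total variation distance to the other teachers, and all function names are hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of candidate-item scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def curriculum_fraction(epoch, total_epochs, start=0.3):
    """Assumed linear pacing: share of the easiest sequences used at this epoch."""
    if total_epochs <= 1:
        return 1.0
    progress = min(epoch / (total_epochs - 1), 1.0)
    return min(1.0, start + (1.0 - start) * progress)

def curriculum_sample(sequences, difficulty, epoch, total_epochs):
    """Keep the easiest sequences first; `difficulty` is a hypothetical scoring function."""
    ranked = sorted(sequences, key=difficulty)
    keep = max(1, math.ceil(curriculum_fraction(epoch, total_epochs) * len(ranked)))
    return ranked[:keep]

def fuse_teachers(teacher_scores, target_index):
    """Fuse teacher distributions with weights = confidence * consistency (assumed form)."""
    probs = [softmax(s) for s in teacher_scores]
    # Confidence: probability each teacher assigns to the ground-truth item.
    confidence = [p[target_index] for p in probs]
    # Consistency: 1 - mean total variation distance to the other teachers.
    consistency = []
    for i, p in enumerate(probs):
        others = [q for j, q in enumerate(probs) if j != i]
        if not others:
            consistency.append(1.0)
            continue
        tv = [0.5 * sum(abs(a - b) for a, b in zip(p, q)) for q in others]
        consistency.append(1.0 - sum(tv) / len(tv))
    raw = [c * k for c, k in zip(confidence, consistency)]
    total = sum(raw)
    # Fall back to uniform weights if every raw weight is zero.
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(raw)] * len(raw)
    n_items = len(probs[0])
    fused = [sum(w * p[i] for w, p in zip(weights, probs)) for i in range(n_items)]
    return fused, weights

def distillation_loss(student_scores, fused):
    """KL divergence from the fused teacher distribution to the student distribution."""
    student = softmax(student_scores)
    return sum(t * math.log(t / s) for t, s in zip(fused, student) if t > 0)
```

In this sketch, the candidate items scored by each teacher would be the in-batch items (the target plus other sequences' targets acting as negatives), and the resulting loss would be added to the student's usual recommendation loss. Multiplying confidence by consistency is only one way to combine the two signals; the paper's actual weighting may differ.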
