Exploring Feature-based Knowledge Distillation for Recommender System: A Frequency Perspective
Zhangchi Zhu, Wei Zhang
TL;DR
This work analyzes feature-based knowledge distillation (FD) for recommender systems through a frequency lens. It defines the $k$-th piece of knowledge as the $k$-th frequency component of the features and shows that standard FD minimizes the losses on all frequency components with equal weight, which can under-emphasize critical low-frequency knowledge. It then introduces a knowledge reweighting scheme and a lightweight method, FreqD, which applies graph filtering with a polynomial filter $h(\lambda)$ to emphasize important knowledge without the high cost of computing a separate loss for each frequency component. On three public datasets and multiple backbones, FreqD consistently outperforms existing knowledge distillation methods and can approach teacher performance while delivering substantial gains in inference and training efficiency. The approach offers both theoretical insight and a practical tool for more effective knowledge transfer in large-scale recommender systems, and it suggests that frequency-aware distillation may be broadly applicable to graph-based models.
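To make the frequency view concrete, here is a minimal NumPy sketch. It assumes that the $k$-th frequency component of a feature matrix is its projection onto the $k$-th eigenvector of a symmetric normalized graph Laplacian, and that student features have already been projected to the teacher's dimension. The toy random graph, the feature sizes, and the helper names `normalized_laplacian` and `per_frequency_losses` are hypothetical and not taken from the paper's code. Because the Laplacian eigenvectors form an orthonormal basis, the plain squared-error FD loss equals the unweighted sum of the per-frequency losses, which is the equal-weight behavior described above.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep weight 0
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def per_frequency_losses(student: np.ndarray, teacher: np.ndarray, lap: np.ndarray):
    """Return Laplacian eigenvalues and the squared error of each frequency component."""
    eigvals, eigvecs = np.linalg.eigh(lap)  # ascending eigenvalues: low to high frequency
    spectral_diff = eigvecs.T @ (student - teacher)  # row k = k-th frequency component of S - T
    return eigvals, (spectral_diff ** 2).sum(axis=1)


rng = np.random.default_rng(0)
n, d = 6, 4  # hypothetical: 6 graph nodes, 4-dimensional features
adj = np.triu(rng.integers(0, 2, size=(n, n)).astype(float), 1)
adj = adj + adj.T  # undirected toy graph without self-loops
teacher = rng.normal(size=(n, d))
student = rng.normal(size=(n, d))

lap = normalized_laplacian(adj)
eigvals, losses = per_frequency_losses(student, teacher, lap)
fd_loss = ((student - teacher) ** 2).sum()  # plain feature-based distillation (squared error)
print(np.isclose(fd_loss, losses.sum()))  # True: equal-weight sum over all frequencies
```

The equality is Parseval's identity for an orthonormal basis; the Laplacian eigenbasis is used here only because its eigenvalues give each component a frequency interpretation.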
Abstract
In this paper, we analyze feature-based knowledge distillation for recommendation from a frequency perspective. By defining knowledge as the different frequency components of the features, we theoretically demonstrate that regular feature-based knowledge distillation is equivalent to minimizing the losses on all knowledge components with equal weights, and we further analyze how this equal weight allocation causes important knowledge to be overlooked. In light of this, we propose to emphasize important knowledge by redistributing the knowledge weights. Furthermore, we propose FreqD, a lightweight knowledge reweighting method that avoids the computational cost of calculating a separate loss for each knowledge component. Extensive experiments demonstrate that FreqD consistently and significantly outperforms state-of-the-art knowledge distillation methods for recommender systems. Our code is available at https://github.com/woriazzc/KDs.
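The reweighting idea can be sketched in the same setting. Assuming again that knowledge components come from the eigenvectors of a normalized Laplacian $L$, filtering the feature difference with a polynomial $h(L)=\sum_j c_j L^j$ weights the $k$-th component's loss by $h(\lambda_k)^2$, so a weighted loss can be computed with a few matrix products instead of an eigendecomposition. This is an illustrative sketch, not FreqD itself: the helper names, the toy graph, and the low-pass coefficients `[1.0, -0.5]` (i.e. $h(\lambda)=1-0.5\lambda$) are hypothetical choices.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep weight 0
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def polynomial_filter(lap: np.ndarray, x: np.ndarray, coeffs) -> np.ndarray:
    """Apply h(L) x = sum_j coeffs[j] * L^j x using repeated products (no eigendecomposition)."""
    out = coeffs[0] * x
    power = x
    for c in coeffs[1:]:
        power = lap @ power  # advance from L^{j-1} x to L^j x
        out = out + c * power
    return out


def reweighted_loss_filtered(student, teacher, lap, coeffs) -> float:
    """||h(L)(S - T)||_F^2: frequency k is implicitly weighted by h(lambda_k)^2."""
    return float((polynomial_filter(lap, student - teacher, coeffs) ** 2).sum())


def reweighted_loss_spectral(student, teacher, lap, coeffs) -> float:
    """Reference check: explicit per-frequency losses weighted by h(lambda_k)^2."""
    eigvals, eigvecs = np.linalg.eigh(lap)
    h = np.polynomial.polynomial.polyval(eigvals, coeffs)  # coeffs in increasing degree
    per_freq = ((eigvecs.T @ (student - teacher)) ** 2).sum(axis=1)
    return float((h ** 2 * per_freq).sum())


rng = np.random.default_rng(1)
n, d = 8, 3
adj = np.triu(rng.integers(0, 2, size=(n, n)).astype(float), 1)
adj = adj + adj.T  # undirected toy graph without self-loops
teacher = rng.normal(size=(n, d))
student = rng.normal(size=(n, d))
lap = normalized_laplacian(adj)

coeffs = [1.0, -0.5]  # hypothetical low-pass filter h(lambda) = 1 - 0.5 * lambda
filtered = reweighted_loss_filtered(student, teacher, lap, coeffs)
spectral = reweighted_loss_spectral(student, teacher, lap, coeffs)
print(filtered, spectral)  # the two values agree up to floating-point error
```

The eigendecomposition in `reweighted_loss_spectral` serves only as a correctness check. The filtered version needs `len(coeffs) - 1` products with $L$, which stay cheap when $L$ is sparse, and `coeffs = [1.0]` recovers the unweighted FD loss.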
