Active Large Language Model-based Knowledge Distillation for Session-based Recommendation
Yingpeng Du, Zhu Sun, Ziyan Wang, Haoyan Chua, Jie Zhang, Yew-Soon Ong
TL;DR
This work tackles the high computational cost of using large language models for session-based recommendation by proposing ALKDRec, an active knowledge distillation framework. ALKDRec distills knowledge from an LLM teacher into a lightweight student by selectively querying the LLM on a small, optimally chosen subset of sessions, guided by an active learning strategy that maximizes the minimal expected distillation gain. The method combines an enhanced LLM teacher, KD from the LLM to the student, and a minimax-based instance selection policy, with theoretical guarantees and empirical validation on real-world datasets showing significant performance and efficiency gains over state-of-the-art KD baselines. The approach offers a practical pathway to deploy LLM-informed recommendations in resource-constrained environments while maintaining high accuracy. Overall, ALKDRec demonstrates that carefully curated, theory-grounded active KD can unlock effective LLM-based recommendations at scale, with broad implications for sustainable AI in recommender systems.
Abstract
Large language models (LLMs) offer a promising route to accurate session-based recommendation (SBR), but they demand substantial computation time and memory. Knowledge distillation (KD) can alleviate these costs by training a small student model on the predictions of a cumbersome teacher. However, existing KD methods face two difficulties when applied to LLM-based KD in SBR. 1) It is expensive to have the LLM predict for every instance used in KD. 2) LLMs may make ineffective predictions for some instances, e.g., incorrect predictions for hard instances, or predictions similar to those of existing recommenders for easy instances. In this paper, we propose an active LLM-based KD method for SBR, contributing to sustainable AI. To distill knowledge from LLMs efficiently at limited cost, we query the LLM for predictions on only a small proportion of instances. To make distillation more effective, we propose a theoretically grounded active learning strategy that selects the instances most useful for KD. Specifically, we first formulate the gain of each instance based on the potential effect of the LLM's prediction (e.g., effective, similar, or incorrect) and the instance's difficulty (e.g., easy or hard to fit). We then maximize the minimal distillation gain to find the optimal selection policy for active learning, which largely avoids selecting ineffective instances for KD. Experiments on real-world datasets show that our method significantly outperforms state-of-the-art methods for SBR.
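The maximin selection idea in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm: the function name `select_maximin`, the gain values, and the three-outcome layout of each row are assumptions made purely for illustration. Each row holds hypothetical distillation gains for one session under the three outcomes named in the abstract (the LLM's prediction is effective, similar to an existing recommender, or incorrect), and the sketch keeps the `budget` sessions whose worst-case gain is largest.

```python
from typing import List, Sequence


def select_maximin(gains: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Pick `budget` instance indices with the largest worst-case gain.

    gains[i][j] is a hypothetical distillation gain for instance i under
    outcome j (assumed order: effective, similar, incorrect).
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Worst-case (minimal) gain of each instance across the outcomes.
    worst_case = [min(row) for row in gains]
    # Rank instances by worst-case gain, best first (ties keep input order).
    ranked = sorted(range(len(gains)), key=lambda i: worst_case[i], reverse=True)
    # Return the chosen indices in ascending order for readability.
    return sorted(ranked[:budget])


# Illustrative values only; they are not taken from the paper.
example_gains = [
    [0.9, 0.1, -0.5],  # high upside, but costly if the LLM is wrong
    [0.4, 0.3, 0.2],   # modest gain under every outcome
    [0.6, 0.0, -0.1],
    [0.2, 0.2, 0.1],
]
selected = select_maximin(example_gains, budget=2)
print(selected)
```

In this toy setting the first session has the highest best-case gain, yet it is skipped because its worst case is the lowest. That is the behaviour a maximin criterion is meant to produce: it favours instances that are unlikely to hurt distillation over instances that merely might help a lot.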
