Contextual Distillation Model for Diversified Recommendation

Fan Li; Xu Si; Shisong Tang; Dingmin Wang; Kunyan Han; Bing Han; Guorui Zhou; Yang Song; Hechang Chen

TL;DR

This paper addresses the need for diversified recommendation across all stages of industrial pipelines, not just re-ranking, by introducing the Contextual Distillation Model (CDM). CDM uses a Contrastive Context Encoder to extract context from the candidate items of a user request. It then trains a student model, via knowledge distillation from a greedy MMR teacher, to predict each item's win probability under MMR. This enables end-to-end learning with $O(N \log K)$ inference. The method improves both accuracy (Recall, MRR) and diversity (ILAD, CC) on two industrial datasets, and an online A/B test on KuaiShou shows gains in engagement and content diversity. The framework scales to large candidate pools and can use alternative diversity signals (e.g., DPP, GSP) in place of MMR while remaining efficient in large-scale recommendation pipelines.
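One way to reach the $O(N \log K)$ inference cost is to score every candidate independently and keep only the best K in a bounded heap. The sketch below assumes the final score is `rec_score + weight * student_score`; the function name `cdm_rank` and the single `weight` parameter are hypothetical, and the paper's exact combination may differ.

```python
import heapq
from typing import List, Sequence


def cdm_rank(rec_scores: Sequence[float], student_scores: Sequence[float],
             k: int, weight: float) -> List[int]:
    """Return indices of the k items with the highest combined score.

    Combined score (an assumed form): rec_score + weight * student_score.
    heapq.nlargest keeps at most k entries in a heap while scanning the
    N candidates once, so selection costs O(N log k).
    """
    combined = (r + weight * s for r, s in zip(rec_scores, student_scores))
    top = heapq.nlargest(k, enumerate(combined), key=lambda pair: pair[1])
    return [idx for idx, _ in top]


# Usage: a larger weight lets the diversity-aware student reorder the list.
print(cdm_rank([0.9, 0.5, 0.1], [0.0, 0.2, 1.0], k=2, weight=0.0))  # [0, 1]
print(cdm_rank([0.9, 0.5, 0.1], [0.0, 0.2, 1.0], k=2, weight=1.0))  # [2, 0]
```

Because no candidate's score depends on which other items were already chosen, there is no greedy loop at serving time; the diversity signal is carried entirely by the precomputed student score.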

Abstract

The diversity of recommendations is as crucial as accuracy for improving user experience. Existing methods, e.g., Determinantal Point Process (DPP) and Maximal Marginal Relevance (MMR), follow a greedy paradigm that iteratively selects items to optimize both accuracy and diversity. However, these methods typically have quadratic complexity, which limits them to the re-ranking stage and makes them inapplicable to other recommendation stages with larger pools of candidate items, such as the pre-ranking and ranking stages. In this paper, we propose the Contextual Distillation Model (CDM), an efficient recommendation model that addresses diversification and is suitable for deployment in all stages of industrial recommendation pipelines. Specifically, CDM uses the candidate items in the same user request as context to diversify the results. We propose a contrastive context encoder that employs attention mechanisms to model both positive and negative contexts. To train CDM, we compare each target item with its context embedding and use a knowledge distillation framework to learn the win probability of each target item under the MMR algorithm, with the teacher derived from MMR outputs. During inference, items are ranked by a linear combination of the recommendation model and student model scores, ensuring both diversity and efficiency. We perform offline evaluations on two industrial datasets and conduct an online A/B test of CDM on the short-video platform KuaiShou. The considerable improvements in both recommendation quality and diversity metrics provide strong evidence for the effectiveness of CDM.
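The greedy MMR teacher mentioned above can be illustrated with a minimal NumPy sketch. It assumes the textbook MMR objective $\lambda \cdot \mathrm{rel}(i) - (1-\lambda)\max_{j \in S}\mathrm{sim}(i, j)$ with cosine similarity; the paper's Interest-Aware MMR variant and its exact win-probability target are not reproduced. The helper `mmr_teacher_labels` is hypothetical and simply marks whether MMR selects each item, a crude stand-in for the win probability.

```python
import numpy as np


def mmr_select(rel, emb, k, lam=0.5):
    """Greedy Maximal Marginal Relevance (MMR) selection.

    rel: (N,) relevance scores from the recommendation model.
    emb: (N, d) item embeddings; cosine similarity measures redundancy.
    lam: trade-off between relevance (lam=1) and diversity (lam=0).
    Returns the indices of the selected items in selection order.
    """
    rel = np.asarray(rel, dtype=float)
    emb = np.asarray(emb, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    n = len(rel)
    selected = []
    remaining = np.ones(n, dtype=bool)
    # max_sim[i]: highest cosine similarity between item i and the selected set
    max_sim = np.full(n, -np.inf)
    for _ in range(min(k, n)):
        # The redundancy penalty is zero until the first item is picked.
        penalty = max_sim if selected else np.zeros(n)
        score = lam * rel - (1.0 - lam) * penalty
        score[~remaining] = -np.inf
        best = int(np.argmax(score))
        selected.append(best)
        remaining[best] = False
        # One O(N d) update per step: similarity of every item to the new pick.
        max_sim = np.maximum(max_sim, unit @ unit[best])
    return selected


def mmr_teacher_labels(rel, emb, k, lam=0.5):
    """Hypothetical 0/1 distillation target: 1.0 if MMR selects the item."""
    labels = np.zeros(len(rel))
    labels[mmr_select(rel, emb, k, lam)] = 1.0
    return labels


rel = [0.9, 0.85, 0.3]
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # items 0 and 1 are near-duplicates
print(mmr_select(rel, emb, k=2))           # [0, 2]: the duplicate is skipped
print(mmr_select(rel, emb, k=2, lam=1.0))  # [0, 1]: pure relevance
```

Each of the k greedy steps rescans all N candidates, so selection costs O(Nk) similarity updates, which grows quadratically when k scales with N. This sequential dependence is what makes MMR expensive outside re-ranking, and what the distilled student avoids at serving time.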

Paper Structure

This paper contains 27 sections, 22 equations, 4 figures, 4 tables, 1 algorithm.

Introduction
Preliminaries
Traditional Recommendation
Diversified Recommendation
Approximation Calculation
Method
Interest-Aware MMR
Contrastive Context Learning
Differentiable Sampling with Gumbel-Top-$k$ Reparameterization
Contrastive Context Encoder
Contextual Distillation Model
Discussion
Time Complexity
Scalability
Experiment
...and 12 more sections

Figures (4)

Figure 1: An industrial recommendation pipeline and the different item sets for one user request. Candidate items refer to the set of items to be scored for one user request during the ranking stage.
Figure 2: The probability density distribution for a target item, randomly selected from the candidate set of a single user request, with respect to different sets of item embeddings. The positive context comprises the 100 candidate items most similar to the target item, and the negative context comprises the least similar ones.
Figure 3: The architecture of our proposed CDM. The teacher model is the MMR algorithm, and the student model predicts each item's win probability given its context.
Figure 4: Performance on accuracy and diversity when varying $\gamma$.
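Figures 2 and 3 suggest how the student consumes context. The following is a hypothetical NumPy sketch, not the paper's architecture: it builds the positive and negative contexts from the m most and least similar candidates (m = 100, as in Figure 2), pools each with parameter-free dot-product attention in place of the paper's learned attention, and maps the target's agreement with each pooled context to a win probability through a logistic head. The function names and the weights `w_pos`, `w_neg`, `bias` are assumptions; in practice such weights would be fit against the MMR teacher's labels.

```python
import numpy as np


def split_contexts(target, candidates, m=100):
    """Positive context: the m candidates most similar to the target.
    Negative context: the m least similar (cosine similarity).
    `candidates` is (N, d) and should exclude the target itself."""
    sims = candidates @ target / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(target))
    order = np.argsort(-sims)  # indices by descending similarity
    return candidates[order[:m]], candidates[order[-m:]]


def attend(query, keys):
    """Scaled dot-product attention pooling with the target as the query."""
    logits = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys


def student_win_prob(target, candidates, w_pos=-1.0, w_neg=1.0, bias=0.0, m=100):
    """Hypothetical student head: logistic score of the target's agreement
    with its pooled positive and negative contexts. The default weights are
    arbitrary placeholders, not learned values."""
    pos, neg = split_contexts(target, candidates, m)
    pos_ctx, neg_ctx = attend(target, pos), attend(target, neg)
    logit = w_pos * (target @ pos_ctx) + w_neg * (target @ neg_ctx) + bias
    return 1.0 / (1.0 + np.exp(-logit))
```

Because each target only needs its own context embeddings, the student scores all N candidates independently, which is what allows it to replace the sequential MMR loop at inference time.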
