Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation

Shaowei Wei, Zhengwei Wu, Xin Li, Qintong Wu, Zhiqiang Zhang, Jun Zhou, Lihong Gu, Jinjie Gu

TL;DR

The paper tackles data sparsity in sequential recommendation by combining self-supervised learning with self-distillation. It introduces $S^4$Rec, which uses online clustering to group users by latent intent, an adversarial head-tail mechanism to remove sequence-length bias from the clustering, and cluster-aware self-distillation to transfer knowledge from users with long histories to users with limited data. The approach yields state-of-the-art results on four real-world datasets, with extensive ablations confirming the contributions of online clustering, distillation, and adversarial learning. The work demonstrates practical scalability and positive online impact for large-scale recommendation systems.

Abstract

Sequential recommendation methods play a pivotal role in modern recommendation systems. A key challenge lies in accurately modeling user preferences in the face of data sparsity. To tackle this challenge, recent methods leverage contrastive learning (CL) to derive self-supervision signals by maximizing the mutual information of two augmented views of the original user behavior sequence. Despite their effectiveness, CL-based methods encounter a limitation in fully exploiting self-supervision signals for users with limited behavior data, as users with extensive behaviors naturally offer more information. To address this problem, we introduce a novel learning paradigm, named Online Self-Supervised Self-Distillation for Sequential Recommendation ($S^4$Rec), effectively bridging the gap between self-supervised learning and self-distillation methods. Specifically, we employ online clustering to group users by their distinct latent intents. Additionally, an adversarial learning strategy is used to ensure that the clustering procedure is not affected by behavior length. Subsequently, we employ self-distillation to facilitate the transfer of knowledge from users with extensive behaviors (teachers) to users with limited behaviors (students). Experiments conducted on four real-world datasets validate the effectiveness of the proposed method.
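The teacher-to-student transfer described above can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the function names, the `length_threshold` that splits teachers from students, and the choice of a KL divergence toward the mean teacher distribution within each cluster. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cluster_distillation_loss(logits, lengths, clusters,
                              length_threshold, temperature=1.0):
    """Mean KL(teacher || student) over short-history users.

    Assumptions (hypothetical, not taken from the paper):
    - users with lengths[i] >= length_threshold act as teachers,
      the remaining users act as students;
    - the teacher signal for a cluster is the average of its
      teachers' softmax outputs;
    - students in a cluster without any teacher are skipped.
    """
    probs = [softmax(row, temperature) for row in logits]

    # Collect teacher distributions per cluster.
    teacher_probs = defaultdict(list)
    for p, n, c in zip(probs, lengths, clusters):
        if n >= length_threshold:
            teacher_probs[c].append(p)

    # Average the teacher distributions column-wise within each cluster.
    teacher_mean = {
        c: [sum(col) / len(ps) for col in zip(*ps)]
        for c, ps in teacher_probs.items()
    }

    # Each student is pulled toward the teacher mean of its own cluster.
    losses = [
        kl_divergence(teacher_mean[c], p)
        for p, n, c in zip(probs, lengths, clusters)
        if n < length_threshold and c in teacher_mean
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

In a gradient-based framework the teacher mean would normally be treated as a constant (stop-gradient) so that only the student predictions are updated; that detail is also an assumption here.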

Paper Structure

This paper contains 28 sections, 6 equations, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: Visualization of clustering at sequence granularity and cluster granularity on an Amazon dataset. The horizontal and vertical axes of Fig. 1(a) are the two-dimensional coordinates of the user sequence embeddings after t-SNE dimensionality reduction.