
Contrastive Learning for Sequential Recommendation

Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Bolin Ding, Bin Cui

TL;DR

CL4SRec addresses data sparsity and evolving user interests in sequential recommendation by adding a sequence-level contrastive learning objective to the standard next-item prediction task. A Transformer-based encoder learns user representations from two augmented views of each sequence. These views are generated by three augmentation operators (item crop, item mask, item reorder). The model is trained with a multi-task loss that combines the supervised and contrastive signals. Experiments on four public datasets report state-of-the-art performance across sparse and dense regimes, and ablations support the contribution of the self-supervised (SSL) component and of the augmentation strategies. The authors attribute these gains to the better user representations that the contrastive objective helps the model learn.
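
The three augmentation operators named above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code. The ratio parameters `eta`, `gamma` and `beta`, their default values, the reserved `MASK_ID = 0`, and the helper `two_views` are hypothetical choices made here. `two_views` mirrors the Figure 1 setup, in which two operators are drawn from the same augmentation set and each is applied to the same sequence.

```python
import random
from typing import List, Sequence, Tuple

MASK_ID = 0  # hypothetical id reserved for the mask item; real items are assumed to use ids >= 1


def item_crop(seq: Sequence[int], eta: float, rng: random.Random) -> List[int]:
    """Keep one random contiguous window of length max(1, floor(eta * n))."""
    n = len(seq)
    length = max(1, int(eta * n))
    start = rng.randint(0, n - length)
    return list(seq[start:start + length])


def item_mask(seq: Sequence[int], gamma: float, rng: random.Random) -> List[int]:
    """Replace floor(gamma * n) randomly chosen positions with MASK_ID."""
    n = len(seq)
    masked = set(rng.sample(range(n), int(gamma * n)))
    return [MASK_ID if i in masked else item for i, item in enumerate(seq)]


def item_reorder(seq: Sequence[int], beta: float, rng: random.Random) -> List[int]:
    """Shuffle the items inside one random window of length floor(beta * n)."""
    n = len(seq)
    length = int(beta * n)
    out = list(seq)
    if length < 2:  # a window shorter than 2 has nothing to shuffle
        return out
    start = rng.randint(0, n - length)
    window = out[start:start + length]
    rng.shuffle(window)
    out[start:start + length] = window
    return out


def two_views(seq: Sequence[int], rng: random.Random, eta: float = 0.6,
              gamma: float = 0.3, beta: float = 0.6) -> Tuple[List[int], List[int]]:
    """Draw two operators independently from the augmentation set and apply each to seq."""
    ops = [
        lambda s: item_crop(s, eta, rng),
        lambda s: item_mask(s, gamma, rng),
        lambda s: item_reorder(s, beta, rng),
    ]
    a_i, a_j = rng.choice(ops), rng.choice(ops)
    return a_i(seq), a_j(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    print(two_views([5, 3, 8, 1, 9, 4], rng))
```

Every operator returns a new list and leaves the input sequence untouched. This lets the original sequence and both views be encoded side by side.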

Abstract

Sequential recommendation methods play a crucial role in modern recommender systems because of their ability to capture a user's dynamic interests from her/his historical interactions. Despite their success, we argue that these approaches usually rely on the sequential prediction task to optimize a huge number of parameters. They usually suffer from the data sparsity problem, which makes it difficult for them to learn high-quality user representations. To tackle this problem, we draw inspiration from recent advances in contrastive learning in computer vision and propose a novel multi-task model called Contrastive Learning for Sequential Recommendation (CL4SRec). CL4SRec not only takes advantage of the traditional next-item prediction task but also uses the contrastive learning framework to derive self-supervision signals from the original user behavior sequences. It can therefore extract more meaningful user patterns and encode user representations more effectively. In addition, we propose three data augmentation approaches to construct self-supervision signals. Extensive experiments on four public datasets demonstrate that CL4SRec achieves state-of-the-art performance over existing baselines by inferring better user representations.
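
The multi-task objective described above can be sketched with NumPy as follows. This sketch pairs a SimCLR-style NT-Xent contrastive loss over the 2N augmented views in a batch with a next-item cross-entropy loss. It then combines them as L = L_rec + λ·L_cl. The following choices are assumptions for illustration, not details confirmed by this summary:

- the dot-product similarity,
- the temperature `tau`,
- the full-softmax form of the next-item loss,
- all function names.

The weight `lam` presumably plays the role of the λ varied in Figure 5.

```python
import numpy as np


def _log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(z1: np.ndarray, z2: np.ndarray, tau: float = 1.0) -> float:
    """NT-Xent-style loss: z1[k] and z2[k] are the two views of sequence k, and the
    other 2N - 2 views in the batch act as negatives."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    sim = (z @ z.T) / tau                                # dot-product similarity (assumption)
    np.fill_diagonal(sim, -np.inf)                       # a view never scores against itself
    positive = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each view's partner
    return float(-_log_softmax(sim)[np.arange(2 * n), positive].mean())


def next_item_loss(user_repr: np.ndarray, item_emb: np.ndarray, targets: np.ndarray) -> float:
    """Softmax cross-entropy of the true next item over the full item catalogue (assumption)."""
    logits = user_repr @ item_emb.T                      # (N, num_items)
    return float(-_log_softmax(logits)[np.arange(len(targets)), targets].mean())


def multi_task_loss(rec_loss: float, cl_loss: float, lam: float) -> float:
    """Total objective: next-item loss plus the lambda-weighted contrastive loss."""
    return rec_loss + lam * cl_loss
```

Masking the diagonal with `-inf` removes each view from its own softmax denominator, so each row has exactly one positive and 2N - 2 negatives. A sampled binary cross-entropy, as used in SASRec, would be an equally plausible stand-in for `next_item_loss`.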

Paper Structure

This paper contains 33 sections, 15 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: A simple framework for CL4SRec. Two data augmentation methods, $a_i$ and $a_j$, are sampled from the same augmentation set $\mathcal{A}$. Both are applied to each user's sequence, yielding two correlated views of that sequence. A shared embedding layer and the user representation model $f(\cdot)$ transform the original and augmented sequences into the latent space, where the contrastive loss and the recommendation loss are applied.
  • Figure 2: A brief illustration of augmentation operations applied in our CL4SRec model, including item crop, item mask, and item reorder.
  • Figure 3: A brief architecture of SASRec model and Transformer Encoder Layer.
  • Figure 4:
  • Figure 5: Performance comparison of CL4SRec w.r.t. different values of $\lambda$. The dashed line shows the performance of SASRec.
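
Figure 3 refers to the SASRec architecture built from Transformer encoder layers. The following NumPy sketch shows one causal encoder block turning an item sequence into a user representation. It is a simplified illustration, not the CL4SRec or SASRec code. Its simplifications are:

- a single attention head and a single block,
- no dropout and no learnable layer-norm parameters,
- random untrained weights from a hypothetical `init_params`,
- the last position's hidden state used as the user vector (an assumption made here).

In the Figure 1 framework, the same shared encoder would be applied to the original sequence and to both augmented views.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-position layer normalisation without learnable gain or bias."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def causal_self_attention(h: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention in which position i attends only to j <= i."""
    length, d = h.shape
    queries, keys, values = h @ wq, h @ wk, h @ wv
    scores = queries @ keys.T / np.sqrt(d)
    future = np.triu(np.ones((length, length), dtype=bool), k=1)  # True where j > i
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values


def encode_sequence(seq, params: dict) -> np.ndarray:
    """One post-norm encoder block over item + positional embeddings; returns shape (L, d)."""
    h = params["item_emb"][seq] + params["pos_emb"][: len(seq)]
    h = layer_norm(h + causal_self_attention(h, params["wq"], params["wk"], params["wv"]))
    ffn = np.maximum(0.0, h @ params["w1"]) @ params["w2"]  # point-wise feed-forward with ReLU
    return layer_norm(h + ffn)


def user_representation(seq, params: dict) -> np.ndarray:
    """Take the hidden state at the last position as the user vector."""
    return encode_sequence(seq, params)[-1]


def init_params(num_items: int, max_len: int, d: int, seed: int = 0) -> dict:
    """Hypothetical random initialisation, for illustration only (no training)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "item_emb": (num_items, d), "pos_emb": (max_len, d),
        "wq": (d, d), "wk": (d, d), "wv": (d, d), "w1": (d, d), "w2": (d, d),
    }
    return {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
```

Because of the causal mask, the hidden state at position i depends only on the first i + 1 items. Encoding a prefix therefore reproduces the corresponding rows of the full sequence's output.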