Diffusion-based Contrastive Learning for Sequential Recommendation
Ziqiang Cui, Haolun Wu, Bowei He, Ji Cheng, Chen Ma
TL;DR
CaDiRec tackles data sparsity and semantic drift in sequential recommendation (SR) by introducing a context-aware diffusion model that generates context-consistent augmented views for contrastive learning. The method jointly trains a Transformer-based SR model and a diffusion-based augmenter with shared item embeddings, optimizing a combined objective that includes the SR loss, a contrastive loss, and a diffusion loss. Empirical results on five benchmarks show that CaDiRec achieves state-of-the-art performance, and ablations confirm the importance of context guidance, diffusion training, and contrastive learning. In practice, the approach produces realistic augmentations and yields robust representations across varying levels of data sparsity.
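The combined objective described above can be written compactly. The following is a sketch only: the summary names the three loss terms but not how they are balanced, so the weights $\alpha$ and $\beta$ and the subscript names are assumptions introduced here for illustration.

```latex
% Assumed form: a weighted sum of the three losses named in the summary.
% \alpha and \beta are hypothetical trade-off weights, not taken from the text.
\mathcal{L}
  = \mathcal{L}_{\mathrm{sr}}
  + \alpha \, \mathcal{L}_{\mathrm{cl}}
  + \beta \, \mathcal{L}_{\mathrm{diff}}
```

Here $\mathcal{L}_{\mathrm{sr}}$ is the sequential recommendation loss, $\mathcal{L}_{\mathrm{cl}}$ the contrastive loss, and $\mathcal{L}_{\mathrm{diff}}$ the diffusion loss. Because the item embeddings are shared, gradients from all three terms update the same embedding table.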
Abstract
Contrastive learning has been effectively utilized to enhance the training of sequential recommendation models by leveraging informative self-supervised signals. Most existing approaches generate augmented views of the same user sequence through random augmentation and subsequently maximize their agreement in the representation space. However, these methods often neglect the rationality of the augmented samples. Due to significant uncertainty, random augmentation can disrupt the semantic information and interest evolution patterns inherent in the original user sequences. Moreover, pulling semantically inconsistent sequences closer in the representation space can render the user sequence embeddings insensitive to variations in user preferences, which contradicts the primary objective of sequential recommendation. To address these limitations, we propose Context-aware Diffusion-based Contrastive Learning for Sequential Recommendation (CaDiRec). The core idea is to leverage context information to generate more reasonable augmented views. Specifically, CaDiRec employs a context-aware diffusion model to generate alternative items for given positions within a sequence. These generated items are aligned with their respective context information and can effectively replace the corresponding original items, thereby producing a positive view of the original sequence. By generating two different augmentations of the same user sequence, we construct a pair of positive samples for contrastive learning. To ensure representation cohesion, we train the entire framework in an end-to-end manner, with item embeddings shared between the diffusion model and the recommendation model. Extensive experiments on five benchmark datasets demonstrate the advantages of our proposed method over existing baselines.
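The view construction and contrastive step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `make_view`, `info_nce`, the `generate_item` callback (a stand-in for the context-aware diffusion model), the replacement `ratio`, and the `temperature` are all hypothetical names and values chosen for illustration, and the loss is a simplified InfoNCE that uses only the other sequences' second views as negatives.

```python
import math
import random


def make_view(seq, generate_item, ratio=0.2, rng=None):
    """Build one augmented view by replacing a subset of positions.

    `generate_item(context, pos)` is a hypothetical stand-in for the
    context-aware diffusion model: it receives the remaining items as
    context and returns a substitute item for position `pos`.
    """
    rng = rng or random.Random()
    n_replace = max(1, int(len(seq) * ratio))
    positions = rng.sample(range(len(seq)), n_replace)
    view = list(seq)  # copy, so the original sequence is left untouched
    for pos in positions:
        context = seq[:pos] + seq[pos + 1:]
        view[pos] = generate_item(context, pos)
    return view


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(reps_a, reps_b, temperature=0.5):
    """Simplified InfoNCE over two lists of sequence representations.

    reps_a[i] and reps_b[i] come from two views of the same user sequence
    (a positive pair); every other row of reps_b acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(reps_a):
        logits = [cosine(anchor, other) / temperature for other in reps_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

In a full pipeline, `make_view` would be called twice per user sequence, both views would be encoded by the recommendation model, and the resulting representations would be passed to `info_nce`. The loss is low when each sequence's two views are closer to each other than to views of other sequences.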
