Intent-aware Diffusion with Contrastive Learning for Sequential Recommendation

Yuanpeng Qu, Hajime Nobuhara

TL;DR

This work addresses data sparsity and noisy augmentations in sequential recommendation (SR) by introducing InDiRec, an intent-aware diffusion framework. It builds intent-guided signals by applying dynamic prefix segmentation to the training sequences and clustering the resulting representations into $K$ latent intents. These intent signals then condition a diffusion model that generates intent-aligned augmented views for contrastive learning. The model jointly optimizes reconstruction, contrastive alignment, and diffusion objectives, and it outperforms strong baselines on five real-world datasets. Further analyses indicate that the method is robust to sparse and noisy data and that intent guidance is an important component of diffusion-based view generation. By combining explicit intent structure with controllable generative augmentation, the approach offers a practical path toward more reliable, intent-preserving recommendations.
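The summary names dynamic prefix segmentation without defining it. As a minimal sketch, assuming it means expanding each interaction sequence into all of its prefixes paired with the next item, the step could look like the following. The function name `prefix_segments` and the `min_len` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def prefix_segments(seq: List[int], min_len: int = 1) -> List[Tuple[List[int], int]]:
    """Split one interaction sequence into (prefix, next_item) pairs.

    Assumption: every prefix of length >= min_len becomes its own
    training sample, with the item that follows it as the target.
    """
    return [(seq[:i], seq[i]) for i in range(min_len, len(seq))]


# Item IDs are made up for illustration.
segments = prefix_segments([10, 20, 30, 40])
for prefix, target in segments:
    print(prefix, "->", target)
```

Under this reading, a sequence of length n yields n - 1 shorter samples, so intent clustering sees many more sequence representations than there are users.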

Abstract

Contrastive learning has proven effective in training sequential recommendation models by incorporating self-supervised signals from augmented views. Most existing methods generate multiple views from the same interaction sequence through stochastic data augmentation, aiming to align their representations in the embedding space. However, users typically have specific intents when purchasing items (e.g., buying clothes as gifts or cosmetics for beauty). Random data augmentation used in existing methods may introduce noise, disrupting the latent intent information implicit in the original interaction sequence. Moreover, using noisy augmented sequences in contrastive learning may mislead the model to focus on irrelevant features, distorting the embedding space and failing to capture users' true behavior patterns and intents. To address these issues, we propose Intent-aware Diffusion with contrastive learning for sequential Recommendation (InDiRec). The core idea is to generate item sequences aligned with users' purchasing intents, thus providing more reliable augmented views for contrastive learning. Specifically, InDiRec first performs intent clustering on sequence representations using K-means to build intent-guided signals. Next, it retrieves the intent representation of the target interaction sequence to guide a conditional diffusion model, generating positive views that share the same underlying intent. Finally, contrastive learning is applied to maximize representation consistency between these intent-aligned views and the original sequence. Extensive experiments on five public datasets demonstrate that InDiRec achieves superior performance compared to existing baselines, learning more robust representations even under noisy and sparse data conditions.
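The clustering and contrastive steps described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the 2-D vectors stand in for encoder outputs, another sequence from the same cluster stands in for the diffusion-generated view, and the InfoNCE form of the contrastive loss, the temperature, and all function names are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Vec], centroids: List[Vec]) -> List[int]:
    """Index of the nearest centroid (intent prototype) for each point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points: List[Vec], k: int, iters: int = 20, seed: int = 0) -> Tuple[List[Vec], List[int]]:
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Vec, positive: Vec, negatives: List[Vec], temperature: float = 0.5) -> float:
    """InfoNCE loss for one anchor (assumed form of the contrastive objective)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy sequence representations: two well-separated "intents".
reps = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
        [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
centroids, labels = kmeans(reps, k=2)

anchor = reps[0]
same_intent = [r for r, lab in zip(reps[1:], labels[1:]) if lab == labels[0]]
other_intent = [r for r, lab in zip(reps, labels) if lab != labels[0]]

# Positive view shares the anchor's intent (stand-in for a diffusion sample).
aligned_loss = info_nce(anchor, same_intent[0], other_intent[:2])
# Positive view drawn from a different intent, as noisy augmentation might do.
misaligned_loss = info_nce(anchor, other_intent[0], same_intent[:2])
print(f"aligned: {aligned_loss:.3f}  misaligned: {misaligned_loss:.3f}")
```

In this toy setting the loss is lower when the positive view shares the anchor's cluster, which mirrors the abstract's argument that intent-aligned views give a cleaner contrastive signal than views that break the underlying intent.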

Paper Structure

This paper contains 41 sections, 23 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: An example illustrating that random data augmentation may disrupt the semantic consistency of users' purchasing intents and thereby mislead the model.
  • Figure 2: Overview of our InDiRec. InDiRec first performs Intent-guided Signal Construction on training sequences, where $c_x$ denotes the intent prototype. It then generates positive augmented views via Intent-aware Diffusion guided by $s_\mathbf{e}$. Finally, the views and the original sequence are encoded into $\mathbf{h}_1$ and $\mathbf{h}_2$, which are optimized through contrastive learning.
  • Figure 3: Performance comparison on different user groups.
  • Figure 4: Performance comparison across different noise ratios on four datasets.
  • Figure 5: Performance of InDiRec w.r.t. different hyperparameters on NDCG (ND@20), as shown in subfigures (a)-(f).
  • ...and 1 more figure