AsarRec: Adaptive Sequential Augmentation for Robust Self-supervised Sequential Recommendation

Kaike Zhang, Qi Cao, Fei Sun, Xinran Liu

TL;DR

AsarRec addresses the fragility of self-supervised sequential recommenders under noisy user behavior by learning adaptive augmentation strategies. It unifies augmentation operations in a constrained matrix framework and uses a differentiable Semi-Sinkhorn process to produce per-user transformation matrices, optimized for diversity, semantic invariance, and informativeness. Empirical results across three datasets and multiple backbones show state-of-the-art robustness and consistent gains over static augmentation baselines, including under synthetic noise. The approach generalizes well and offers insight into how augmentations should adapt to data characteristics and noise levels in real-world applications.
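
The "constrained matrix framework" can be illustrated with a small sketch. Nothing below is taken from the paper. It assumes a padded sequence of item IDs (0 = padding) encoded as one-hot rows S, and it assumes each hard augmentation is a 0/1 matrix M with at most one 1 per row and per column, so the augmented view is M S. The helper names (one_hot, decode, reorder_matrix, crop_matrix, mask_matrix) are hypothetical. Under these assumptions reorder, crop, and mask are all special cases of the same matrix form.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): a padded sequence of n
# item IDs is encoded as one-hot rows S (n x V), ID 0 is padding, and a hard
# augmentation is a 0/1 matrix M (n x n) with at most one 1 per row and per
# column. The augmented view is M @ S; an all-zero output row means padding.

def one_hot(seq, vocab_size):
    """Encode a list of item IDs as an n x V one-hot matrix."""
    S = np.zeros((len(seq), vocab_size))
    S[np.arange(len(seq)), seq] = 1.0
    return S

def decode(S, pad=0):
    """Map each row back to an item ID; all-zero rows become `pad`."""
    return [int(row.argmax()) if row.any() else pad for row in S]

def reorder_matrix(n, perm):
    """Permutation matrix: output position i takes input position perm[i]."""
    M = np.zeros((n, n))
    M[np.arange(n), perm] = 1.0
    return M

def crop_matrix(n, start, length):
    """Keep positions start..start+length-1, shifted to the front; rest is padding."""
    M = np.zeros((n, n))
    for i in range(length):
        M[i, start + i] = 1.0
    return M

def mask_matrix(n, masked):
    """Identity matrix with the rows of masked positions zeroed out."""
    M = np.eye(n)
    M[list(masked)] = 0.0
    return M

seq = [3, 1, 4, 2]
S = one_hot(seq, vocab_size=5)
print(decode(reorder_matrix(4, [1, 0, 3, 2]) @ S))  # [1, 3, 2, 4]
print(decode(crop_matrix(4, 1, 2) @ S))             # [1, 4, 0, 0]
print(decode(mask_matrix(4, {2}) @ S))              # [3, 1, 0, 2]
```

Because the product of two such matrices again has at most one 1 per row and per column, composing augmentations (for example, masking after reordering) stays inside the same constrained family, which is one way to read the "composable" strategies mentioned in the Figure 2 caption.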

Abstract

Sequential recommender systems have demonstrated strong capabilities in modeling users' dynamic preferences and capturing item transition patterns. However, real-world user behaviors are often noisy due to factors such as human errors, uncertainty, and behavioral ambiguity, which can degrade recommendation performance. To address this issue, recent approaches widely adopt self-supervised learning (SSL), particularly contrastive learning, which generates perturbed views of user interaction sequences and maximizes their mutual information to improve model robustness. However, these methods rely heavily on pre-defined static augmentation strategies (where the augmentation type remains fixed once chosen) to construct augmented views, which leads to two critical challenges: (1) the optimal augmentation type can vary significantly across scenarios; (2) inappropriate augmentations may even degrade recommendation performance, limiting the effectiveness of SSL. To overcome these limitations, we propose an adaptive augmentation framework. We first express existing basic augmentation operations in a single formulation based on structured transformation matrices. Building on this, we introduce AsarRec (Adaptive Sequential Augmentation for Robust Sequential Recommendation), which learns to generate transformation matrices by encoding user sequences into probabilistic transition matrices and projecting them into hard semi-doubly stochastic matrices via a differentiable Semi-Sinkhorn algorithm. To ensure that the learned augmentations benefit downstream performance, we jointly optimize three objectives: diversity, semantic invariance, and informativeness. Extensive experiments on three benchmark datasets under varying noise levels validate the effectiveness of AsarRec, demonstrating superior robustness and consistent improvements.
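
The abstract does not spell out the Semi-Sinkhorn algorithm, so the following is only a generic sketch of projecting a learned score matrix toward permutation-like structure. It uses standard log-domain Sinkhorn normalization, which alternately normalizes the rows and columns of exp(logits / tau) toward a doubly stochastic matrix, followed by a hypothetical "semi" rounding step: greedy one-to-one assignment in which assignments below a threshold are dropped, so that row and column sums are at most 1. The function names, tau, n_iters, threshold, and the greedy rounding are all assumptions, not the paper's method.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`, keeping dims."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(logits, tau=0.1, n_iters=50):
    """Standard Sinkhorn: push exp(logits / tau) toward a doubly stochastic matrix."""
    log_p = logits / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)

def hard_semi_projection(P, threshold=0.5):
    """Hypothetical rounding of a soft matrix to a 0/1 matrix whose row and
    column sums are at most 1 (a partial permutation).

    Entries are visited from largest to smallest; an entry is taken only if its
    row and column are still free. Assignments whose soft value is below
    `threshold` are dropped, leaving that output row empty (e.g. padding).
    """
    n, m = P.shape
    H = np.zeros_like(P)
    free_rows, free_cols = set(range(n)), set(range(m))
    for flat in np.argsort(-P, axis=None):  # flat indices, largest value first
        i, j = divmod(int(flat), m)
        if i in free_rows and j in free_cols:
            free_rows.remove(i)
            free_cols.remove(j)
            if P[i, j] >= threshold:
                H[i, j] = 1.0
    return H

# Confident scores favouring the permutation [2, 0, 3, 1].
perm = [2, 0, 3, 1]
logits = np.zeros((4, 4))
logits[np.arange(4), perm] = 1.0
H = hard_semi_projection(sinkhorn(logits))
print(H.argmax(axis=1))  # [2 0 3 1]
```

For confident scores the result is a full permutation (a pure reorder); for uninformative scores every assignment falls below the threshold and the output is empty, so the threshold controls how many positions survive. In an autodiff framework, a common way to keep such a hard matrix trainable is a straight-through estimator that uses H + P - stop_gradient(P) in the forward pass; whether AsarRec does this is not stated in this summary.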

Paper Structure

This paper contains 34 sections, 12 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Effectiveness of various augmentation methods and their combinations across different data scenarios. In the noisy setting, an additional 20% random noise is injected into each user's interaction sequence.
  • Figure 2: An overview of our proposed framework. The top illustrates five commonly used heuristic augmentation strategies. The middle part shows how we unify these operations as structured transformation matrices. The bottom demonstrates how our model learns to generate effective transformation matrices through a differentiable Sinkhorn-based process, guided by three key objectives: diversity, semantic invariance, and informativeness. Our method enables a transition from static, heuristic, or single-type augmentations to adaptive, learnable, and composable augmentation strategies.
  • Figure 3: Recommendation performance of different augmentation methods across various noise ratios.
  • Figure 4: Comparison of the original padded sequence $s_u^*$ and the transformed sequence $s_u'$ for a sampled user from the Games dataset in clean and noisy (20% noise) settings.
  • Figure 5: Hyperparameter analysis and ablation study.
  • ...and 2 more figures
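
Figures 1, 3, and 4 evaluate models with injected noise (for example, an additional 20% in Figure 1). The exact protocol is not described here. The sketch below assumes that noise means inserting round(ratio * len(seq)) uniformly random item IDs at random positions while keeping the original interactions in order. The function inject_noise and its parameters are hypothetical.

```python
import random

def inject_noise(seq, num_items, ratio=0.2, rng=None):
    """Hypothetical noise protocol: insert round(ratio * len(seq)) random items.

    Item IDs are assumed to be 1..num_items (0 reserved for padding). Inserted
    items are drawn uniformly at random and placed at random positions; the
    relative order of the original interactions is preserved.
    """
    rng = rng or random.Random()
    noisy = list(seq)
    for _ in range(round(ratio * len(seq))):
        pos = rng.randint(0, len(noisy))  # insertion point, both ends inclusive
        noisy.insert(pos, rng.randint(1, num_items))
    return noisy

history = list(range(1, 11))
noisy = inject_noise(history, num_items=100, ratio=0.2, rng=random.Random(0))
print(len(noisy))  # 12
```

Varying `ratio` would give the kind of noise sweep shown in Figure 3, under the same insertion assumption.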