Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach

Jialei Chen, Yuanbo Xu, Yiheng Jiang

TL;DR

This work tackles embedding collapse in diffusion-based sequential recommendation by introducing ADRec, which combines token-level diffusion with auto-regressive sequence modeling in a Transformer backbone. ADRec employs a causal attention module (CAM), a feature aggregation stage, and an auto-regressive diffusion module (ADM) to learn both sequence dynamics and per-token item distributions, while a three-stage training strategy mitigates collapse and a last-token inference strategy preserves the patterns in the interaction history. Empirical results across six datasets show ADRec achieving substantial gains in ranking accuracy (e.g., HR@20 and NDCG@20) and significantly reduced training time compared to prior diffusion baselines. Overall, ADRec demonstrates that token-level diffusion, when integrated with structured embedding pre-training and controlled inference, can make diffusion-based sequential recommendation both effective and practically efficient.
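For readers unfamiliar with the ranking metrics named above, the following is a minimal sketch of HR@K and NDCG@K in the common next-item setting with a single ground-truth item per user. The function names and the tie-handling rule are assumptions made for illustration; the paper's exact evaluation protocol is not shown in this summary.

```python
import math

def rank_of_target(scores, target):
    """1-based rank of the target item; only strictly higher scores outrank it (an assumption)."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)

def hr_at_k(rank, k):
    """Hit Ratio@K: 1 if the target appears in the top K, else 0."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    """NDCG@K with one relevant item: the ideal DCG is 1, so NDCG is 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

# Toy example: four candidate items, the true next item is index 2.
scores = [0.1, 0.9, 0.4, 0.7]
rank = rank_of_target(scores, target=2)  # two items score higher, so rank is 3
hr20 = hr_at_k(rank, 20)
ndcg20 = ndcg_at_k(rank, 20)
```

Reported HR@20 and NDCG@20 values are typically these per-user quantities averaged over all test users.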

Abstract

In this paper, we focus on the often-overlooked issue of embedding collapse in existing diffusion-based sequential recommendation models and propose ADRec, an innovative framework designed to mitigate this problem. Diverging from previous diffusion-based methods, ADRec applies an independent noise process to each token and performs diffusion across the entire target sequence during training. ADRec captures token interdependency through auto-regression while modeling per-token distributions through token-level diffusion. This dual approach enables the model to effectively capture both sequence dynamics and item representations, overcoming the limitations of existing methods. To further mitigate embedding collapse, we propose a three-stage training strategy: (1) pre-training the embedding weights, (2) aligning these weights with the ADRec backbone, and (3) fine-tuning the model. During inference, ADRec applies the denoising process only to the last token, ensuring that the meaningful patterns in historical interactions are preserved. Our comprehensive empirical evaluation across six datasets underscores the effectiveness of ADRec in enhancing both the accuracy and efficiency of diffusion-based sequential recommendation systems.
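The per-token noise process described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the linear beta schedule, the number of steps, the helper names, and the convention that a timestep of -1 means "leave this token clean" are all assumptions chosen for illustration. The forward step is the standard Gaussian diffusion form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, applied with a separate $t$ for each token.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_tokens(x0, t, alpha_bar, rng):
    """Noise each token independently.

    x0: (seq_len, dim) clean item embeddings.
    t:  (seq_len,) integer timesteps; -1 (hypothetical convention) keeps a token clean.
    """
    eps = rng.standard_normal(x0.shape)
    # Look up alpha_bar per token; tokens with t == -1 get alpha_bar = 1 (no noise).
    ab = np.where(t >= 0, alpha_bar[np.clip(t, 0, None)], 1.0)[:, None]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((5, 8))  # toy sequence: 5 tokens, 8-dim embeddings

# Training-style corruption: every token draws its own timestep.
t_train = rng.integers(0, 1000, size=5)
x_train = noise_tokens(x0, t_train, alpha_bar, rng)

# Inference-style input: history stays clean, only the last token is noised.
t_infer = np.full(5, -1)
t_infer[-1] = 999
x_infer = noise_tokens(x0, t_infer, alpha_bar, rng)
```

In the inference-style call, the first four rows of `x_infer` equal `x0` exactly, which mirrors the abstract's point that denoising only the last token leaves the historical interactions untouched.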

Paper Structure

This paper contains 36 sections, 13 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: t-SNE results of the learned item embeddings of ADRec and other baselines on the Baby and Beauty datasets. If the contour shape closely resembles isotropic Gaussian noise (as seen in DreamRec and DiffuRec) or if the representation space is narrow (as observed in DreamRec, DiffuRec, and SASRec+, which require a large magnification factor), it suggests a weak embedding space. In contrast, ADRec maintains a structured embedding space and expands it compared to SASRec+, significantly enhancing item separability. Additional visualization results can be found in Appendix Figure \ref{fig:tnse_full}.
  • Figure 2: Method overview. The left diagram illustrates that time series and diffusion processes are two orthogonal directions of evolution, with noise acting as a soft mask to measure uncertainty. ADRec applies independent diffusion processes to individual items, with the noise level of the items in the current target sequence highlighted by a green box.
  • Figure 3: The three-stage training strategy and the inference strategy of ADRec.
  • Figure 4: Comparison of training time between ADRec and other methods. "-xx%" indicates the reduction in training time for ADRec compared to DiffuRec.
  • Figure 5: Comparison of ADRec with linear integration using different $\lambda$ coefficients and cross-attention integration, with SASRec+ as a baseline.
  • ...and 5 more figures