Unlocking the Power of Diffusion Models in Sequential Recommendation: A Simple and Effective Approach

Jialei Chen; Yuanbo Xu; Yiheng Jiang

TL;DR

This work tackles embedding collapse in diffusion-based sequential recommendation by introducing ADRec, which combines token-level diffusion with auto-regressive sequence modeling in a Transformer backbone. ADRec uses a causal attention module (CAM), a feature aggregation stage, and an auto-regressive diffusion module (ADM) to learn both sequence dynamics and per-token item distributions. A three-stage training strategy mitigates collapse, and at inference only the last token is denoised, which preserves the patterns in the interaction history. Across six datasets, ADRec improves ranking accuracy (e.g., HR@20 and NDCG@20) and reduces training time relative to prior diffusion baselines. Overall, ADRec shows that token-level diffusion, combined with embedding pre-training and last-token inference, makes diffusion-based sequential recommendation both effective and practically efficient.

Abstract

In this paper, we focus on the often-overlooked issue of embedding collapse in existing diffusion-based sequential recommendation models and propose ADRec, an innovative framework designed to mitigate this problem. Diverging from previous diffusion-based methods, ADRec applies an independent noise process to each token and performs diffusion across the entire target sequence during training. ADRec captures token interdependency through auto-regression while modeling per-token distributions through token-level diffusion. This dual approach enables the model to effectively capture both sequence dynamics and item representations, overcoming the limitations of existing methods. To further mitigate embedding collapse, we propose a three-stage training strategy: (1) pre-training the embedding weights, (2) aligning these weights with the ADRec backbone, and (3) fine-tuning the model. During inference, ADRec applies the denoising process only to the last token, ensuring that the meaningful patterns in historical interactions are preserved. Our comprehensive empirical evaluation across six datasets underscores the effectiveness of ADRec in enhancing both the accuracy and efficiency of diffusion-based sequential recommendation systems.
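The contrast between per-token noising during training and last-token-only noising at inference can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-style schedule, the choice to sample a separate timestep for every token, and all function names (`cosine_alpha_bar`, `noise_tokens`, `training_timesteps`, `inference_timesteps`) are assumptions made for illustration only.

```python
import numpy as np

def cosine_alpha_bar(t, num_steps):
    """Cumulative signal rate at step t (assumed cosine-style schedule)."""
    return np.cos(0.5 * np.pi * t / num_steps) ** 2

def noise_tokens(x0, t, num_steps, rng):
    """Forward-diffuse each token embedding with its own timestep.

    x0: (seq_len, dim) clean item embeddings.
    t:  (seq_len,) integer timesteps, one per token.
    """
    a = cosine_alpha_bar(t, num_steps)[:, None]   # (seq_len, 1)
    eps = rng.standard_normal(x0.shape)           # independent noise per token
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def training_timesteps(seq_len, num_steps, rng):
    """Assumption: every token in the target sequence gets its own random step."""
    return rng.integers(1, num_steps + 1, size=seq_len)

def inference_timesteps(seq_len, num_steps):
    """History tokens stay clean (t = 0); only the last token starts from noise."""
    t = np.zeros(seq_len, dtype=int)
    t[-1] = num_steps
    return t

rng = np.random.default_rng(0)
seq_len, dim, num_steps = 5, 8, 100
x0 = rng.standard_normal((seq_len, dim))          # stand-in for item embeddings

x_train = noise_tokens(x0, training_timesteps(seq_len, num_steps, rng), num_steps, rng)
x_infer = noise_tokens(x0, inference_timesteps(seq_len, num_steps), num_steps, rng)
```

In this sketch every position of `x_train` is corrupted to a different degree, while `x_infer` keeps the history embeddings unchanged and replaces only the final position with (almost) pure noise, which a denoiser conditioned on the history would then refine.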
