DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

Wuchao Li, Rui Huang, Haijun Zhao, Chi Liu, Kai Zheng, Qi Liu, Na Mou, Guorui Zhou, Defu Lian, Yang Song, Wentian Bao, Enyun Yu, Wenwu Ou

TL;DR

DimeRec reframes sequential recommendation by generating the next user interest rather than the next item, using a stationary-guidance module to extract stable signals from non-stationary histories and a diffusion-based aggregator to reconstruct recommendations. The approach introduces a Geodesic Random Walk on a spherical embedding space to align diffusion and recommendation objectives, supported by a guidance loss that stabilizes representation learning. Empirical results on three public datasets and a large-scale online deployment demonstrate substantial improvements in retrieval metrics and diversity, complemented by ablations and sensitivity analyses that validate the necessity of each component. The work delivers a practical, scalable framework that narrows the gap between discriminative SR methods and generative diffusion models, with significant potential for real-world recommender systems.
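
To make the idea of a random walk on a spherical embedding space concrete, here is a minimal sketch of one generic way to add noise to a unit vector while staying on the sphere: project Gaussian noise onto the tangent space and follow the great circle (the exponential map). This is an illustration under stated assumptions, not the paper's exact formulation. The function names, the `step_sizes` schedule, and the use of plain Python lists are all hypothetical choices made for this example.

```python
import math
import random


def geodesic_step(x, step_size, rng):
    """Take one noisy step from unit vector x along a great circle of the unit sphere."""
    # Sample isotropic Gaussian noise in the ambient space.
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    # Remove the radial component so the noise lies in the tangent space at x.
    radial = sum(e * xi for e, xi in zip(eps, x))
    v = [step_size * (e - radial * xi) for e, xi in zip(eps, x)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm < 1e-12:
        return list(x)
    # Exponential map on the sphere: move an arc length of `norm` in direction v.
    y = [math.cos(norm) * xi + math.sin(norm) * vi / norm for xi, vi in zip(x, v)]
    # Renormalize to guard against floating-point drift.
    scale = math.sqrt(sum(yi * yi for yi in y))
    return [yi / scale for yi in y]


def geodesic_random_walk(x0, step_sizes, seed=0):
    """Return the list of points visited, starting from x0 projected onto the sphere."""
    rng = random.Random(seed)
    n0 = math.sqrt(sum(xi * xi for xi in x0))
    x = [xi / n0 for xi in x0]
    path = [x]
    for s in step_sizes:  # hypothetical noise schedule, one entry per step
        x = geodesic_step(x, s, rng)
        path.append(x)
    return path
```

Calling `geodesic_random_walk([3.0, 0.0, 4.0], [0.1] * 5)` returns six points, each of unit norm. Unlike adding Gaussian noise directly in Euclidean space, every intermediate point stays on the sphere.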

Abstract

Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (Diffusion with multi-interest enhanced Recommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user's non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM's outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users' time spent and result diversification.

Paper Structure (30 sections, 18 equations, 5 figures, 8 tables, 2 algorithms)

Figures (5)

  • Figure 1: Architecture of model-based DimeRec. The left part uses a Self-Attention backbone to extract guidance from the raw behavior sequence. The right part shows the DAM structure: a Geodesic Random Walk on the sphere adds noise to the embedding of the target item, and a Multi-Layer Perceptron (MLP), guided by the multi-interest guidance sequence, restores the embedding to its original state. The two parts are trained jointly in a multi-task manner.
  • Figure 2: Example of Optimization Inconsistency.
  • Figure 3: Ablation Study: The influence of ablating different components on Linear Probing. The two blue horizontal lines represent the results of DiffuRec and DreamRec, respectively.
  • Figure 4: The influence of GRW on the three losses during the training process. The blue line represents $\mathcal{L}_{gem}$, the orange line represents $\mathcal{L}_{ssm}$, and the red line represents $\mathcal{L}_{recon}$.
  • Figure 5: HR@50 calculated using the intermediate embeddings from the denoising process on the ML-10M and YooChoose datasets. The results on KuaiRec show the same trend.
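
For readers unfamiliar with the HR@50 metric in Figure 5, the sketch below computes hit rate at k in its standard form: the fraction of test cases whose held-out target item appears among the top-k ranked candidates. The function name and the input layout (one ranked list of item IDs per user) are assumptions made for this example. The paper's exact evaluation protocol is not given in this overview.

```python
def hit_rate_at_k(ranked_lists, targets, k=50):
    """Fraction of cases where the target item is within the top-k ranked items."""
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        raise ValueError("at least one test case is required")
    # Count a hit when the held-out target appears in the first k positions.
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)
```

With two users whose ranked lists are `[1, 2, 3]` and `[4, 5, 6]` and whose targets are `2` and `9`, `hit_rate_at_k(..., k=2)` gives 0.5, since only the first user's target is in the top 2.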