Generative Sequential Recommendation via Hierarchical Behavior Modeling

Zhefan Wang, Guokai Yan, Jinbei Yu, Siyu Gu, Jingyan Chen, Peng Jiang, Zhiqiang Guo, Min Zhang

TL;DR

A novel generative framework, GAMER (Generative Augmentation and Multi-lEvel behavior modeling for Recommendation), built upon a decoder-only backbone, which introduces a cross-level interaction layer to capture hierarchical dependencies among behaviors and a sequential augmentation strategy that enhances robustness in training.

Abstract

Recommender systems in multi-behavior domains, such as advertising and e-commerce, aim to guide users toward high-value but inherently sparse conversions. Leveraging auxiliary behaviors (e.g., clicks, likes, shares) is therefore essential. Recent progress in generative recommendation has opened new possibilities for multi-behavior sequential recommendation. However, existing generative approaches face two significant challenges: 1) Inadequate Sequence Modeling: they fail to capture the complex, cross-level dependencies within user behavior sequences, and 2) Lack of Suitable Datasets: publicly available multi-behavior recommendation datasets are almost exclusively derived from e-commerce platforms, which limits validation in other domains, and they lack sufficient side information for semantic ID generation. To address these issues, we propose a novel generative framework, GAMER (Generative Augmentation and Multi-lEvel behavior modeling for Recommendation), built upon a decoder-only backbone. GAMER introduces a cross-level interaction layer to capture hierarchical dependencies among behaviors and a sequential augmentation strategy that enhances robustness in training. To further advance this direction, we collect and release ShortVideoAD, a large-scale multi-behavior dataset from a mainstream short-video platform, which differs fundamentally from existing e-commerce datasets and provides pretrained semantic IDs for research on generative methods. Extensive experiments show that GAMER consistently outperforms both discriminative and generative baselines across multiple metrics.

Paper Structure

This paper contains 39 sections, 8 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of our proposed GAMER. The left panel shows our multi-behavior sequential augmentation, which samples the original sequence with different dropout rates $r^t, t=1, \ldots, x$ to generate $x$-fold additional training samples. The right panel illustrates our Qwen3 MoE block, which consists of three modules: Causal Self-Attention Layer, Cross-level Behavior Interaction Layer, and Position-and-Behavior Aware MoE.
  • Figure 2: The ratio of items with buy behavior in the test set under the leave-one-out setting for Retail and Tmall. Figure 2(a) and Figure 2(b) show the original data. Figure 2(c) and Figure 2(d) filter the most recent interaction in the user history if it is a low-level behavior for the same target item. It is worth noting that the ratios of $k=5$ and $k=10$ in Figure 2(a) are consistent with the HR@5 and HR@10 reported in MBGen [liu2024multi].
  • Figure 3: The comparison of different augmentation times on both target behavior item prediction and behavior-specific item prediction tasks. We uniformly use the same SIDs for item tokenization to ensure fairness in comparison.
  • Figure 4: Robustness analysis on sequential augmentation times.
  • Figure 5: Session-wise Causal Self-Attention Layer.
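
The sequential augmentation described in the Figure 1 caption (sampling the original sequence with dropout rates $r^t, t=1, \ldots, x$ to obtain $x$ additional training samples) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `augment_sequence`, the `(item_id, behavior)` tuple layout, independent per-interaction dropping, and the choice to always keep the final interaction as the prediction target are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Assumed layout of one interaction: (item_id, behavior). Hypothetical.
Interaction = Tuple[str, str]


def augment_sequence(
    seq: Sequence[Interaction],
    dropout_rates: Sequence[float],
    seed: int = 0,
) -> List[List[Interaction]]:
    """Return one subsampled copy of `seq` per dropout rate.

    Each history interaction is dropped independently with probability
    `rate` (an assumption; the source only says the sequence is sampled
    with different dropout rates). Order is preserved, and the last
    interaction is always kept as the target (also an assumption).
    """
    rng = random.Random(seed)
    augmented: List[List[Interaction]] = []
    for rate in dropout_rates:
        if not 0.0 <= rate < 1.0:
            raise ValueError(f"dropout rate must be in [0, 1), got {rate}")
        if not seq:
            augmented.append([])
            continue
        history, target = seq[:-1], seq[-1]
        # Keep an interaction when the random draw is at or above the rate.
        kept = [x for x in history if rng.random() >= rate]
        kept.append(target)
        augmented.append(kept)
    return augmented


if __name__ == "__main__":
    user_seq = [
        ("i1", "click"),
        ("i2", "click"),
        ("i2", "buy"),
        ("i3", "click"),
        ("i4", "buy"),
    ]
    # Three dropout rates give three additional training samples (x = 3).
    for rate, sample in zip([0.1, 0.3, 0.5], augment_sequence(user_seq, [0.1, 0.3, 0.5])):
        print(rate, sample)
```

Under these assumptions, each rate yields an order-preserving subsequence of the original history, so a model trained on the augmented samples sees the same target behind histories of varying sparsity.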