LLaDA-Rec: Discrete Diffusion for Parallel Semantic ID Generation in Generative Recommendation

Teng Shi, Chenglei Shen, Weijie Yu, Shen Nie, Chongxuan Li, Xiao Zhang, Ming He, Yan Han, Jun Xu

TL;DR

LLaDA-Rec reframes generative recommendation as discrete diffusion to overcome the unidirectional attention and error propagation of autoregressive models. It introduces parallel semantic IDs produced by a Multi-Head VQ-VAE, two diffusion masking mechanisms that capture inter-item and intra-item relationships, and an adapted beam search that generates the top-k items in an adaptive, most-confident-first order. Experiments on three real-world datasets show state-of-the-art performance; ablations validate each key component, and further analysis attributes the improvement to bidirectional attention and adaptive generation order. The work positions discrete diffusion as a new paradigm for end-to-end generative recommendation in which generation and retrieval are integrated.
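
To make the idea of parallel semantic IDs more concrete, here is a minimal sketch of multi-head vector quantization. It assumes that each head quantizes its own slice of the item embedding against its own codebook, so that no token depends on another (unlike residual quantization, where each level encodes the residual of the previous one). The slicing scheme, the name `multi_head_quantize`, and the toy codebooks are illustrative assumptions, not details taken from the paper; the encoder and decoder of the VQ-VAE are omitted.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_head_quantize(
    embedding: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Map one item embedding to a semantic ID with one token per head.

    Assumption (hypothetical layout): the embedding is cut into
    len(codebooks) equal slices, and head h looks up the nearest
    code vector for slice h in codebooks[h] only.
    """
    num_heads = len(codebooks)
    if num_heads == 0 or len(embedding) % num_heads != 0:
        raise ValueError("embedding length must be a multiple of the head count")
    head_dim = len(embedding) // num_heads

    semantic_id = []
    for h, codebook in enumerate(codebooks):
        sub_vector = embedding[h * head_dim:(h + 1) * head_dim]
        # Nearest-neighbour lookup, independent of every other head.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(sub_vector, codebook[k]),
        )
        semantic_id.append(nearest)
    return semantic_id


# Toy example: a 4-dimensional embedding, 2 heads, 2 codes per head.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],    # codebook for head 0
    [[-1.0, 0.0], [1.0, 1.0]],   # codebook for head 1
]
print(multi_head_quantize([0.9, 0.1, -1.0, 0.2], toy_codebooks))  # [1, 0]
```

Because every head is resolved independently, the tokens of a semantic ID carry no built-in left-to-right order, which is the property a bidirectional model needs.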

Abstract

Generative recommendation represents each item as a semantic ID, i.e., a sequence of discrete tokens, and generates the next item through autoregressive decoding. While effective, existing autoregressive models face two intrinsic limitations: (1) unidirectional constraints, where causal attention restricts each token to attend only to its predecessors, hindering global semantic modeling; and (2) error accumulation, where the fixed left-to-right generation order causes prediction errors in early tokens to propagate to the predictions of subsequent tokens. To address these issues, we propose LLaDA-Rec, a discrete diffusion framework that reformulates recommendation as parallel semantic ID generation. By combining bidirectional attention with an adaptive generation order, the approach models inter-item and intra-item dependencies more effectively and alleviates error accumulation. Specifically, our approach comprises three key designs: (1) a parallel tokenization scheme that produces semantic IDs for bidirectional modeling, addressing the mismatch between residual quantization and bidirectional architectures; (2) two masking mechanisms at the user-history and next-item levels to capture both inter-item sequential dependencies and intra-item semantic relationships; and (3) an adapted beam search strategy for adaptive-order discrete diffusion decoding, resolving the incompatibility of standard beam search with diffusion-based generation. Experiments on three real-world datasets show that LLaDA-Rec consistently outperforms both ID-based and state-of-the-art generative recommenders, establishing discrete diffusion as a new paradigm for generative recommendation.
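
The two masking levels described in the abstract can be illustrated with a small sketch. It assumes that each item is a list of token IDs, that tokens are masked independently with a given ratio, and that `MASK = -1` stands in for the mask token. The function names, the independent-sampling rule, and the way the two views are assembled are hypothetical; the abstract does not specify the masking schedule or the loss.

```python
import random
from typing import List

MASK = -1  # hypothetical mask token id, chosen for illustration only


def history_level_view(
    history: List[List[int]], ratio: float, rng: random.Random
) -> List[List[int]]:
    """User-history level masking (sketch): mask tokens across the items
    of the history, so the model must recover them from other items."""
    return [
        [MASK if rng.random() < ratio else token for token in item]
        for item in history
    ]


def next_item_level_view(
    history: List[List[int]],
    next_item: List[int],
    ratio: float,
    rng: random.Random,
) -> List[List[int]]:
    """Next-item level masking (sketch): keep the history intact and mask
    only tokens of the target item appended at the end."""
    masked_target = [MASK if rng.random() < ratio else token for token in next_item]
    return [list(item) for item in history] + [masked_target]


rng = random.Random(0)
history = [[11, 12, 13], [21, 22, 23]]   # two items, three tokens each
target = [31, 32, 33]

print(history_level_view(history, 0.5, rng))
print(next_item_level_view(history, target, 0.5, rng))
```

In the first view the masked positions are spread over the history, which exercises inter-item dependencies; in the second only the target item is corrupted, which exercises the relationships among the tokens of a single item.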

Paper Structure

This paper contains 40 sections, 17 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: An illustration of the advantages of discrete diffusion over autoregressive generation. In autoregressive models, an error in the first token will propagate to subsequent tokens. In contrast, a discrete diffusion model predicts all masked positions in parallel at each step, and re-masks and re-predicts low-confidence error tokens, ultimately producing more accurate results.
  • Figure 2: Overall framework of LLaDA-Rec, which consists of three main modules: (1) Parallel Tokenization, where a Multi-Head VQ-VAE is used to produce parallel semantic IDs for each item; (2) Discrete Diffusion Training, which applies two masking strategies: user-history level masking models inter-item sequential dependencies, and next-item level masking captures intra-item semantic relationships; (3) Discrete Diffusion Inference, where beam search is adapted to discrete diffusion decoding to generate the final top-$k$ recommended items.
  • Figure 3: Comparison of different attention mechanisms. (a): Attention masks corresponding to each mechanism. (b) and (c): Performance under different attention mechanisms.
  • Figure 4: Performance under different generation orders.
  • Figure 5: Performance under different generation steps.
  • ...and 1 more figure
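
The decoding behaviour that the Figure 1 caption describes (predict every masked position in parallel, keep the confident predictions, re-mask and re-predict the rest) can be sketched for a single sequence as follows. This is a simplified illustration under stated assumptions: it decodes one candidate only and leaves out the adapted beam search, and the names `confident_first_decode`, `toy_predict`, and `MASK` are hypothetical. The predictor is a fixed stand-in for a trained model.

```python
import math
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask token id

# A predictor maps the current tokens to one (token, confidence) pair per position.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def confident_first_decode(
    predict: Predictor, length: int, steps: int
) -> Tuple[List[int], List[int]]:
    """Fill `length` masked positions in about `steps` rounds.

    Each round predicts all positions in parallel, commits the most
    confident still-masked positions, and leaves the others masked so
    they are predicted again in the next round.
    Returns the decoded tokens and the order in which positions were filled.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    tokens = [MASK] * length
    per_step = math.ceil(length / steps)
    fill_order: List[int] = []

    while MASK in tokens:
        proposals = predict(tokens)
        masked = [i for i, token in enumerate(tokens) if token == MASK]
        # Most confident positions first; the generation order is not fixed.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = proposals[i][0]
            fill_order.append(i)
    return tokens, fill_order


def toy_predict(tokens: List[int]) -> List[Tuple[int, float]]:
    """Stand-in model: fixed tokens and confidences, ignoring the input."""
    return [(7, 0.2), (3, 0.9), (9, 0.5), (4, 0.7)]


decoded, order = confident_first_decode(toy_predict, length=4, steps=2)
print(decoded)  # [7, 3, 9, 4]
print(order)    # [1, 3, 2, 0]
```

With a real model the confidences would change after each round, because positions committed earlier become context for the remaining ones; the toy predictor keeps them fixed only so the fill order is easy to follow.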