Masked Diffusion for Generative Recommendation

Kulin Shah; Bhuvesh Kumar; Neil Shah; Liam Collins

TL;DR

This paper tackles the inefficiencies of autoregressive (AR) generative recommendation with semantic IDs (SIDs) by introducing MaskGR, a discrete masked diffusion model over SID sequences. MaskGR decodes SID tokens in parallel, uses training data more efficiently, and better captures global relationships among items, yielding strong gains over AR and continuous-diffusion baselines with far fewer inference steps. The work also shows that MaskGR can be extended with dense retrieval for further gains and that it is compatible with existing AR enhancements. Overall, MaskGR offers a simple, generalizable framework that improves performance, speed, and flexibility in generative recommendation with semantic IDs. Code is publicly available for reproducibility.

Abstract

Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches due to its performance gains, its use of semantic information provided through language model embeddings, and its inference and storage efficiency. Existing work on GR with SIDs models the probability of the sequence of SIDs corresponding to a user's interaction history autoregressively. While this has led to impressive next-item prediction performance in certain settings, these autoregressive models suffer from expensive inference due to sequential token-wise decoding, potentially inefficient use of training data, and a bias towards learning short-context relationships among tokens. Inspired by recent breakthroughs in NLP, we propose to instead model and learn the probability of a user's sequence of SIDs using masked diffusion. Masked diffusion employs discrete masking noise to facilitate learning the sequence distribution, and models the masked tokens as conditionally independent given the unmasked tokens, which allows the masked tokens to be decoded in parallel. We demonstrate through thorough experiments that our proposed method consistently outperforms autoregressive modeling. This performance gap is especially pronounced in data-constrained settings and in terms of coarse-grained recall, consistent with our intuitions. Moreover, our approach offers the flexibility to predict multiple SIDs in parallel during inference while still outperforming autoregressive modeling.
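To make the masking and parallel-decoding idea concrete, here is a minimal sketch in plain Python. It is illustrative only and is not the authors' implementation: the `MASK` sentinel, the function names, the confidence-ordered unmasking schedule, and the `toy_predict` stand-in for a trained denoising network are all assumptions introduced for this example.

```python
import random

MASK = -1  # hypothetical sentinel for a masked SID token


def mask_tokens(tokens, mask_prob, rng):
    """Forward (noising) step: mask each token independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


def parallel_decode(tokens, predict_fn, num_steps):
    """Reverse step: fill masked positions in at most num_steps rounds.

    Each round predicts every masked position independently given the
    currently unmasked tokens, then commits several predictions at once.
    Committing the most confident ones first is an assumed schedule.
    """
    tokens = list(tokens)
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predict_fn(tokens, masked)  # {position: (token, confidence)}
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens, masked):
    """Stand-in for a trained model: predicts position % 4, earlier positions more confident."""
    return {i: (i % 4, 1.0 / (1 + i)) for i in masked}


rng = random.Random(0)
noisy = mask_tokens([0, 1, 2, 3, 0, 1], mask_prob=1.0, rng=rng)
decoded = parallel_decode(noisy, toy_predict, num_steps=2)
```

In this toy run, six masked tokens are recovered in two rounds of three tokens each, whereas token-wise autoregressive decoding would need six sequential steps. A real model would replace `toy_predict` with a network that outputs a distribution over the SID vocabulary at each masked position.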
