Diffusion Generative Recommendation with Continuous Tokens

Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, Wenqi Fan

TL;DR

This paper tackles the limitations of quantized, discrete tokenization in LLM-based recommender systems by introducing ContRec, which uses a sigma-VAE tokenizer to create continuous user/item tokens and a Dispersive Diffusion module to model implicit user preferences conditioned on LLM reasoning. The approach combines autoregressive LLM outputs with diffusion-generated latent representations through a hybrid retrieval mechanism, optimized with a joint objective that includes a dispersive regularization term. Empirical results on four benchmarks show ContRec consistently surpassing both traditional and SOTA discrete-token baselines, with notable gains over TIGER and other peers, highlighting the value of continuous tokenization and diffusion in recommendations. The work suggests that continuous tokens and diffusion-based generation can improve reconstruction, generalization, and top-K ranking in generative RecSys, particularly in personalized, conversational settings.

Abstract

Recent advances in generative artificial intelligence, particularly large language models (LLMs), have opened new opportunities for enhancing recommender systems (RecSys). Most existing LLM-based RecSys approaches operate in a discrete space, using vector-quantized tokenizers to align with the inherent discrete nature of language models. However, these quantization methods often result in lossy tokenization and suboptimal learning, primarily due to inaccurate gradient propagation caused by the non-differentiable argmin operation in standard vector quantization. Inspired by the emerging trend of embracing continuous tokens in language models, we propose ContRec, a novel framework that seamlessly integrates continuous tokens into LLM-based RecSys. Specifically, ContRec consists of two key modules: a sigma-VAE Tokenizer, which encodes users/items with continuous tokens; and a Dispersive Diffusion module, which captures implicit user preference. The tokenizer is trained with a continuous Variational Auto-Encoder (VAE) objective, where three effective techniques are adopted to avoid representation collapse. By conditioning on the previously generated tokens of the LLM backbone during user modeling, the Dispersive Diffusion module performs a conditional diffusion process with a novel Dispersive Loss, enabling high-quality user preference generation through next-token diffusion. Finally, ContRec leverages both the textual reasoning output from the LLM and the latent representations produced by the diffusion model for Top-K item retrieval, thereby delivering comprehensive recommendation results. Extensive experiments on four datasets demonstrate that ContRec consistently outperforms both traditional and SOTA LLM-based recommender systems. Our results highlight the potential of continuous tokenization and generative modeling for advancing the next generation of recommender systems.
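The gradient issue described in the abstract can be illustrated with a toy example. The following is a minimal sketch and not the paper's implementation: the codebook, the inputs, and the helper names `vector_quantize` and `continuous_token` are hypothetical, and the continuous branch is a generic VAE-style reparameterization standing in for the sigma-VAE Tokenizer. It shows that a nearest-code lookup is lossy and does not respond to a small input shift (so no gradient signal flows through the argmin), whereas a continuous token moves with its input.

```python
import numpy as np

def vector_quantize(x, codebook):
    """Map each row of x to its nearest codebook row (standard VQ lookup)."""
    # Squared Euclidean distance from every input row to every code.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)  # non-differentiable selection step
    return codebook[idx], idx

def continuous_token(mu, sigma, eps):
    """Generic VAE reparameterization (assumption: a stand-in, not the paper's tokenizer)."""
    return mu + sigma * eps

# Hypothetical toy codebook and item embeddings.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
x = np.array([[0.2, 0.1], [0.9, 0.7], [-0.8, 1.1]])

# Discrete path: quantization discards the residual between x and its code.
x_q, idx = vector_quantize(x, codebook)
vq_error = float(((x - x_q) ** 2).mean())

# A small shift of the input leaves the selected codes, and so the output, unchanged.
delta = 0.01
x_q_shift, idx_shift = vector_quantize(x + delta, codebook)
vq_change = float(np.abs(x_q_shift - x_q).max())

# Continuous path: with the noise held fixed, the token tracks the input shift.
eps = np.full_like(x, 0.3)
sigma = 0.5
z = continuous_token(x, sigma, eps)
z_shift = continuous_token(x + delta, sigma, eps)
cont_change = float(np.abs(z_shift - z).max())

print(vq_error, vq_change, cont_change)
```

In this sketch `vq_error` is nonzero (the lossy part), `vq_change` is zero (the quantized output is locally constant in its input), and `cont_change` matches `delta` up to floating-point error. Practical VQ training typically works around the zero-gradient problem with approximations such as a straight-through estimator, which is the source of the inaccurate gradient propagation the abstract refers to.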

Paper Structure

This paper contains 35 sections, 18 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of five representative deep generative models, namely VQ-VAE [van2017neural], RQ-VAE [rajput2023recommender], MQ-VAE [qu2024tokenrec], VAE [kingma2014auto], and Diffusion [ho2020denoising], in reconstructing item embeddings on the Beauty dataset. The standard diffusion model with continuous embeddings demonstrates superior reconstruction performance and loss convergence compared to the vanilla VAE and its discrete counterparts.
  • Figure 2: Overview of the proposed ContRec. ContRec represents users and items as latent vectors using a non-quantized tokenizer, and leverages the continuous-valued generation capability of diffusion models to operate in continuous space and generate implicit user preferences conditioned on the reasoning content of LLMs.
  • Figure 3: Optimization robustness on the Beauty and ML1M datasets.
  • Figure 4: Hyper-parameter tuning on the Beauty and ML1M datasets.