SimGR: Escaping the Pitfalls of Generative Decoding in LLM-based Recommendation

Yuanbo Zhao, Ruochen Liu, Senzhang Wang, Jun Yin, Yuxin Dong, Huan Gong, Hao Chen, Shirui Pan, Chengqi Zhang

TL;DR

This paper tackles biases in LLM-based generative recommender systems that arise when item-level preferences are inferred from token-level generation. It shows that autoregressive decoding can fail to cover true items because of beam pruning, while parallel decoding shifts the item distribution because it assumes tokens are independent. To address this, the authors propose SimGR, which directly models item-level distributions in a shared latent space and ranks items by similarity, bypassing token generation entirely. Empirical results across three real-world datasets and multiple LLM backbones show that SimGR improves ranking metrics such as NDCG, increases diversity and item coverage, and scales robustly as backbone capacity grows. Overall, SimGR offers a principled, scalable, and distributionally faithful alternative to token-level generative recommender systems, with practical implications for more reliable and diverse recommendations.
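The beam-pruning issue can be illustrated with a minimal sketch. The two-token semantic IDs and all probabilities below are hypothetical values chosen for illustration, not taken from the paper; the code only shows how a beam of size B can drop prefixes that lead to items in the exact top-K list.

```python
# Hypothetical toy model: every item is a 2-token semantic ID (t1, t2).
# All probabilities are made up for illustration.
P_FIRST = {"A": 0.40, "B": 0.35, "C": 0.25}
P_SECOND = {
    "A": {"0": 0.30, "1": 0.25, "2": 0.25, "3": 0.20},
    "B": {"0": 0.90, "1": 0.10},
    "C": {"0": 0.80, "1": 0.20},
}


def exact_item_probs():
    """Item-level probabilities obtained by enumerating every token sequence."""
    return {
        (t1, t2): p1 * p2
        for t1, p1 in P_FIRST.items()
        for t2, p2 in P_SECOND[t1].items()
    }


def beam_search(beam_size):
    """Two-step beam search that keeps `beam_size` hypotheses per step."""
    # Step 1: keep only the most probable first tokens.
    prefixes = sorted(P_FIRST.items(), key=lambda kv: kv[1], reverse=True)
    prefixes = prefixes[:beam_size]
    # Step 2: expand the surviving prefixes and prune again.
    candidates = [
        ((t1, t2), p1 * p2)
        for t1, p1 in prefixes
        for t2, p2 in P_SECOND[t1].items()
    ]
    candidates.sort(key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in candidates[:beam_size]]


def top_k_overlap(beam_size, k):
    """Number of exact top-k items that the beam-search list recovers."""
    probs = exact_item_probs()
    true_top = sorted(probs, key=probs.get, reverse=True)[:k]
    return len(set(true_top) & set(beam_search(beam_size)[:k]))


if __name__ == "__main__":
    for b in (1, 2, 3):
        print(f"beam size {b}: overlap with exact top-3 = {top_k_overlap(b, 3)}")
```

In this toy setting the most probable item overall starts with the second-ranked first token, so a beam of size 1 discards it after the first step, and the overlap with the exact top-3 list only becomes complete once the beam is as wide as the list.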

Abstract

A core objective in recommender systems is to accurately model the distribution of user preferences over items to enable personalized recommendations. Recently, driven by the strong generative capabilities of large language models (LLMs), LLM-based generative recommendation has become increasingly popular. However, we observe that existing methods inevitably introduce systematic bias when estimating item-level preference distributions. Specifically, autoregressive generation suffers from incomplete coverage due to beam search pruning, while parallel generation distorts probabilities by assuming token independence. We attribute this issue to a fundamental modeling mismatch: these methods approximate item-level distributions via token-level generation, which inherently induces approximation errors. Through both theoretical analysis and empirical validation, we demonstrate that token-level generation cannot faithfully substitute item-level generation, leading to biased item distributions. To address this, we propose Simply Generative Recommendation (SimGR), a framework that directly models item-level preference distributions in a shared latent space and ranks items by similarity, thereby aligning the modeling objective with recommendation and mitigating distributional distortion. Extensive experiments across multiple datasets and LLM backbones show that SimGR consistently outperforms existing generative recommenders. Our code is available at https://anonymous.4open.science/r/SimGR-C408/
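As a rough sketch of what ranking items by similarity in a shared latent space can look like, the snippet below scores every item against a user vector and normalizes the scores into an item-level distribution. The cosine similarity, the softmax temperature, the embedding values, and the function names are all assumptions made for illustration; the abstract does not specify SimGR's encoder, similarity function, or training objective.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def item_distribution(user_vec, item_vecs, temperature=0.1):
    """Softmax over similarities: every item in the catalog gets a probability.

    The temperature value is an arbitrary illustrative choice.
    """
    scores = {
        item: cosine(user_vec, vec) / temperature
        for item, vec in item_vecs.items()
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {item: math.exp(s - max_score) for item, s in scores.items()}
    total = sum(exps.values())
    return {item: e / total for item, e in exps.items()}


def recommend(user_vec, item_vecs, k):
    """Return the k items with the highest item-level probability."""
    dist = item_distribution(user_vec, item_vecs)
    return sorted(dist, key=dist.get, reverse=True)[:k]


# Hypothetical embeddings; in practice these would come from a trained model.
ITEMS = {
    "guitar": [0.9, 0.1, 0.0],
    "strings": [0.8, 0.3, 0.1],
    "drum": [0.1, 0.9, 0.2],
    "mic": [0.0, 0.2, 0.9],
}
USER = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    print(recommend(USER, ITEMS, 2))
```

Because the distribution is defined over whole items, no item is excluded by pruning and no per-token factorization is involved; the ranking is a single sort over the catalog.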

Paper Structure

This paper contains 33 sections, 2 theorems, 15 equations, 4 figures, and 4 tables.

Key Result

Theorem 1

Consider an autoregressive generative recommender system. The expected top-$K$ overlap $O_{K}^{(B)}$ between the list generated by Beam Search (with beam size $B$) and the global optimal list satisfies the following upper bound:

Figures (4)

  • Figure 1: An example of issues introduced by generating semantic IDs. (a) In autoregressive generation, the prediction set is strictly constrained by the beam size, so the majority of items are never seen by the model. (b) In parallel generation, multiplying the outputs of different output heads wrongly lowers the probability that the target item is recommended.
  • Figure 2: The framework of SimGR
  • Figure 3: Performance comparison of SimGR and LC-Rec as LLM backbones scale
  • Figure 4: Performance comparison of Entropy@$K$ for various $K$ on the "Instruments" dataset
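
The effect described in the Figure 1(b) caption can be reproduced with a small sketch. The joint distribution over two-token semantic IDs below is hypothetical; the code assumes each output head predicts the marginal of its own token and that an item is scored by the product of the head outputs.

```python
from collections import defaultdict
from itertools import product

# Hypothetical true joint distribution over 2-token semantic IDs.
# ("a", "x") is the target: it is the single most probable item.
JOINT = {("a", "x"): 0.35, ("b", "y"): 0.33, ("c", "y"): 0.32}


def head_marginals(joint):
    """Per-position marginals, i.e. what two independent heads would output."""
    first, second = defaultdict(float), defaultdict(float)
    for (t1, t2), p in joint.items():
        first[t1] += p
        second[t2] += p
    return dict(first), dict(second)


def factorized_scores(joint):
    """Score every token combination as the product of the head outputs."""
    first, second = head_marginals(joint)
    return {(t1, t2): first[t1] * second[t2] for t1, t2 in product(first, second)}


def ranking(scores):
    """Items sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print("exact ranking:     ", ranking(JOINT))
    print("factorized ranking:", ranking(factorized_scores(JOINT)))
```

Under the independence assumption the target drops from first place to fourth, and the top score goes to ("a", "y"), a combination with zero probability under the joint distribution.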

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2