SimGR: Escaping the Pitfalls of Generative Decoding in LLM-based Recommendation

Yuanbo Zhao; Ruochen Liu; Senzhang Wang; Jun Yin; Yuxin Dong; Huan Gong; Hao Chen; Shirui Pan; Chengqi Zhang

TL;DR

This paper examines the bias that arises in LLM-based generative recommender systems when item-level preferences are inferred from token-level generation. It shows that autoregressive decoding can miss relevant items because beam search prunes them, while parallel decoding distorts item probabilities by assuming token independence. To address this, the authors propose SimGR, which models item-level distributions directly in a shared latent space and ranks items by similarity, bypassing token generation entirely. Experiments on three real-world datasets and multiple LLM backbones show that SimGR improves ranking metrics such as NDCG as well as diversity and item coverage, and that it scales robustly as backbone capacity grows. Overall, SimGR offers a principled, scalable, and distributionally faithful alternative to token-level generative recommendation.
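As a rough illustration of ranking by similarity in a shared latent space, the sketch below scores every candidate item against a user representation and returns the top-k. It is a minimal sketch under stated assumptions, not the authors' implementation: cosine similarity, the 3-dimensional vectors, the item names, and the helpers `cosine` and `rank_items` are all hypothetical, and the summary does not say which similarity function or embedding source SimGR uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_items(user_vec, item_vecs, k):
    """Score every item against the user vector and return the top-k ids.

    Every item in the catalogue receives a score, so no candidate is
    dropped before ranking (unlike a pruned token-level search).
    """
    scored = [(cosine(user_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    # Sort by descending score; break ties by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings (assumption: user and items share one space).
item_vecs = {
    "guitar": [0.9, 0.1, 0.0],
    "strings": [0.8, 0.3, 0.1],
    "drum": [0.1, 0.9, 0.2],
    "cable": [0.0, 0.2, 0.9],
}
user_vec = [1.0, 0.2, 0.0]

top2 = rank_items(user_vec, item_vecs, k=2)
print(top2)
```

Because the score is defined per item rather than per token, the ranking covers the whole catalogue by construction; the cost is one similarity computation per candidate item.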

Abstract

A core objective in recommender systems is to accurately model the distribution of user preferences over items to enable personalized recommendations. Recently, driven by the strong generative capabilities of large language models (LLMs), LLM-based generative recommendation has become increasingly popular. However, we observe that existing methods inevitably introduce systematic bias when estimating item-level preference distributions. Specifically, autoregressive generation suffers from incomplete coverage due to beam search pruning, while parallel generation distorts probabilities by assuming token independence. We attribute this issue to a fundamental modeling mismatch: these methods approximate item-level distributions via token-level generation, which inherently induces approximation errors. Through both theoretical analysis and empirical validation, we demonstrate that token-level generation cannot faithfully substitute item-level generation, leading to biased item distributions. To address this, we propose Simply Generative Recommendation (SimGR), a framework that directly models item-level preference distributions in a shared latent space and ranks items by similarity, thereby aligning the modeling objective with recommendation and mitigating distributional distortion. Extensive experiments across multiple datasets and LLM backbones show that SimGR consistently outperforms existing generative recommenders. Our code is available at https://anonymous.4open.science/r/SimGR-C408/
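The distortion attributed to parallel generation can be seen on a toy example. The sketch below is illustrative only: the three items, their two-token semantic IDs, and their probabilities are made up, and `factorized_scores` is a hypothetical helper. It assumes the best case for parallel decoding, namely that each output head recovers the exact per-position marginal, and then scores each item as the product of those marginals.

```python
from collections import defaultdict

# Hypothetical item-level preference distribution over three items,
# each identified by a two-token semantic ID (values are made up).
true_probs = {(0, 0): 0.40, (1, 1): 0.35, (0, 1): 0.25}


def factorized_scores(joint):
    """Score each ID as the product of its per-position marginals.

    This mimics parallel decoding under the token-independence
    assumption, with every head assumed to output the exact marginal.
    """
    n_positions = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(n_positions)]
    for semantic_id, prob in joint.items():
        for pos, token in enumerate(semantic_id):
            marginals[pos][token] += prob
    scores = {}
    for semantic_id in joint:
        score = 1.0
        for pos, token in enumerate(semantic_id):
            score *= marginals[pos][token]
        scores[semantic_id] = score
    return scores


parallel_scores = factorized_scores(true_probs)

true_ranking = sorted(true_probs, key=true_probs.get, reverse=True)
parallel_ranking = sorted(parallel_scores, key=parallel_scores.get, reverse=True)

print("true ranking:    ", true_ranking)
print("parallel ranking:", parallel_ranking)
```

In this example the item with the highest true probability, `(0, 0)`, drops to second place under the factorized score, while the least preferred item, `(0, 1)`, moves to the top, because its tokens are individually frequent even though the combination is not.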

Paper Structure (33 sections, 2 theorems, 15 equations, 4 figures, 4 tables)

Introduction
Related Work
LLMs for Discriminative Recommendation
Generative Recommendations
Preliminary
Tokenization and Semantic IDs
Autoregressive Generation for Recommendation
Parallel Generation for Recommendation
Analysis on Generating Semantic IDs
Analysis on Autoregressive Generation
Empirical Study
Theoretical Study
Notation and Setup
Theoretical Bounds
Numeric Estimation
...and 18 more sections

Key Result

Theorem 1

Consider an autoregressive generative recommender system. The expected top-$K$ overlap $O_{K}^{(B)}$ between the list generated by Beam Search (with beam size $B$) and the global optimal list satisfies the following upper bound:
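The bound itself is not reproduced in this summary. The quantity it concerns, the overlap between the beam search top-$K$ list and the globally optimal top-$K$ list, can still be measured on a toy model. The sketch below is a single hand-built instance, not an estimate of the expected overlap in the theorem: the conditional probabilities in `COND` and the helpers `beam_search` and `top_k_overlap` are hypothetical.

```python
def beam_search(cond, beam_size, length):
    """Standard beam search over a table of conditional token probabilities.

    `cond` maps a prefix (tuple of tokens) to {next_token: probability}.
    Returns the surviving (sequence, probability) pairs, best first.
    """
    beams = [((), 1.0)]
    for _ in range(length):
        candidates = []
        for prefix, prob in beams:
            for token, q in cond[prefix].items():
                candidates.append((prefix + (token,), prob * q))
        # Keep the beam_size most probable prefixes; ties broken by token order.
        candidates.sort(key=lambda c: (-c[1], c[0]))
        beams = candidates[:beam_size]
    return beams


def top_k_overlap(cond, beam_size, k, length):
    """Fraction of the exact top-k items that beam search also returns."""
    # A beam wide enough to never prune gives the exact ranking.
    exact = [seq for seq, _ in beam_search(cond, 10**6, length)[:k]]
    approx = [seq for seq, _ in beam_search(cond, beam_size, length)[:k]]
    return len(set(exact) & set(approx)) / k


# Hypothetical model: prefix "a" is more likely overall, but its mass is
# spread over three items, so the single best item lives under prefix "b".
COND = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"0": 1 / 3, "1": 1 / 3, "2": 1 / 3},
    ("b",): {"0": 0.95, "1": 0.05},
}

overlap_b1 = top_k_overlap(COND, beam_size=1, k=1, length=2)
overlap_b2 = top_k_overlap(COND, beam_size=2, k=1, length=2)
print(overlap_b1, overlap_b2)
```

With a beam of size 1, the prefix `b` is pruned after the first step, so the best item `("b", "0")` can never be returned; widening the beam to 2 keeps that prefix alive and recovers it.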

Figures (4)

Figure 1: An example of issues introduced by generating semantic IDs. (a) In autoregressive generation, the prediction set is strictly constrained by the beam size, which prevents the model from considering the majority of items. (b) In parallel generation, multiplying the outputs of different heads wrongly lowers the probability that the target item is recommended.
Figure 2: The framework of SimGR
Figure 3: Performance comparison of SimGR and LC-Rec with scaling LLM backbones
Figure 4: Performance comparison of Entropy@$K$ with various $K$ on the "Instruments" dataset

Theorems & Definitions (2)

Theorem 1
Theorem 2
