IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation

Zijie Lin, Yang Zhang, Xiaoyan Zhao, Fengbin Zhu, Fuli Feng, Tat-Seng Chua

TL;DR

This work proposes an Information Gain-based Decisiveness-aware Token handling (IGD) strategy that integrates token decisiveness into both tuning and decoding, and demonstrates that IGD consistently improves recommendation accuracy, achieving significant gains on widely used ranking metrics compared to strong baselines.

Abstract

Large Language Models (LLMs) have shown strong potential for recommendation by framing item prediction as a token-by-token language generation task. However, existing methods treat all item tokens equally, simply pursuing likelihood maximization during both optimization and decoding. This overlooks crucial token-level differences in decisiveness: many tokens contribute little to item discrimination yet can dominate optimization or decoding. To quantify token decisiveness, we propose a novel perspective that models item generation as a decision process, measuring token decisiveness by the Information Gain (IG) each token provides in reducing uncertainty about the generated item. Our empirical analysis reveals that most tokens have low IG but often correspond to high logits, disproportionately influencing training loss and decoding, which may impair model performance. Building on these insights, we introduce an Information Gain-based Decisiveness-aware Token handling (IGD) strategy that integrates token decisiveness into both tuning and decoding. Specifically, IGD downweights low-IG tokens during tuning and rebalances decoding to emphasize tokens with high IG. In this way, IGD moves beyond pure likelihood maximization, effectively prioritizing high-decisiveness tokens. Extensive experiments on four benchmark datasets with two LLM backbones demonstrate that IGD consistently improves recommendation accuracy, achieving significant gains on widely used ranking metrics compared to strong baselines.
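
To make the notion of information gain concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a small hypothetical catalog (`CATALOG`), word-level tokens, and a uniform prior over items, so the entropy given a prefix is simply the log of the number of items that share that prefix. The paper may estimate these quantities differently. The function names `entropy_given_prefix` and `information_gain` are illustrative.

```python
import math

# Hypothetical toy catalog; each item title is a tuple of word-level tokens.
CATALOG = [
    ("Super", "Mario", "Kart"),
    ("Super", "Smash", "Bros"),
    ("The", "Legend", "of", "Zelda"),
    ("The", "Last", "of", "Us"),
    ("The", "Sims"),
    ("The", "Witcher"),
]


def entropy_given_prefix(prefix, catalog=CATALOG):
    """Entropy in bits over the items that start with `prefix`.

    Assumption: items are equally likely, so the entropy is log2(count).
    """
    prefix = tuple(prefix)
    count = sum(1 for item in catalog if item[: len(prefix)] == prefix)
    if count == 0:
        raise ValueError(f"no item in the catalog starts with {prefix!r}")
    return math.log2(count)


def information_gain(prefix, token, catalog=CATALOG):
    """Reduction in item entropy obtained by appending `token` to `prefix`."""
    prefix = tuple(prefix)
    before = entropy_given_prefix(prefix, catalog)
    after = entropy_given_prefix(prefix + (token,), catalog)
    return before - after


if __name__ == "__main__":
    print(information_gain((), "Super"))               # shared by 2 of 6 items
    print(information_gain((), "The"))                 # shared by 4 of 6 items
    print(information_gain(("The", "Legend"), "of"))   # item already determined
```

In this toy setting, "The" is shared by more items than "Super" and therefore yields a smaller gain, while "of" after "The Legend" yields no gain at all because the item is already determined by the prefix.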

Paper Structure

This paper contains 35 sections, 10 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Illustration of LLM4Rec autoregressive token generation as a sequential decision process. As tokens are generated, the entropy of the remaining sequence gradually decreases. The information gain (IG) quantifies this reduction, e.g., $\text{IG}(\text{M}; \text{S})$ measures the IG of token "Mario" given prefix "Super". Tokens shared across many items (e.g., "The") exhibit lower decisiveness with lower IG, while more decisive tokens (e.g., "Super") lead to larger IG. Tokens with $\text{IG}=0$ (such as "of", "Zelda", "Sword", and "Man") are referred to as zero-IG tokens.
  • Figure 2: Loss comparison between zero-IG and non-zero-IG tokens in model tuning (epoch 1)
  • Figure 3: Entropy difference in decoding: model prediction vs. ground-truth
  • Figure 4: Relationship between IG values and logits of tokens in decoding. For each dataset, the left subfigure shows that zero-IG tokens are associated with extremely high logits (close to 0). The right subfigure illustrates a negative correlation between IG values and logit magnitudes for non-zero-IG tokens.
  • Figure 5: Loss comparison on CDs and Games datasets: IGD-Tuning effect on zero-IG and non-zero-IG tokens (epoch 1). The results on the other two datasets are in the appendix.
  • ...and 3 more figures
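
The tuning-side idea behind Figures 2 and 5, downweighting low-IG tokens in the training loss, can be illustrated with a small sketch. This is one plausible reading of the abstract, not the paper's exact objective. The scaling factor `beta`, the choice to downweight only zero-IG tokens, and the normalization by sequence length are all assumptions made for illustration.

```python
import math


def weighted_token_nll(token_probs, token_igs, beta=0.5):
    """Mean negative log-likelihood with zero-IG tokens scaled by `beta`.

    token_probs: model probability assigned to each ground-truth token.
    token_igs:   information gain of each ground-truth token.
    beta:        hypothetical weight in [0, 1] applied to zero-IG tokens;
                 beta=1 recovers the ordinary mean NLL.
    """
    if len(token_probs) != len(token_igs):
        raise ValueError("token_probs and token_igs must have the same length")
    if not token_probs:
        raise ValueError("at least one token is required")
    total = 0.0
    for prob, ig in zip(token_probs, token_igs):
        weight = beta if ig == 0 else 1.0
        total += -weight * math.log(prob)
    # Assumption: normalize by the number of tokens, not by the sum of weights.
    return total / len(token_probs)


if __name__ == "__main__":
    probs = [0.5, 0.25]   # ground-truth token probabilities
    igs = [1.0, 0.0]      # second token is a zero-IG token
    print(weighted_token_nll(probs, igs, beta=1.0))  # ordinary mean NLL
    print(weighted_token_nll(probs, igs, beta=0.0))  # zero-IG token ignored
```

With `beta` below 1, the zero-IG tokens contribute less to the loss, so the gradient signal concentrates on the tokens that actually discriminate between items.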