Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng

TL;DR

Debiasing-Diversifying Decoding (D^3) is a new decoding approach that disables length normalization for ghost tokens to alleviate amplification bias, and incorporates a text-free assistant model that encourages tokens the LLM generates less frequently, counteracting recommendation homogeneity.

Abstract

Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) homogeneity issue -- generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias, and it incorporates a text-free assistant model to encourage tokens less frequently generated by LLMs for counteracting recommendation homogeneity. Extensive experiments on real-world datasets demonstrate the method's effectiveness in enhancing accuracy and diversity. The code is available at https://github.com/SAI990323/DecodingMatters.
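
The amplification bias described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes that "standard length normalization" means dividing the summed token log-probabilities by the number of tokens, and the token probabilities and the helper name `sequence_score` are hypothetical values chosen for illustration.

```python
import math


def sequence_score(token_probs, length_normalize=True):
    """Sum of token log-probabilities, optionally divided by the token count.

    Assumption: length normalization is modeled as a plain average.
    """
    total = sum(math.log(p) for p in token_probs)
    return total / len(token_probs) if length_normalize else total


# Hypothetical per-token generation probabilities for two candidate items.
item_a = [0.3, 0.5]                               # short item, no ghost tokens
item_b = [0.2, 0.5, 0.999, 0.999, 0.999, 0.999]   # ends in four ghost tokens (p close to 1)

for name, probs in [("item_a", item_a), ("item_b", item_b)]:
    normalized = sequence_score(probs, length_normalize=True)
    raw = sequence_score(probs, length_normalize=False)
    print(f"{name}: normalized={normalized:.3f}, unnormalized={raw:.3f}")
```

Under these assumed numbers, item_b ranks above item_a when scores are length-normalized, even though its summed log-probability is lower. The ghost tokens contribute almost nothing to the sum but still increase the divisor, which is the inflation the abstract calls amplification bias.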

Paper Structure (30 sections, 5 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Homogeneity comparison of recommendation results between recLLM method BIGRec and traditional method SASRec on three datasets: Instruments (In.), Books (Bo.), and CDs (CD.). (a) and (b) show text similarity and category diversity (measured by entropy) for the first 5 tokens within the top 10 recommendations, where higher similarity and lower entropy indicate greater homogeneity. (c) and (d) display text similarity and category repetition in top-10 recommendations compared to historical interactions.
  • Figure 2: Recommendation diversity (measured by entropy) of the original BIGRec and the variants with other decoding strategies. "+TFA" denotes the variant applying our text-free model assistant decoding, "+Temp" denotes the variant using the widely-used temperature scaling to increase diversity, and "+Temp+TFA" denotes the variant combining "+TFA" and "+Temp". Smaller entropy denotes less diversity.
  • Figure 3: Effectiveness of employing the proposed TFA to enhance recommendation ratio (left) and accuracy (right) for a specified target category of items.
  • Figure 4: Analysis of LLM recommendation results on the remaining three datasets: Sports (Sp.), Toys (To.), and Games (Ga.). (a) and (b) show the text similarity of the top 5 tokens within the top-10 recommendations and the entropy of the overall recommendation categories, respectively. Higher text similarity and lower entropy indicate greater homogeneity. (c) and (d) display text similarity and category repetition in top-10 recommendations versus historical interactions.
  • Figure 5: Impact of the proposed TFA method on recommendation distributions. (a) shows the percentage of recommended items in a particular category after adjustment, and (b) shows recommendation performance within that category.
  • ...and 1 more figure
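
The figure captions above use entropy over recommended item categories as a diversity measure, with lower entropy indicating greater homogeneity. The sketch below shows one way such a measure could be computed. It assumes Shannon entropy with the natural logarithm over the category labels of a single top-10 list; the excerpt does not give the paper's exact definition, and the category names and the helper `category_entropy` are hypothetical.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (natural log) of the category distribution in a list."""
    counts = Counter(categories)
    total = len(categories)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical top-10 recommendation lists, represented by item category.
homogeneous = ["Guitars"] * 9 + ["Keyboards"]
diverse = ["Guitars", "Keyboards", "Drums", "Microphones", "Amplifiers"] * 2

print(f"homogeneous: {category_entropy(homogeneous):.3f}")
print(f"diverse:     {category_entropy(diverse):.3f}")
```

With these assumed lists, the list dominated by one category has a much lower entropy than the list spread evenly over five categories, which matches the reading of the captions that smaller entropy denotes less diversity.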