FAIR: Focused Attention Is All You Need for Generative Recommendation

Longtao Xiao, Haolin Zhang, Guohao Cai, Jieming Zhu, Yifan Wang, Heng Chang, Zhenhua Dong, Xiu Li, Ruixuan Li

TL;DR

FAIR tackles attention noise in transformer-based generative recommendation by introducing a focused attention mechanism that subtracts one attention branch from another, complemented by a noise-robustness objective and a mutual information maximization objective. The model discretizes items into longer code sequences and uses a non-autoregressive multi-token prediction objective, enabling parallel generation and efficient mapping to items. Across four public benchmarks, FAIR consistently outperforms traditional and generative baselines, with ablations confirming the necessity of each component and analyses identifying the optimal code length and embedding size. This work offers a practical, robust approach to improving context relevance and predictive accuracy in generative recommendation systems.

Abstract

Recently, transformer-based generative recommendation has garnered significant attention for user behavior modeling. However, it often requires discretizing items into multi-code representations (typically four or more code tokens per item), which sharply increases the length of the original item sequence. This expansion poses challenges for transformer-based models when modeling user behavior sequences with inherent noise, since they tend to overallocate attention to irrelevant or noisy context. To mitigate this issue, we propose FAIR, the first generative recommendation framework with focused attention, which enhances attention scores to relevant context while suppressing those to irrelevant context. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and computes their difference as the final attention scores to eliminate attention noise while focusing on relevant contexts; (2) a noise-robustness objective, which encourages the model to maintain stable attention patterns under stochastic perturbations, preventing undesirable shifts toward irrelevant context due to noise; and (3) a mutual information maximization objective, which guides the model to identify contexts that are most informative for next-item prediction. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.
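
To make the focused attention idea in item (1) concrete, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the random toy inputs, the absence of masking, and the scaling factor `lam` on the second branch are all hypothetical choices that the abstract does not specify. The abstract only says that two separate sets of Q and K weights are learned and that the difference of the two attention maps is used as the final scores.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(x, w_q, w_k):
    """Standard scaled dot-product attention map: softmax(Q K^T / sqrt(d))."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return softmax_rows(scores)


def focused_attention(x, w_q1, w_k1, w_q2, w_k2, w_v, lam=1.0):
    """Subtract a second attention map from the first, then apply it to V.

    `lam` is a hypothetical scaling factor (an assumption of this sketch);
    lam=1.0 gives the plain difference of the two maps.
    """
    a1 = attention_map(x, w_q1, w_k1)  # first (Q, K) branch
    a2 = attention_map(x, w_q2, w_k2)  # second (Q, K) branch
    diff = [[p - lam * q for p, q in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, matmul(x, w_v)), diff


def random_matrix(rows, cols, rng):
    """Toy Gaussian matrix standing in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_head = 6, 8, 4  # six code tokens, toy dimensions
x = random_matrix(seq_len, d_model, rng)
w_q1, w_k1, w_q2, w_k2, w_v = (random_matrix(d_model, d_head, rng) for _ in range(5))
out, scores = focused_attention(x, w_q1, w_k1, w_q2, w_k2, w_v)
```

In this sketch, any attention mass that both branches assign to the same position cancels in the subtraction, which is one way to read "eliminating attention noise". With `lam=1.0` each row of `scores` sums to zero, so the result is no longer a probability distribution; how the paper normalizes or scales the difference is not stated in the abstract.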

Paper Structure

This paper contains 31 sections, 18 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Transformers tend to overallocate attention to irrelevant context (i.e., attention noise), whereas FAIR enhances attention to relevant context while suppressing noise.
  • Figure 2: An overview of FAIR. FAIR consists of three major components: Focused Attention Mechanism (FAM), Noise-Robustness Task (NRT), and Mutual Information Maximization Task (MIM).
  • Figure 3: Performance impact of the code sequence length $L$.
  • Figure 4: Performance impact of the model's embedding dimension $d$.
  • Figure 5: Sensitivity of the loss coefficient $\alpha$.
  • ...and 4 more figures