FAIR: Focused Attention Is All You Need for Generative Recommendation
Longtao Xiao, Haolin Zhang, Guohao Cai, Jieming Zhu, Yifan Wang, Heng Chang, Zhenhua Dong, Xiu Li, Ruixuan Li
TL;DR
FAIR tackles attention noise in transformer-based generative recommendation by introducing a focused attention mechanism that takes the difference of two attention branches, complemented by a noise-robustness objective and a mutual information maximization objective. The model discretizes items into longer code sequences and uses a non-autoregressive multi-token prediction objective, enabling parallel generation and efficient mapping to items. Across four public benchmarks, FAIR consistently outperforms traditional and generative baselines; ablations confirm that each component contributes, and further analyses identify the best-performing code length and embedding size. This work offers a practical, robust approach to improving context relevance and predictive accuracy in generative recommendation systems.
Abstract
Recently, transformer-based generative recommendation has garnered significant attention for user behavior modeling. However, it often requires discretizing items into multi-code representations (typically four or more code tokens per item), which sharply increases the length of the original item sequence. This expansion makes it harder for transformer-based models to model user behavior sequences, which are inherently noisy, because such models tend to over-allocate attention to irrelevant or noisy context. To mitigate this issue, we propose FAIR, the first generative recommendation framework with focused attention, which raises attention scores on relevant context while suppressing those on irrelevant context. Specifically, we propose (1) a focused attention mechanism integrated into the standard Transformer, which learns two separate sets of Q and K attention weights and uses the difference between the two resulting attention maps as the final attention scores, eliminating attention noise while focusing on relevant context; (2) a noise-robustness objective, which encourages the model to maintain stable attention patterns under stochastic perturbations, preventing undesirable shifts toward irrelevant context due to noise; and (3) a mutual information maximization objective, which guides the model to identify the context that is most informative for next-item prediction. We validate the effectiveness of FAIR on four public benchmarks, demonstrating its superior performance compared to existing methods.
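
To make the focused attention idea in (1) concrete, the following is a minimal single-head NumPy sketch, not the authors' implementation. The abstract states only that two sets of Q and K weights are learned and that their difference forms the final attention scores. The sketch assumes each branch is a softmax-normalized scaled dot-product map and that the second branch is scaled by a hypothetical factor `lam` before subtraction. The function name, the weight names, the shared value projection `wv`, and the omission of a causal mask are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def focused_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.5):
    """Single-head attention whose scores are the difference of two maps.

    x: (seq_len, d_model) sequence of code-token embeddings.
    wq1, wk1 / wq2, wk2: two separate sets of Q and K projections.
    wv: value projection shared by both branches (an assumption).
    lam: hypothetical weight on the subtracted branch (an assumption).
    """
    d_head = wq1.shape[1]
    scale = np.sqrt(d_head)
    # Each branch is an ordinary scaled dot-product attention map.
    a1 = softmax((x @ wq1) @ (x @ wk1).T / scale)
    a2 = softmax((x @ wq2) @ (x @ wk2).T / scale)
    # Attention mass that both branches assign to the same positions is
    # reduced by the subtraction; entries may become negative.
    scores = a1 - lam * a2
    return scores @ (x @ wv), scores


rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 8, 4
x = rng.normal(size=(seq_len, d_model))
wq1, wk1, wq2, wk2 = (rng.normal(size=(d_model, d_head)) for _ in range(4))
wv = rng.normal(size=(d_model, d_head))

out, scores = focused_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.5)
```

Under these assumptions each row of `scores` sums to `1 - lam` rather than 1, and setting `lam=0` recovers standard softmax attention from the first branch alone.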
