FuXi-$\alpha$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye, Wei Guo, Jin Yao Chin, Hao Wang, Hong Zhu, Xi Lin, Yuyang Ye, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen
TL;DR
FuXi-$\alpha$ introduces Adaptive Multi-channel Self-attention (AMS) and a Multi-stage FFN (MFFN) to decouple temporal, positional, and semantic information while enriching implicit feature interactions for large-scale sequential recommendation. Motivated by scaling laws, the model delivers consistent offline gains and improved online engagement in Huawei Music (a $4.76\%$ increase in average songs played per user and a $5.10\%$ increase in average listening duration per user), and ablations confirm that both AMS and MFFN are essential. Performance improves as model size increases on both public benchmarks and a large industrial dataset, indicating that the approach follows scaling laws. These findings support the potential of FuXi-$\alpha$ for scalable, real-world recommendation systems and suggest extensions to multi-behavior, multi-modal, and long-sequence settings.
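To make the three-channel idea concrete, the following minimal pure-Python sketch computes a causal attention layer with separate semantic, positional, and temporal channels. It is an illustration under assumptions, not the authors' implementation: the semantic channel is assumed to use HSTU-style SiLU attention scaled by sequence length, the positional channel one learnable weight per relative offset, the temporal channel one learnable weight per log2-bucketed time gap, and all channels are assumed to share a single value projection. The names `ams_layer`, `time_bucket`, `pos_bias`, and `time_bias` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def silu(v):
    return v / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def time_bucket(dt, num_buckets):
    """Hypothetical log2 bucketing of a non-negative time gap."""
    return min(int(math.log2(1 + dt)), num_buckets - 1)


def ams_layer(x, timestamps, params, num_buckets=16):
    """Three-channel causal self-attention sketch (assumed design).

    x: n x d item embeddings; timestamps: n non-decreasing integers.
    Returns n x 3d: semantic, positional and temporal outputs concatenated.
    """
    n, d = len(x), len(x[0])
    q, k = matmul(x, params["Wq"]), matmul(x, params["Wk"])
    v = matmul(x, params["Wv"])  # one value projection shared by all channels
    sem, pos, tem = ([[0.0] * n for _ in range(n)] for _ in range(3))
    for i in range(n):
        for j in range(i + 1):  # causal mask: only the current and earlier items
            score = sum(q[i][c] * k[j][c] for c in range(d))
            sem[i][j] = silu(score) / n              # content-based weight
            pos[i][j] = params["pos_bias"][i - j]    # depends only on relative offset
            gap = time_bucket(timestamps[i] - timestamps[j], num_buckets)
            tem[i][j] = params["time_bias"][gap]     # depends only on the time gap
    outs = [matmul(w, v) for w in (sem, pos, tem)]
    # List "+" concatenates the three d-wide channel outputs per position.
    return [outs[0][i] + outs[1][i] + outs[2][i] for i in range(n)]


rng = random.Random(0)
n, d = 5, 4
params = {
    "Wq": rand_matrix(d, d, rng),
    "Wk": rand_matrix(d, d, rng),
    "Wv": rand_matrix(d, d, rng),
    "pos_bias": [rng.gauss(0.0, 0.1) for _ in range(n)],    # one weight per offset 0..n-1
    "time_bias": [rng.gauss(0.0, 0.1) for _ in range(16)],  # one weight per time bucket
}
x = rand_matrix(n, d, rng)
ts = [0, 30, 200, 3600, 3700]  # illustrative timestamps, e.g. seconds
out = ams_layer(x, ts, params)
print(len(out), len(out[0]))  # 5 12
```

Because the positional and temporal weights in this sketch never look at item content, order and recency effects are learned separately from what the items are, which is one way to realize the decoupling described above.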
Abstract
Inspired by scaling laws and large language models, research on large-scale recommendation models has gained significant attention. Recent advances have shown that scaling up sequential recommendation models into large-scale recommendation models can be an effective strategy. Current state-of-the-art sequential recommendation models primarily use self-attention mechanisms for explicit feature interactions among items, while implicit interactions are handled by Feed-Forward Networks (FFNs). However, these models often integrate temporal and positional information inadequately, either by adding it to attention weights or by blending it with latent representations, which limits their expressive power. A recent model, HSTU, further reduces the emphasis on implicit feature interactions, which constrains its performance. We propose a new model, FuXi-$\alpha$, to address these issues. It introduces an Adaptive Multi-channel Self-attention mechanism that models temporal, positional, and semantic features in separate channels, along with a Multi-stage FFN that enhances implicit feature interactions. Our offline experiments show that the model outperforms existing models and that its performance improves continuously as model size increases. Additionally, an online A/B test in the Huawei Music app showed a $4.76\%$ increase in the average number of songs played per user and a $5.10\%$ increase in the average listening duration per user. Our code has been released at https://github.com/USTC-StarTeam/FuXi-alpha.
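To illustrate what a Multi-stage FFN can look like, here is a minimal pure-Python sketch for a single sequence position. It is not the released FuXi-$\alpha$ code; it assumes that stage one fuses the concatenated outputs of the attention channels back to model width with a linear projection (`Wo`) plus a residual from the layer input, and that stage two applies RMSNorm, a SwiGLU feed-forward (`W1`, `W3`, `W2`), and a second residual. The function and parameter names are hypothetical.

```python
import math
import random


def vecmat(x, w):
    """Row vector x (length k) times matrix w (k x m) -> length m."""
    return [sum(x[t] * w[t][j] for t in range(len(w))) for j in range(len(w[0]))]


def rms_norm(x, eps=1e-6):
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def silu(v):
    return v / (1.0 + math.exp(-v))


def mffn(x, channel_out, p):
    """Two-stage FFN sketch for a single sequence position (assumed design).

    x: layer input (length d); channel_out: concatenated attention-channel
    outputs (length c * d). Returns a vector of length d.
    """
    # Stage 1: project the concatenated channels back to width d, add the layer input.
    fused = vecmat(channel_out, p["Wo"])
    h = [a + b for a, b in zip(fused, x)]
    # Stage 2: RMSNorm, then a SwiGLU feed-forward, then a second residual.
    z = rms_norm(h)
    gate = [silu(v) for v in vecmat(z, p["W1"])]
    up = vecmat(z, p["W3"])
    ffn = vecmat([g * u for g, u in zip(gate, up)], p["W2"])
    return [a + b for a, b in zip(ffn, h)]


rng = random.Random(0)
d, c, hidden = 4, 3, 8  # model width, number of attention channels, FFN width


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


p = {"Wo": rand_matrix(c * d, d), "W1": rand_matrix(d, hidden),
     "W3": rand_matrix(d, hidden), "W2": rand_matrix(hidden, d)}
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
ch = [rng.gauss(0.0, 1.0) for _ in range(c * d)]
y = mffn(x, ch, p)
print(len(y))  # 4
```

Separating the stages lets a learned projection mix the channels before the nonlinear feed-forward models implicit interactions; the exact ordering and normalization in FuXi-$\alpha$ should be checked against the released code.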
