HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation
Kunrong Li, Zhu Sun, Kwan Hui Lim
TL;DR
The paper addresses a limitation of uniform position-wise FFNs in sequential recommendation: they ignore heterogeneity in user behavior and item complexity. It introduces HyMoERec, a hybrid mixture-of-experts design with a dense shared branch and a sparse set of specialized experts, which enables stable optimization together with adaptive specialization. A lightweight router outputs logits in $\mathbb{R}^E$, and a top-$K$ gate computes $\mathbf{y}_{MoE} = \sum_{i=1}^{K} g_i f_i(x)$, which is fused with the dense path as $\mathbf{y} = \mathbf{y}_{dense} + \alpha \mathbf{y}_{MoE}$, where $\alpha = \sigma(\alpha_{param}) w(t)$ and $w(t)$ is a warm-up schedule. Training adds a load-balance regularizer $\lambda_{lb} \sum_e \bar{g}_e \log \bar{g}_e$ to prevent expert collapse. On MovieLens-1M and Beauty, HyMoERec outperforms baselines such as NARM, GRU4Rec, BERT4Rec, and Mamba4Rec on HR and NDCG in both domains, indicating improved robustness and personalization for heterogeneous users and items.
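The routing, fusion, and load-balance terms summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gate weights are assumed to be a softmax over the K selected logits, the warm-up schedule w(t) is assumed to be linear, and the names `topk_gate`, `hybrid_forward`, `load_balance_penalty`, and `warmup_steps` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def topk_gate(logits, k):
    """Map the k experts with the largest logits to gate weights g_i.

    Assumption: weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    return dict(zip(top, softmax([logits[e] for e in top])))

def hybrid_forward(x, dense_fn, experts, router_logits, k,
                   alpha_param, step, warmup_steps):
    """y = y_dense + alpha * y_MoE, with alpha = sigmoid(alpha_param) * w(t)."""
    # Sparse branch: weighted sum over the selected experts.
    y_moe = [0.0] * len(x)
    for e, g in topk_gate(router_logits, k).items():
        out = experts[e](x)
        y_moe = [acc + g * o for acc, o in zip(y_moe, out)]
    # Assumed linear warm-up: w(t) rises from 0 to 1 over warmup_steps.
    w = min(1.0, step / warmup_steps)
    alpha = sigmoid(alpha_param) * w
    y_dense = dense_fn(x)
    return [d + alpha * m for d, m in zip(y_dense, y_moe)]

def load_balance_penalty(mean_gates, lam, eps=1e-12):
    """lam * sum_e g_bar_e * log(g_bar_e), the negative entropy of mean usage."""
    return lam * sum(g * math.log(g + eps) for g in mean_gates)
```

Under these assumptions, the output at step 0 equals the dense branch alone, so the sparse experts are phased in gradually. The penalty is the negative entropy of the mean gate distribution, so minimizing it pushes expert usage toward uniform, where it reaches its lowest value of $-\lambda_{lb} \log E$.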
Abstract
We propose HyMoERec, a novel sequential recommendation framework that addresses the limitations of uniform Position-wise Feed-Forward Networks in existing models. Current approaches treat all user interactions and items equally, overlooking the heterogeneity in user behavior patterns and the diversity in item complexity. HyMoERec is the first to introduce a hybrid mixture-of-experts architecture, combining shared and specialized expert branches with an adaptive expert fusion mechanism, to the sequential recommendation task. This design captures diverse reasoning for varied users and items while ensuring stable training. Experiments on the MovieLens-1M and Beauty datasets demonstrate that HyMoERec consistently outperforms state-of-the-art baselines.
