A Novel Mamba-based Sequential Recommendation Method
Jun Yuan
TL;DR
Hydra is a scalable sequential recommendation (SR) model built on a multi-head latent Mamba architecture: it splits item representations into multiple latent subspaces and, within each subspace, interleaves historical context with item information through low-dimensional Mamba blocks. It supports both single-domain and multi-domain settings; in the latter, a single LLM is fine-tuned for cross-domain knowledge transfer, which reduces adaptation costs. Empirically, Hydra achieves state-of-the-art performance with far fewer parameters and faster training than Transformer-based baselines, and it shows practical benefits in online and multi-domain scenarios. Together, these results offer a practical path toward large-scale, knowledge-rich SR systems suitable for real-world deployment.
Abstract
Sequential recommendation (SR), which encodes user activity to predict the next action, has become a widely adopted strategy for building commercial personalized recommendation systems. Although Transformer-based models have proven effective for SR, the cost of their self-attention module scales quadratically with sequence length. Controlling model complexity is essential for large-scale recommendation systems, which may need to handle continuously evolving vocabularies of billions of items as well as user behavior sequences that can exceed tens of thousands of interactions. In this paper, we propose Hydra, a novel multi-head latent Mamba architecture that employs multiple low-dimensional Mamba layers and fully connected layers, coupled with positional encoding, to capture historical and item information simultaneously within each latent subspace. The proposed method not only scales to large parameter counts but also extends to multi-domain recommendation by integrating and fine-tuning large language models (LLMs). Through extensive experiments on public datasets, we demonstrate that Hydra effectively addresses the effectiveness-efficiency dilemma, outperforming state-of-the-art sequential recommendation baselines with significantly fewer parameters and reduced training time.
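To make the multi-head latent idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each head receives an equal slice of the item embedding, that historical context within a head comes from a linear-time scan, and that item information comes from a fully connected layer applied to the position-encoded slice. The names `linear_scan`, `positional_encoding`, `dense`, and `multi_head_latent_forward` are hypothetical, and the fixed diagonal recurrence in `linear_scan` is only a stand-in for a real Mamba block, whose parameters are input-dependent (selective).

```python
import math
import random


def linear_scan(xs, decay):
    """Stand-in for a Mamba block (assumption): a fixed diagonal linear
    recurrence. Cost is O(len(xs) * dim), i.e. linear in sequence length."""
    state = [0.0] * len(decay)
    out = []
    for x in xs:
        # Each output depends only on the current and earlier inputs.
        state = [a * s + (1.0 - a) * v for a, s, v in zip(decay, state, x)]
        out.append(state)
    return out


def positional_encoding(t, dim):
    """Sinusoidal encoding of position t with `dim` components."""
    return [
        math.sin(t / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(t / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def dense(x, weight):
    """Fully connected layer without bias; one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def multi_head_latent_forward(item_emb, num_heads, seed=0):
    """Split each item embedding into `num_heads` latent subspaces and
    combine a history path and an item path inside every subspace."""
    rng = random.Random(seed)  # random weights, for illustration only
    dim = len(item_emb[0])
    assert dim % num_heads == 0, "embedding size must divide evenly"
    head_dim = dim // num_heads
    outputs = [[] for _ in item_emb]
    for h in range(num_heads):
        lo = h * head_dim
        sub = [x[lo:lo + head_dim] for x in item_emb]  # one latent subspace
        decay = [rng.uniform(0.5, 0.99) for _ in range(head_dim)]
        weight = [[rng.gauss(0.0, head_dim ** -0.5) for _ in range(head_dim)]
                  for _ in range(head_dim)]
        history = linear_scan(sub, decay)  # historical context path
        for t, x in enumerate(sub):
            pos = positional_encoding(t, head_dim)
            item = dense([v + p for v, p in zip(x, pos)], weight)  # item path
            outputs[t].extend(hv + iv for hv, iv in zip(history[t], item))
    return outputs  # same shape as item_emb: seq_len x dim
```

Under these assumptions, calling `multi_head_latent_forward(seq, num_heads=4)` on a list of 8-dimensional item embeddings returns one 8-dimensional vector per position. Because each head owns only `dim / num_heads` channels, the dense weights in this sketch total `dim * dim / num_heads` entries rather than `dim * dim`, which illustrates one way a subspace split can reduce parameter count.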
