MaTrRec: Uniting Mamba and Transformer for Sequential Recommendation

Shun Zhang; Runsen Zhang; Zhirong Yang

TL;DR

This work tackles the efficiency–accuracy gap in sequential recommendation: Transformer self-attention incurs $O(n^2)$ cost in the sequence length $n$, which becomes expensive for long sequences, while Mamba processes sequences in linear time and so suits long histories. MaTrRec fuses the two paradigms, using Mamba blocks to capture long-range dependencies and a Transformer encoder to exploit global attention for short-range patterns, together with an embedding layer, FFN, residual connections, and Softmax prediction. The authors report state-of-the-art results on five public datasets, including substantial improvements on highly sparse data (e.g., up to $33\%$ cold-start gains on the Amazon Musical Instruments dataset), and provide ablations to validate each component. The work offers a practical, scalable approach to unified long- and short-sequence recommendation, with code available at the linked GitHub repository.

Abstract

Sequential recommendation systems aim to provide personalized recommendations by analyzing dynamic preferences and dependencies within user behavior sequences. Transformer models have recently proven effective at capturing user preferences. However, their quadratic computational complexity limits recommendation performance on long interaction sequences. Mamba, a representative State Space Model (SSM), efficiently captures user preferences in long interaction sequences with linear complexity. However, we find that Mamba's recommendation effectiveness is limited on short interaction sequences: it fails to recall items of actual interest to users and exacerbates the data-sparsity cold-start problem. To address this issue, we propose a new model, MaTrRec, which combines the strengths of Mamba and Transformer. The model leverages Mamba's advantages in handling long-term dependencies and the Transformer's global attention advantages in short-term dependencies, thereby enhancing predictive capability on both long and short interaction sequence datasets while balancing model efficiency. Notably, our model significantly alleviates the data-sparsity cold-start problem, with an improvement of up to 33% on the highly sparse Amazon Musical Instruments dataset. We conducted extensive experimental evaluations on five widely used public datasets. The experimental results show that our model outperforms current state-of-the-art sequential recommendation models on all five datasets. The code is available at https://github.com/Unintelligentmumu/MaTrRec.
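The complexity contrast drawn in the abstract can be illustrated with a small sketch. This is not the MaTrRec implementation: the function names, the fixed coefficients `a` and `b`, and the scalar state are all assumptions made for illustration. Mamba's actual selective SSM uses input-dependent parameters and a hardware-aware scan; the toy recurrence below only shows that a state-space update touches each item once, whereas full self-attention scores every pair of positions.

```python
def attention_score_count(n: int) -> int:
    """Number of query-key score entries in full self-attention over n items."""
    return n * n


def linear_scan(xs, a=0.9, b=0.1):
    """Toy scalar recurrence h_t = a * h_{t-1} + b * x_t (hypothetical, fixed a and b).

    One update per input item, so the work grows linearly with len(xs).
    """
    h = 0.0
    states = []
    for x in xs:
        h = a * h + b * x
        states.append(h)
    return states


if __name__ == "__main__":
    for n in (50, 200, 1000):
        steps = len(linear_scan([1.0] * n))
        print(f"n={n}: attention scores={attention_score_count(n)}, scan steps={steps}")
```

Under these assumptions, doubling the sequence length doubles the scan steps but quadruples the number of attention scores, which is the gap the abstract refers to.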

Paper Structure (22 sections, 7 equations, 3 figures, 5 tables)

This paper contains 22 sections, 7 equations, 3 figures, 5 tables.

Introduction
Preparation Work
Sequential Recommendation
Transformer and Mamba
MaTrRec
Problem Statement
Model Architecture
Embedding Layer
Mamba Layer
Multi-Head Attention layer
Feed-Forward Network
Prediction Layer
Experiments
Datasets
Experimental Details and Evaluation Metrics
...and 7 more sections

Figures (3)

Figure 1: The framework of MaTrRec
Figure 2: The effect of Dropout on a model's Recall@10 and NDCG@10 performance.
Figure 3: The effect of Maximum Sequence Length N on a model's Recall@10 and NDCG@10 performance.
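The Recall@10 and NDCG@10 metrics named in the figure captions can be sketched as follows. This is a minimal illustration that assumes the common next-item setting with exactly one ground-truth item per user; this excerpt does not state the paper's exact evaluation protocol (for example, full ranking versus sampled negatives), and the helper names are hypothetical.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of the target item: 1 + number of items scored strictly higher."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)


def recall_at_k(rank, k=10):
    """With a single ground-truth item, Recall@K is 1 if it appears in the top K."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k=10):
    """With a single ground-truth item the ideal DCG is 1, so NDCG@K = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def evaluate(all_scores, targets, k=10):
    """Average Recall@K and NDCG@K over users (assumes at least one user).

    all_scores[u] is a list of item scores for user u; targets[u] is the
    index of that user's held-out next item.
    """
    ranks = [rank_of_target(s, t) for s, t in zip(all_scores, targets)]
    recall = sum(recall_at_k(r, k) for r in ranks) / len(ranks)
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
    return recall, ndcg
```

Recall@K only records whether the held-out item made the top K, while NDCG@K also rewards placing it nearer the top of the list.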
