FuXi-γ: Efficient Sequential Recommendation with Exponential-Power Temporal Encoder and Diagonal-Sparse Positional Mechanism

Dezhi Yi, Wei Guo, Wenyang Cui, Wenxuan He, Huifeng Guo, Yong Liu, Zhenhua Dong, Ye Lu

TL;DR

FuXi-γ targets the efficiency gap in sequential recommendation for long sequences by introducing a cognitively motivated exponential-power temporal encoder and a diagonal-sparse positional mechanism. Its decoder-only Transformer design enables pure matrix-based computation with reduced memory-access overhead. It achieves state-of-the-art accuracy while delivering substantial training and inference speedups (up to 4.74× and 6.18×, respectively). Experiments on four real-world datasets show strong performance gains, and the pruning mechanism preserves recommendation quality while substantially lowering FLOPs. The work demonstrates practical, scalable improvements for long-sequence recommendation and offers insights into temporal-decay modeling and sparsity-driven attention.

Abstract

Sequential recommendation aims to model users' evolving preferences based on their historical interactions. Recent advances leverage Transformer-based architectures to capture global dependencies. However, existing methods often suffer from high computational overhead, primarily due to discontinuous memory access in temporal encoding and dense attention over long sequences. To address these limitations, we propose FuXi-γ, a novel sequential recommendation framework that improves both effectiveness and efficiency through principled architectural design. FuXi-γ adopts a decoder-only Transformer structure and introduces two key innovations. (1) An exponential-power temporal encoder encodes relative temporal intervals using a tunable exponential decay function inspired by the Ebbinghaus forgetting curve. This encoder enables flexible modeling of both short-term and long-term preferences while maintaining high efficiency through continuous memory access and pure matrix operations. (2) A diagonal-sparse positional mechanism prunes low-contribution attention blocks using a diagonal-sliding strategy guided by the persymmetry of Toeplitz matrices. Extensive experiments on four real-world datasets demonstrate that FuXi-γ achieves state-of-the-art recommendation quality while accelerating training by up to 4.74× and inference by up to 6.18×. This makes it a practical and scalable solution for long-sequence recommendation. Our code is available at https://github.com/Yeedzhi/FuXi-gamma.
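The abstract does not state the exact decay formula, so the following is only a minimal sketch under an assumed form. It takes the exponential-power decay to be w(Δt) = exp(-alpha · Δt^power), where alpha and power are hypothetical tunable parameters. With power = 1 this reduces to the Ebbinghaus-style curve R = exp(-t/S), with alpha = 1/S. The weight is computed in closed form from the interval and evaluated over every causal pair of interactions. This differs from, for example, a lookup into a bucketed embedding table, and it is the kind of computation the abstract credits for continuous memory access. The function name temporal_decay_matrix is also hypothetical and does not come from the released code.

```python
import math
from typing import List


def temporal_decay_matrix(
    timestamps: List[float], alpha: float = 0.1, power: float = 0.5
) -> List[List[float]]:
    """Causal matrix of exponential-power decay weights (hypothetical sketch).

    ASSUMED form, not taken from the paper: w(dt) = exp(-alpha * dt ** power),
    where dt = t_i - t_j is the interval between interaction i and an earlier
    interaction j. Entries with j > i (future interactions) are set to 0.0.
    """
    n = len(timestamps)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            if j > i:
                row.append(0.0)  # causal mask: position i never sees the future
                continue
            # Clamp so unsorted input cannot produce a negative base
            # (a negative float raised to a fractional power yields a complex number).
            dt = max(0.0, timestamps[i] - timestamps[j])
            row.append(math.exp(-alpha * dt ** power))
        weights.append(row)
    return weights


# Usage: three interactions at t = 0, 1, 4 (arbitrary time units).
for row in temporal_decay_matrix([0.0, 1.0, 4.0], alpha=1.0, power=0.5):
    print([round(w, 3) for w in row])
```

The power controls how the decay behaves. With power below 1, the curve becomes a stretched exponential. For intervals longer than one time unit, its tail decays more slowly than plain exponential decay, which favors long-term preferences. With power above 1, recent items stay close to full weight and then fall off more sharply, which favors short-term preferences.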

Paper Structure

This paper contains 49 sections, 9 equations, 12 figures, 7 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Overall architecture of FuXi-γ.
  • Figure 2: Illustration of the diagonal-sparse positional mechanism. In this example, the sequence length is $n=8$, the stride size is $s=2$, and the configured pruning ratio is $\tau=50\%$. Red blocks are pruned because of their lower importance scores; only the remaining green blocks participate in the positional attention computation.
  • Figure 3: Overall efficiency performance comparison.
  • Figure 4: Efficiency comparison of temporal encoders.
  • Figure 5: Visualization comparison of temporal encoders.
  • ...and 7 more figures
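Figure 2's example (n = 8, s = 2, τ = 50%) can be reproduced in spirit with the sketch below, which rests on assumptions rather than on the released code. It assumes the positional term is a relative-position bias bias[d] that depends only on the distance d = i - j. Under that assumption the n × n causal bias matrix is Toeplitz, so every s × s block on the same block diagonal holds identical entries, and an importance score only needs to be computed once per block diagonal. The following pieces are all hypothetical: the score (the sum of absolute bias values in a block), the tie-breaking rule, and the function name diagonal_sparse_keep_mask. Toeplitz matrices are also persymmetric (P[i][j] = P[n-1-j][n-1-i]), and the paper uses this property to guide its diagonal-sliding strategy. This sketch relies only on the weaker fact that each diagonal is constant.

```python
from typing import List


def diagonal_sparse_keep_mask(
    bias: List[float], n: int, stride: int, prune_ratio: float
) -> List[List[bool]]:
    """Block keep-mask for a Toeplitz positional bias (hypothetical sketch).

    ASSUMPTION: bias[d] is the positional bias for relative distance d = i - j >= 0,
    so the causal n x n bias matrix is Toeplitz: P[i][j] = bias[i - j] for j <= i.
    Returns keep[bi][bj] over the (n // stride) x (n // stride) grid of blocks.
    """
    if n % stride != 0 or len(bias) < n:
        raise ValueError("need n divisible by stride and len(bias) >= n")
    nb = n // stride

    # Blocks on block diagonal k = bi - bj contain identical entries (Toeplitz),
    # so the importance score is computed once per diagonal, not once per block.
    diag_score = []
    for k in range(nb):
        total = 0.0
        for a in range(stride):  # row offset inside the block
            for c in range(stride):  # column offset inside the block
                d = k * stride + a - c  # relative distance i - j of this entry
                if d >= 0:  # entries with d < 0 are future positions (masked)
                    total += abs(bias[d])
        diag_score.append(total)

    causal_blocks = [(bi, bj) for bi in range(nb) for bj in range(bi + 1)]
    num_pruned = int(prune_ratio * len(causal_blocks))
    # Lowest score is pruned first; ties prune blocks farther from the main diagonal first.
    ranked = sorted(
        causal_blocks,
        key=lambda blk: (diag_score[blk[0] - blk[1]], -(blk[0] - blk[1])),
    )
    pruned = set(ranked[:num_pruned])
    return [[bj <= bi and (bi, bj) not in pruned for bj in range(nb)] for bi in range(nb)]


# Usage with Figure 2's settings: n = 8, s = 2, tau = 50%, and an assumed bias 1 / (d + 1).
mask = diagonal_sparse_keep_mask([1.0 / (d + 1) for d in range(8)], n=8, stride=2, prune_ratio=0.5)
for row in mask:
    print(" ".join("keep" if k else "----" for k in row))
```

With this decreasing bias, five of the ten causal blocks in the 4 × 4 grid are pruned. These are the single block on the farthest diagonal, both blocks on the next diagonal, and two of the three blocks adjacent to the main diagonal. All four main-diagonal blocks survive.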