HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Lei Xin, Yuhao Zheng, Ke Cheng, Changjiang Jiang, Zifan Zhang, Fanhu Zeng

TL;DR

HyTRec tackles the challenge of modeling ultra-long user behavior sequences by explicitly decoupling long-term preferences from short-term intents through a hybrid attention framework and a Temporal-Aware Delta Network (TADN). The long-term branch leverages predominantly linear attention with sparsely interleaved softmax layers to retain efficiency, while TADN dynamically upweights recent signals to capture rapid interest drifts. Empirical results on public Amazon datasets and cross-domain data show HyTRec achieves near-linear inference speed and outperforms strong baselines across key metrics, with notable gains on ultra-long sequences. The work advances industrial-scale generative recommendation by providing a practical, scalable architecture that balances accuracy, speed, and robustness to changing user needs.
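To make "predominantly linear attention with sparsely interleaved softmax layers" concrete, the following is a minimal sketch of how such a layer stack could be laid out. The function name `hybrid_layer_schedule`, the parameter `linear_per_softmax`, and the specific 3:1 ratio are assumptions for illustration only; the summary above does not state the ratio or placement HyTRec actually uses.

```python
def hybrid_layer_schedule(num_layers: int, linear_per_softmax: int) -> list[str]:
    """Return a list of layer types for a hybrid attention stack.

    Hypothetical layout: every block of `linear_per_softmax` linear-attention
    layers is followed by one softmax-attention layer.
    """
    if num_layers <= 0 or linear_per_softmax <= 0:
        raise ValueError("num_layers and linear_per_softmax must be positive")
    period = linear_per_softmax + 1
    return [
        "softmax" if (i + 1) % period == 0 else "linear"
        for i in range(num_layers)
    ]


# Illustrative configuration (not taken from the paper): 8 layers, 3 linear per softmax.
schedule = hybrid_layer_schedule(num_layers=8, linear_per_softmax=3)
print(schedule)
```

With a layout like this, most layers cost time linear in the sequence length, and only the few softmax layers pay the quadratic cost, which is the efficiency trade-off the summary describes.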

Abstract

Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a Hybrid Attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval capabilities within industrial-scale contexts involving ten thousand interactions. To mitigate the lag in capturing rapid interest drifts within the linear layers, we further design a Temporal-Aware Delta Network (TADN) that dynamically upweights fresh behavioral signals while suppressing historical noise. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed and outperforms strong baselines, notably delivering over 8% improvement in Hit Rate for users with ultra-long sequences.
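The split described in the abstract can be illustrated with a small, dependency-free sketch. It is not the paper's TADN: it uses a generic gated delta-rule update for the linear branch, with an exponential time-decay gate `alpha = exp(-gap / tau)` chosen here purely as an assumption, and plain scaled dot-product softmax attention over the most recent events. The names `gated_delta_step`, `softmax_read`, `hybrid_read`, `tau`, `beta`, and `recent` are hypothetical, and how the two branch outputs are fused is left open because the abstract does not specify it.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule update of a fixed-size memory S (d_v x d_k).

    S <- alpha * S + beta * (v - alpha * S k) k^T
    alpha decays old memory; beta controls how strongly the value stored
    under key k is corrected toward v. Keys are assumed unit-norm.
    """
    pred = [alpha * dot(row, k) for row in S]  # what the decayed memory recalls for k
    return [
        [alpha * s + beta * (v_i - p_i) * k_j for s, k_j in zip(row, k)]
        for row, v_i, p_i in zip(S, v, pred)
    ]


def softmax_read(q, keys, values):
    """Scaled dot-product softmax attention of one query over a short window."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(w_i * val[j] for w_i, val in zip(w, values)) / z
            for j in range(len(values[0]))]


def hybrid_read(events, q, recent=2, tau=10.0, beta=1.0):
    """events: list of (timestamp, key, value), oldest first.

    Older events are folded into a constant-size linear-attention state;
    the last `recent` events are attended to exactly with softmax.
    """
    if not 1 <= recent <= len(events):
        raise ValueError("recent must be between 1 and len(events)")
    history, window = events[:-recent], events[-recent:]
    d_k, d_v = len(events[0][1]), len(events[0][2])
    S = [[0.0] * d_k for _ in range(d_v)]
    prev_t = None
    for t, k, v in history:
        # Assumed temporal gate: longer gaps decay the old memory more.
        alpha = 1.0 if prev_t is None else math.exp(-(t - prev_t) / tau)
        S = gated_delta_step(S, k, v, alpha, beta)
        prev_t = t
    long_term = [dot(row, q) for row in S]
    short_term = softmax_read(q, [k for _, k, _ in window], [v for _, _, v in window])
    return long_term, short_term


# Toy data (made up for illustration): 2-d keys and values, 4 events.
events = [
    (0.0, [1.0, 0.0], [1.0, 0.0]),
    (1.0, [0.0, 1.0], [0.0, 1.0]),
    (2.0, [1.0, 0.0], [0.0, 2.0]),
    (3.0, [1.0, 0.0], [0.0, 4.0]),
]
long_term, short_term = hybrid_read(events, q=[1.0, 0.0])
overwritten = gated_delta_step([[1.0, 0.0]], [1.0, 0.0], [5.0], alpha=1.0, beta=1.0)
print(long_term, short_term, overwritten)
```

The linear branch keeps memory at a fixed `d_v x d_k` size no matter how long the history grows, which is where both the efficiency and the limited state capacity mentioned in the abstract come from; the softmax window stores every recent key and value, so retrieval there is exact but its cost grows with the window length.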

Paper Structure (47 sections, 9 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: The Evolution of Attention Mechanisms.
  • Figure 2: The Framework of HyTRec.
  • Figure 3: We compare the training throughput of models with the same parameter scale on a single V100 GPU under different behavior sequence lengths.
  • Figure 4: Performance Comparison Under Different Hybrid Attention Ratios.
  • Figure 5: Performance Comparison Under Different Number of Experts.
  • ...and 1 more figure